Removing a list of words from a DataFrame

I have a DataFrame whose columns are Series of strings, and a list of strings that I want to remove from every row.

tcl_list = ["tab", "cr", "lf", "doublequote", "singlequote", "eof"]
df[['Summary', 'Description']] = re.sub("|".join(tcl_list), ' ', df[['Summary', 'Description']])

For example:

From this:

the tab dog is acting sneaky like a doublequote cat doublequote

To this:

the dog is acting sneaky like a cat

However, I get this error:

TypeError: expected string or bytes-like object

I have tried using apply() with a lambda, but without success. Any suggestions?

1 answer

  • answered 2018-11-08 05:41 Naga Kiran

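    I think the regular expression needs to be applied to each individual string in the column. re.sub expects a single string, not a DataFrame, which is why it raises the TypeError.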

    import re
    import pandas as pd

    df = pd.DataFrame({'val': ['the tab dog is acting sneaky like a doublequote cat doublequote']})
    df.val.apply(lambda x: re.sub("|".join(tcl_list), '', x))  # tcl_list as defined in the question
    

    Or

    df.val.str.replace("|".join(tcl_list), '', regex=True)
    

    Out:

    0    the  dog is acting sneaky like a  cat 
    Name: val, dtype: object
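
The output above still has doubled spaces, and a bare alternation would also strip "cr" out of a word like "secret". A minimal sketch for the two columns from the question, assuming whole-word matching is what is wanted: the \b word boundaries, the whitespace cleanup, and the sample Description value are my additions, not part of the original question.

```python
import re
import pandas as pd

tcl_list = ["tab", "cr", "lf", "doublequote", "singlequote", "eof"]

# \b restricts matches to whole words; re.escape guards against
# regex metacharacters if the list ever contains any.
pattern = r"\b(?:" + "|".join(map(re.escape, tcl_list)) + r")\b"

# Sample data; the Description value is made up for illustration.
df = pd.DataFrame({
    "Summary": ["the tab dog is acting sneaky like a doublequote cat doublequote"],
    "Description": ["a secret eof message"],
})

for col in ["Summary", "Description"]:
    df[col] = (
        df[col]
        .str.replace(pattern, " ", regex=True)   # drop the unwanted words
        .str.replace(r"\s+", " ", regex=True)    # collapse runs of whitespace
        .str.strip()                             # trim leading/trailing spaces
    )

print(df)
```

The .str accessor works on one Series at a time, so the loop handles each column separately. If the columns may contain missing values, they stay NaN with this approach.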