Python Replace Whole Values in Dataframe String and Not Substrings

I am trying to replace strings in a dataframe if the whole string equals another string. I do not want to replace substrings.

So:

If I have df:

 Index  Name       Age
   0     Joe        8
   1     Mary       10
   2     Marybeth   11

and I want to replace "Mary" when the whole string matches "Mary" with "Amy" so I get

 Index  Name       Age
   0     Joe        8
   1     Amy        10
   2     Marybeth   11

I'm doing the following:

df['Name'] = df['Name'].apply(lambda x: x.replace('Mary','Amy'))

My understanding from searching around is that replace defaults to regex=False, so it should only match when the whole value is "Mary". Instead I'm getting this result:

 Index  Name       Age
   0     Joe        8
   1     Amy        10
   2     Amybeth    11

What am I doing wrong?

3 answers

  • answered 2018-01-11 19:55 Wen

    replace + dict is the way to go. (With apply you are calling Python's str.replace on each value, which matches substrings just like Series.str.replace.)

    df['Name'].replace({'Mary':'Amy'})
    Out[582]: 
    0         Joe
    1         Amy
    2    Marybeth
    Name: Name, dtype: object
    df['Name'].replace({'Mary':'Amy'},regex=True)
    Out[583]: 
    0        Joe
    1        Amy
    2    Amybeth
    Name: Name, dtype: object
    

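    Notice that the results differ: without regex=True only whole values are replaced, while with regex=True the "Mary" inside "Marybeth" is replaced as well.

    Series.str.replace: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.replace.html

    DataFrame.replace: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html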
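
    As a self-contained sketch of the comparison above, the snippet below rebuilds the frame from the question and runs both variants. The variable names (exact, substring, by_column) are made up for illustration; the nested-dict form of DataFrame.replace is included as one possible way to limit the replacement to a single column.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Whole-value replacement: only cells equal to "Mary" change.
exact = df["Name"].replace({"Mary": "Amy"})

# Regex replacement: every match of the pattern inside a cell changes.
substring = df["Name"].replace({"Mary": "Amy"}, regex=True)

# Nested dict on the DataFrame: look in column "Name" for the value "Mary".
by_column = df.replace({"Name": {"Mary": "Amy"}})

print(exact.tolist())
print(substring.tolist())
print(by_column)
```

    None of these calls modify df; each returns a new object, so assign the result back (for example df['Name'] = exact) to keep it.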

  • answered 2018-01-11 19:56 Alexander

    You can also use loc to locate the rows where the name matches exactly, and then set them to the new name.

    df.loc[df['Name'] == 'Mary', 'Name'] = "Amy"
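
    A runnable sketch of the same idea, with the boolean mask pulled out into its own variable (mask is just an illustrative name) so it is easier to see which rows get selected:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# True only where the whole value equals "Mary"; "Marybeth" compares as False.
mask = df["Name"] == "Mary"

# Assign through .loc so the frame itself is updated.
df.loc[mask, "Name"] = "Amy"

print(mask.tolist())
print(df)
```

    Unlike replace, this modifies df in place, so no reassignment is needed.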
    

  • answered 2018-01-11 20:57 MaxU

    Explanation:

    When you use apply like this, the lambda receives plain Python strings, not a pandas Series:

    In [42]: df['Name'].apply(lambda x: print(type(x)))
    <class 'str'>  # <---- NOTE
    <class 'str'>  # <---- NOTE
    <class 'str'>  # <---- NOTE
    Out[42]:
    0    None
    1    None
    2    None
    Name: Name, dtype: object
    

    It's the same as:

    In [44]: 'Marybeth'.replace('Mary','Amy')
    Out[44]: 'Amybeth'
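
    If you do want to keep apply, one possible workaround (a sketch, not the approach recommended in this answer) is to compare the whole string inside the lambda instead of calling str.replace. The names substring and exact are illustrative only:

```python
import pandas as pd

names = pd.Series(["Joe", "Mary", "Marybeth"], name="Name")

# Python's str.replace works on substrings, so "Marybeth" is changed too.
substring = names.apply(lambda x: x.replace("Mary", "Amy"))

# Comparing the whole string leaves "Marybeth" untouched.
exact = names.apply(lambda x: "Amy" if x == "Mary" else x)

print(substring.tolist())
print(exact.tolist())
```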
    

    Solution:

    Use Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None) directly, without Series.apply(). By default (regex=False) it replaces only whole values, which is the behaviour you expect:

    In [39]: df.Name.replace('Mary','Amy')
    Out[39]:
    0         Joe
    1         Amy
    2    Marybeth
    Name: Name, dtype: object
    

    You can explicitly specify regex=True; this will replace substrings:

    In [40]: df.Name.replace('Mary','Amy', regex=True)
    Out[40]:
    0        Joe
    1        Amy
    2    Amybeth
    Name: Name, dtype: object
    

    NOTE: in the pandas version used here, Series.str.replace(pat, repl, n=-1, case=None, flags=0) doesn't have a regex parameter (later releases add one). It always treats pat as a regular expression and replaces matching substrings:

    In [41]: df.Name.str.replace('Mary','Amy')
    Out[41]:
    0        Joe
    1        Amy
    2    Amybeth
    Name: Name, dtype: object
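
    If you need regex matching but still want whole-value semantics, one option is to anchor the pattern with ^ and $. This is a sketch under the assumption that your pandas version accepts regex=True on both Series.replace and Series.str.replace (the latter only in releases that have the regex parameter):

```python
import pandas as pd

names = pd.Series(["Joe", "Mary", "Marybeth"], name="Name")

# ^ and $ anchor the pattern to the start and end of the value,
# so "Marybeth" no longer matches.
anchored = names.replace(r"^Mary$", "Amy", regex=True)

# Same anchored pattern through the .str accessor.
anchored_str = names.str.replace(r"^Mary$", "Amy", regex=True)

print(anchored.tolist())
print(anchored_str.tolist())
```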