Pandas: replace negative values with zero in columns that match a regular expression

The goal is to replace all negative values with zeros, but only in certain columns ("capped1" and "capped2", not "signed"). The columns need to be selected by a regular expression. (The actual df has more than 1000 columns with more complex names.)

I came up with:

import pandas as pd
import numpy as np

index = [1, 2, 3, 4]
d = {'capped1': [1, 0, -1, np.nan], 'capped2': [2, 0, np.nan, -9999], 'signed': [2, 0, -3, np.nan]}
df = pd.DataFrame(data=d, index=index)
# clip the regex-selected columns at zero, then stitch them back onto the rest
df_right = df.filter(regex="capped.*").clip(lower=0)
# pass the labels via columns=; the positional axis argument (drop(labels, 1)) was removed in pandas 2.0
df_left = df.drop(columns=df_right.columns)
df_out = df_left.merge(df_right, left_index=True, right_index=True, how="outer")
df_out

Is there a better way to do this? My guess is that these three lines can be replaced by a single line that replaces the values in df directly.

2 answers

  • answered 2017-10-11 10:14 jezrael

    You can get the column names first and then apply the function only to that subset:

    cols = df.columns[df.columns.str.contains('^capped.*')]
    print (cols)
    Index(['capped1', 'capped2'], dtype='object')
    
    df[cols] = df[cols].clip(lower=0)
    print (df)
       capped1  capped2  signed
    1      1.0      2.0     2.0
    2      0.0      0.0     0.0
    3      0.0      NaN    -3.0
    4      NaN      0.0     NaN
    

    A similar solution, using a boolean mask instead of the column names:

    m = df.columns.str.contains('^capped.*')
    print (m)
    [ True  True False]
    
    df.loc[:, m] = df.loc[:, m].clip(lower=0)
    print (df)
       capped1  capped2  signed
    1      1.0      2.0     2.0
    2      0.0      0.0     0.0
    3      0.0      NaN    -3.0
    4      NaN      0.0     NaN
    

    Nice idea from a comment by Jon Clements: a regex is not necessary here, since it is possible to use startswith instead:

    cols = df.columns[df.columns.str.startswith('capped')]
    m = df.columns.str.startswith('capped')
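
If the regex route is kept, the anchor matters: str.contains does a regex search anywhere in the name, so it is the ^ that restricts the match to a prefix. A small sketch of the difference follows; the "uncapped" column is hypothetical and only there to show where the patterns diverge.

```python
import pandas as pd

# Hypothetical column names; "uncapped" is not in the question's data.
columns = pd.Index(["capped1", "capped2", "signed", "uncapped"])

unanchored = columns.str.contains("capped")   # regex search anywhere in the name
anchored = columns.str.contains("^capped")    # regex search anchored at the start
prefix = columns.str.startswith("capped")     # plain prefix test, no regex involved

print(list(columns[unanchored]))  # ['capped1', 'capped2', 'uncapped']
print(list(columns[anchored]))    # ['capped1', 'capped2']
print(list(columns[prefix]))      # ['capped1', 'capped2']
```

With the question's three columns all variants select the same names, so this only becomes relevant for the larger frame with more complex names.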
    

  • answered 2017-10-11 10:15 piRSquared

    Option 1
    Use pd.DataFrame.update with pd.DataFrame.clip
    This edits df in place

    df.update(df.filter(regex="^capped.*$").clip(lower=0))
    df
    
       capped1  capped2  signed
    1      1.0      2.0     2.0
    2      0.0      0.0     0.0
    3      0.0      NaN    -3.0
    4      NaN      0.0     NaN
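
For reference, here is a self-contained sketch of Option 1 on the question's data. One detail that may matter on other data: DataFrame.update only copies non-NA values from the frame it is given, so the NaN entries survive because they are already NaN in df, not because update wrote them back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "capped1": [1, 0, -1, np.nan],
        "capped2": [2, 0, np.nan, -9999],
        "signed": [2, 0, -3, np.nan],
    },
    index=[1, 2, 3, 4],
)

# Clip only the columns whose names start with "capped".
clipped = df.filter(regex="^capped").clip(lower=0)

# update modifies df in place and returns None, so do not reassign its result.
result = df.update(clipped)

print(result)  # None
print(df)
```

Since update returns None, writing df = df.update(...) would discard the frame.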
    

    Option 2
    Use pd.DataFrame.assign and np.maximum
    This produces a copy and leaves df alone
    I use np.maximum for variety; I could have used pd.DataFrame.clip.
    Notice that I use ** to unpack the dataframe that is returned by np.maximum as a dictionary. It is equivalent to **{c: s for c, s in d.items()} where d is the return value from np.maximum (on pandas older than 2.0, d.iteritems() also works; it was removed in 2.0).

    df.assign(**np.maximum(df.filter(regex='^capped.*'), 0))
    
       capped1  capped2  signed
    1      1.0      2.0     2.0
    2      0.0      0.0     0.0
    3      0.0      NaN    -3.0
    4      NaN      0.0     NaN
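
A quick sketch to check the two claims in Option 2, assuming the question's data: that assign leaves df untouched, and that clip can stand in for np.maximum. The names out_clip and out_max are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "capped1": [1, 0, -1, np.nan],
        "capped2": [2, 0, np.nan, -9999],
        "signed": [2, 0, -3, np.nan],
    },
    index=[1, 2, 3, 4],
)

capped = df.filter(regex="^capped")

# ** unpacks the DataFrame as column name -> Series keyword arguments.
out_clip = df.assign(**capped.clip(lower=0))
out_max = df.assign(**np.maximum(capped, 0))

print(out_clip.equals(out_max))  # both variants should agree on this data
print(df.loc[3, "capped1"])      # the original frame still holds -1.0
```

Because ** passes the column names as keyword arguments, this only works when the selected column names are strings.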