Pandas Multiply Specific Columns by Value In Row

I am attempting to multiply specific columns by a value in their respective row.

For example:

          X         Y         Z
A 10      1         0         1        
B 50      0         0         0      
C 80      1         1         1

Would become:

              X         Y         Z
A 10        10         0         10        
B 50        0          0         0      
C 80        80         80        80

The problem I am having is that it times out when I use mul(). My real dataset is very large. I tried iterating over it with a loop in my real code as follows:

for i in range(1,df_final_small.shape[0]): 
    df_final_small.iloc[i].values[3:248] = df_final_small.iloc[i].values[3:248] * df_final_small.iloc[i].values[2]

Which when applied to the example dataframe would look like this:

for i in range(1,df_final_small.shape[0]): 
    df_final_small.iloc[i].values[1:4] = df_final_small.iloc[i].values[1:4] * df_final_small.iloc[i].values[0]

There must be a better way to do this. I am having trouble figuring out how to apply the multiplication only to certain columns in the row rather than to the entire row.

EDIT: To detail further here is my df.head(5).

id  gross   150413 Welcome Email    150413 Welcome Email Repeat Cust    151001 Welcome Email    151001 Welcome Email Repeat Cust    161116 eKomi    1702 Hot Leads Email    1702 Welcome Email - All Purchases  1804 Hot Leads  ... SILVER  GOLD    PLATINUM    Acquisition Direct Mail Conversion Direct Mail  Retention Direct Mail   Retention eMail cluster x   y
0   0033333 46.2    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 0.0 10  -0.230876   0.461990
1   0033331 2359.0  0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 0.0 9   0.231935    -0.648713
2   0033332 117.0   0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 0.0 5   -0.812921   -0.139403
3   0033334 89.0    0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 0.0 5   -0.812921   -0.139403
4   0033335 1908.0  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 1.0 0.0 0.0 7   -0.974142   0.145032

2 answers

  • answered 2018-10-11 19:16 W-B

    Use mul with axis=0, and get the index values with get_level_values:

    df.mul(df.index.get_level_values(1),axis=0)
    Out[167]: 
           X   Y   Z
    A 10  10   0  10
    B 50   0   0   0
    C 80  80  80  80
    

    Also, when the dataframe is too big, you can split it and process it in chunks.

    dfs = np.split(df, [2], axis=0)
    pd.concat([x.mul(x.index.get_level_values(1), axis=0) for x in dfs])
    Out[174]: 
           X   Y   Z
    A 10  10   0  10
    B 50   0   0   0
    C 80  80  80  80
    

    I would also recommend NumPy broadcasting:

    df.values*df.index.get_level_values(1)[:,None]
    Out[177]: Int64Index([[10, 0, 10], [0, 0, 0], [80, 80, 80]], dtype='int64')
    pd.DataFrame(df.values*df.index.get_level_values(1)[:,None],index=df.index,columns=df.columns)
    Out[181]: 
           X   Y   Z
    A 10  10   0  10
    B 50   0   0   0
    C 80  80  80  80
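
The broadcasting approach above can be sketched end to end as follows. This is a minimal sketch, not the answerer's exact code: the two-level index (letter, number) is an assumption based on how the question's table is printed, and the index level is converted with to_numpy() before reshaping because newer pandas releases may reject [:, None] applied directly to an Index.

```python
import pandas as pd

# Rebuild the example frame. The two-level index (letter, number) is an
# assumption inferred from the printed table in the question.
index = pd.MultiIndex.from_tuples([("A", 10), ("B", 50), ("C", 80)])
df = pd.DataFrame({"X": [1, 0, 1], "Y": [0, 0, 1], "Z": [1, 0, 1]}, index=index)

# Take the second index level as a plain NumPy array and reshape it into a
# column vector so that it broadcasts across every column of the frame.
factors = df.index.get_level_values(1).to_numpy()[:, None]

# Multiply the raw values, then wrap them back into a labelled DataFrame.
result = pd.DataFrame(df.to_numpy() * factors, index=df.index, columns=df.columns)
print(result)
```

Working on the raw arrays skips pandas' label alignment, which is usually where the speed gain comes from on large frames.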
    

  • answered 2018-10-11 19:36 mad_

    Just specify the columns you want to multiply. Example:

    df=pd.DataFrame({'A':10,'X':1,'Y':1,'Z':1},index=[1])
    df.loc[:,['X', 'Y', 'Z']]=df.loc[:,['X', 'Y', 'Z']].values*df.iloc[:,0:1].values
    

    If you want to provide an arbitrary range of columns, use iloc:

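    # In Python 3, range objects cannot be concatenated with +, so convert them to lists first
    range_of_columns = list(range(10, 5001)) + list(range(5030, 10001))
    # multiply the selected columns by the first column; assign the result back to keep it
    df.iloc[:, range_of_columns].values * df.iloc[:, 0:1].values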
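
Applied to a frame shaped like the one in the question's df.head(5), where the multiplier sits in an ordinary column rather than in the index, the same idea might look like the sketch below. The frame is a small hypothetical stand-in: flag_a and flag_b are made-up names for the 0/1 indicator columns, and only the id and gross names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: an id column, a 'gross' column,
# and two 0/1 indicator columns (flag_a and flag_b are invented names).
df = pd.DataFrame({
    "id": ["0033333", "0033331", "0033332"],
    "gross": [46.2, 2359.0, 117.0],
    "flag_a": [0.0, 1.0, 1.0],
    "flag_b": [1.0, 0.0, 1.0],
})

# Select only the columns to scale, multiply them row-wise by 'gross'
# (axis=0 aligns the Series with the rows), and write the result back.
flag_cols = ["flag_a", "flag_b"]
df[flag_cols] = df[flag_cols].mul(df["gross"], axis=0)
print(df)
```

The column list could also be built by position, for example from a slice of df.columns, when the columns to scale form a contiguous block.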