Aggregate n DataFrames into a single DataFrame - element-wise

What is the pythonic way to generate a pandas DataFrame using an aggregation function to combine several other DataFrames? (DataFrames are all of the same shape with same headers.)

Aggregation functions I might be interested in are mean, minimum, maximum and user defined functions taking two operands.

For example, this is the way I might generate a DataFrame (max) which has the maximum element from four other DataFrames (df1,df2,df3,df4). This is hardly a pattern (idiom) that I can apply in all situations and to calculate the mean would look quite different.

import pandas as pd

df1 = pd.DataFrame({'a' : [1,2,3], 'b': [3,9,5]})
df2 = pd.DataFrame({'a' : [6,2,7], 'b': [3,4,5]})
df3 = pd.DataFrame({'a' : [6,2,11], 'b': [3,4,5]})
df4 = pd.DataFrame({'a' : [6,2,7], 'b': [3,12,5]})
print(df1)
print(df2)
print(df3)
print(df4)

max1 = df1.where(df1 > df2, df2)
max2 = df3.where(df3 > df4, df4)
max = max1.where(max1 > max2, max2)
print(max)

An example of a user defined function taking two operands which I will also want to apply, is the following which is essentially the same as max in this example but will be something more complicated in practice

def myagg(a, b):
    return a > b

1 answer

  • answered 2017-11-14 23:46 cᴏʟᴅsᴘᴇᴇᴅ

    Concatenate them all with pd.concat and then groupby to find max:

    pd.concat([df1, df2, df3, df4], 1).groupby(level=0, axis=1).max()
    
        a   b
    0   6   3
    1   2  12
    2  11   5
    

    Compare this to your method which also returns:

    max
    
        a   b
    0   6   3
    1   2  12
    2  11   5
    

    Note that the column names across every dataframe would have to be the same for this (as well as your current method) to work.


    As piR mentioned, the describe function performs some general aggregations for you:

    lodf = [df1, df2, df3, df4]
    
    pd.concat(lodf, keys=range(len(lodf))) \
      .groupby(level=1).describe().swaplevel(0, 1, 1).sort_index(1)
    
        25%       50%       75%       count        max        mean        min  \
          a    b    a    b    a     b     a    b     a     b     a     b    a   
    0  4.75  3.0  6.0  3.0  6.0  3.00   4.0  4.0   6.0   3.0  4.75  3.00  1.0   
    1  2.00  4.0  2.0  6.5  2.0  9.75   4.0  4.0   2.0  12.0  2.00  7.25  2.0   
    2  6.00  5.0  7.0  5.0  8.0  5.00   4.0  4.0  11.0   5.0  7.00  5.00  3.0   
    
                 std            
         b         a         b  
    0  3.0  2.500000  0.000000  
    1  4.0  0.000000  3.947573  
    2  5.0  3.265986  0.000000