Aggregate n DataFrames into a single DataFrame  elementwise
What is the pythonic way to generate a pandas DataFrame using an aggregation function to combine several other DataFrames? (DataFrames are all of the same shape with same headers.)
Aggregation functions I might be interested in are mean, minimum, maximum and user defined functions taking two operands.
For example, this is the way I might generate a DataFrame (max
) which has the maximum element from four other DataFrames (df1,df2,df3,df4
). This is hardly a pattern (idiom) that I can apply in all situations and to calculate the mean would look quite different.
import pandas as pd
df1 = pd.DataFrame({'a' : [1,2,3], 'b': [3,9,5]})
df2 = pd.DataFrame({'a' : [6,2,7], 'b': [3,4,5]})
df3 = pd.DataFrame({'a' : [6,2,11], 'b': [3,4,5]})
df4 = pd.DataFrame({'a' : [6,2,7], 'b': [3,12,5]})
print(df1)
print(df2)
print(df3)
print(df4)
max1 = df1.where(df1 > df2, df2)
max2 = df3.where(df3 > df4, df4)
max = max1.where(max1 > max2, max2)
print(max)
An example of a user defined function taking two operands which I will also want to apply, is the following which is essentially the same as max in this example but will be something more complicated in practice
def myagg(a, b):
return a > b
1 answer

Concatenate them all with
pd.concat
and thengroupby
to findmax
:pd.concat([df1, df2, df3, df4], 1).groupby(level=0, axis=1).max() a b 0 6 3 1 2 12 2 11 5
Compare this to your method which also returns:
max a b 0 6 3 1 2 12 2 11 5
Note that the column names across every dataframe would have to be the same for this (as well as your current method) to work.
As piR mentioned, the
describe
function performs some general aggregations for you:lodf = [df1, df2, df3, df4] pd.concat(lodf, keys=range(len(lodf))) \ .groupby(level=1).describe().swaplevel(0, 1, 1).sort_index(1) 25% 50% 75% count max mean min \ a b a b a b a b a b a b a 0 4.75 3.0 6.0 3.0 6.0 3.00 4.0 4.0 6.0 3.0 4.75 3.00 1.0 1 2.00 4.0 2.0 6.5 2.0 9.75 4.0 4.0 2.0 12.0 2.00 7.25 2.0 2 6.00 5.0 7.0 5.0 8.0 5.00 4.0 4.0 11.0 5.0 7.00 5.00 3.0 std b a b 0 3.0 2.500000 0.000000 1 4.0 0.000000 3.947573 2 5.0 3.265986 0.000000