Is there a function in pandas like cumsum() but for the mean? I need to apply it based on a condition
I need to compute the cumulative mean only while my column A is different from zero. Each time it is zero, the cumulative mean should restart. Thanks so much in advance; I am not very experienced with Python.
Input:
ColumnA
0 5
1 6
2 7
3 0
4 0
5 1
6 2
7 3
8 0
9 5
10 10
11 15
Expected Output:
ColumnA CumulativeMean
0 5 5.0
1 6 5.5
2 7 6.0
3 0 0.0
4 0 0.0
5 1 1.0
6 2 1.5
7 3 2.0
8 0 0.0
9 5 5.0
10 10 7.5
11 15 10.0
2 answers

You can try with cumsum to make groups, and then expanding + mean to compute the cumulative mean:

    groups = df.ColumnA.eq(0).cumsum()
    df.groupby(groups).apply(lambda x: x[x.ne(0)].expanding().mean()).fillna(0)
Details:
Make a new group each time the column is equal to 0, using eq and cumsum. eq gives you a mask of True and False values, and cumsum treats these values as 1 and 0:

    groups = df.ColumnA.eq(0).cumsum()
    groups

    0     0
    1     0
    2     0
    3     1
    4     2
    5     2
    6     2
    7     2
    8     3
    9     3
    10    3
    11    3
    Name: ColumnA, dtype: int32
Then group by those groups and use apply to compute the cumulative mean over the elements that are different from 0:

    df.groupby(groups).apply(lambda x: x[x.ne(0)].expanding().mean())

        ColumnA
    0       5.0
    1       5.5
    2       6.0
    3       NaN
    4       NaN
    5       1.0
    6       1.5
    7       2.0
    8       NaN
    9       5.0
    10      7.5
    11     10.0

And finally use fillna to replace the NaN values with 0:

    df.groupby(groups).apply(lambda x: x[x.ne(0)].expanding().mean()).fillna(0)

        ColumnA
    0       5.0
    1       5.5
    2       6.0
    3       0.0
    4       0.0
    5       1.0
    6       1.5
    7       2.0
    8       0.0
    9       5.0
    10      7.5
    11     10.0
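For reference, the three steps can be put together as one runnable sketch. This is a variation rather than the exact code above: it assumes that masking the zeros with where and using transform instead of apply is acceptable. transform keeps the original row index, whereas apply may prepend the group labels to the index depending on the pandas version. The name nonzero is illustrative.

```python
import pandas as pd

# Same values as ColumnA in the question.
df = pd.DataFrame({"ColumnA": [5, 6, 7, 0, 0, 1, 2, 3, 0, 5, 10, 15]})

# Step 1: a new group starts at every zero.
groups = df["ColumnA"].eq(0).cumsum()

# Step 2: replace zeros with NaN so they do not enter the mean.
nonzero = df["ColumnA"].where(df["ColumnA"].ne(0))

# Step 3: expanding (cumulative) mean within each group, then NaN -> 0.
df["CumulativeMean"] = (
    nonzero.groupby(groups)
    .transform(lambda s: s.expanding().mean())
    .fillna(0)
)

print(df)
```

With the question's input this should reproduce the expected CumulativeMean column, with 0.0 on every row where ColumnA is 0.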

You can use boolean indexing to compare rows that are == 0 and != 0 against the previous rows with .shift(). Then, just take the .cumsum() to separate the rows into groups, according to where the zeros are within ColumnA.

    df['CumulativeMean'] = (df.groupby((((df.shift()['ColumnA'] != 0) & (df['ColumnA'] == 0)) |
                                         (df.shift()['ColumnA'] == 0) & (df['ColumnA'] != 0))
                                       .cumsum())['ColumnA'].apply(lambda x: x.expanding().mean()))

    Out[6]:
        ColumnA  CumulativeMean
    0         5             5.0
    1         6             5.5
    2         7             6.0
    3         0             0.0
    4         0             0.0
    5         1             1.0
    6         2             1.5
    7         3             2.0
    8         0             0.0
    9         5             5.0
    10       10             7.5
    11       15            10.0
I have broken the logic of the boolean indexing within the .groupby statement down into multiple columns that build up to the final column abcd_cumsum. From there, ['ColumnA'].apply(lambda x: x.expanding().mean()) takes the mean of the group up to any given row in that group. For example, the second row (index 1) takes the mean of the first and second rows, but excludes the third row.

    df['a'] = (df.shift()['ColumnA'] != 0)
    df['b'] = (df['ColumnA'] == 0)
    df['ab'] = (df['a'] & df['b'])
    df['c'] = (df.shift()['ColumnA'] == 0)
    df['d'] = (df['ColumnA'] != 0)
    df['cd'] = (df['c'] & df['d'])
    df['abcd'] = (df['ab'] | df['cd'])
    df['abcd_cumsum'] = (df['ab'] | df['cd']).cumsum()
    df['CumulativeMean'] = (df.groupby(df['abcd_cumsum'])['ColumnA'].apply(lambda x: x.expanding().mean()))

    Out[7]:
        ColumnA      a      b     ab      c      d     cd   abcd  abcd_cumsum  CumulativeMean
    0         5   True  False  False  False   True  False  False            0             5.0
    1         6   True  False  False  False   True  False  False            0             5.5
    2         7   True  False  False  False   True  False  False            0             6.0
    3         0   True   True   True  False  False  False   True            1             0.0
    4         0  False   True  False   True  False  False  False            1             0.0
    5         1  False  False  False   True   True   True   True            2             1.0
    6         2   True  False  False  False   True  False  False            2             1.5
    7         3   True  False  False  False   True  False  False            2             2.0
    8         0   True   True   True  False  False  False   True            3             0.0
    9         5  False  False  False   True   True   True   True            4             5.0
    10       10   True  False  False  False   True  False  False            4             7.5
    11       15   True  False  False  False   True  False  False            4            10.0
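The same boundary logic can be sketched more compactly without the helper columns. This is a hedged variation, not the code above: the names prev, starts_zero_run, starts_nonzero_run and group_id are illustrative, and transform is swapped in for apply on the assumption that keeping the original row index for the assignment is desirable across pandas versions.

```python
import pandas as pd

# Same values as ColumnA in the question.
df = pd.DataFrame({"ColumnA": [5, 6, 7, 0, 0, 1, 2, 3, 0, 5, 10, 15]})

prev = df["ColumnA"].shift()   # previous row's value (NaN for the first row)
is_zero = df["ColumnA"].eq(0)

# A new group starts when non-zeros turn into zeros, or zeros into non-zeros.
starts_zero_run = prev.ne(0) & is_zero
starts_nonzero_run = prev.eq(0) & ~is_zero
group_id = (starts_zero_run | starts_nonzero_run).cumsum()

# Expanding mean within each run; runs of zeros average to 0 on their own.
df["CumulativeMean"] = df.groupby(group_id)["ColumnA"].transform(
    lambda s: s.expanding().mean()
)

print(df)
```

Because consecutive zeros form their own group here, their expanding mean is already 0, so no fillna step is needed in this variant.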