Is there a function in pandas like cumsum() but for the mean? I need to apply it based on a condition

I need to compute the cumulative mean only while my column A is different from zero. Each time it is zero, the cumulative mean should restart. Thanks so much in advance; I am not very good with Python.

Input:

    ColumnA
0   5
1   6
2   7
3   0
4   0
5   1
6   2
7   3
8   0
9   5
10  10
11  15

Expected Output:

    ColumnA CumulativeMean
0   5       5.0
1   6       5.5
2   7       6.0
3   0       0.0
4   0       0.0
5   1       1.0
6   2       1.5
7   3       2.0
8   0       0.0
9   5       5.0
10  10      7.5
11  15      10.0

2 answers

  • answered 2020-08-11 04:17 MrNobody33

    You can use cumsum to build the groups, then expanding().mean() to compute the cumulative mean within each group:

    groups=df.ColumnA.eq(0).cumsum()
    df.groupby(groups).apply(lambda x: x[x.ne(0)].expanding().mean()).fillna(0)
    

    Details:

    Start a new group at every row where the column equals 0, using eq and cumsum. eq returns a boolean mask, and cumsum treats True as 1 and False as 0, so the running total goes up by one at each zero:

    groups=df.ColumnA.eq(0).cumsum() 
    groups
    0     0
    1     0
    2     0
    3     1
    4     2
    5     2
    6     2
    7     2
    8     3
    9     3
    10    3
    11    3
    Name: ColumnA, dtype: int32
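
    To see why this works, the mask and its running sum can be laid side by side. This is a minimal sketch, assuming the question's data is rebuilt by hand into df; the names is_zero and steps are only for illustration and do not appear in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ColumnA": [5, 6, 7, 0, 0, 1, 2, 3, 0, 5, 10, 15]})

is_zero = df["ColumnA"].eq(0)   # True wherever ColumnA is 0
groups = is_zero.cumsum()       # True counts as 1, so the id goes up at every zero

# Hypothetical helper frame, just to inspect the intermediate values.
steps = pd.DataFrame({"ColumnA": df["ColumnA"], "is_zero": is_zero, "group": groups})
print(steps)
```

    Each zero row opens a new group, so the zero itself is the first member of its group. That is why the next step masks the zeros out before averaging.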
    

    Then group by those groups and use apply to take the cumulative mean over the non-zero elements:

    df.groupby(groups).apply(lambda x: x[x.ne(0)].expanding().mean())
        ColumnA
    0       5.0
    1       5.5
    2       6.0
    3       NaN
    4       NaN
    5       1.0
    6       1.5
    7       2.0
    8       NaN
    9       5.0
    10      7.5
    11     10.0
    

    Finally, use fillna to replace the NaN values with 0:

    df.groupby(groups).apply(lambda x: x[x.ne(0)].expanding().mean()).fillna(0)
        ColumnA
    0       5.0
    1       5.5
    2       6.0
    3       0.0
    4       0.0
    5       1.0
    6       1.5
    7       2.0
    8       0.0
    9       5.0
    10      7.5
    11     10.0
    

  • answered 2020-08-11 22:14 David Erickson

    You can use boolean masks to compare each row (== 0 or != 0) against the previous row with .shift(). Then just take the .cumsum() of the combined mask to split the rows into groups according to where the zeros fall within ColumnA.

    df['CumulativeMean'] = (df.groupby((((df.shift()['ColumnA'] != 0) & (df['ColumnA'] == 0)) |
                                        ((df.shift()['ColumnA'] == 0) & (df['ColumnA'] != 0)))
                                       .cumsum(), group_keys=False)['ColumnA']
                              .apply(lambda x: x.expanding().mean()))
    
    
    Out[6]: 
        ColumnA  CumulativeMean
    0         5             5.0
    1         6             5.5
    2         7             6.0
    3         0             0.0
    4         0             0.0
    5         1             1.0
    6         2             1.5
    7         3             2.0
    8         0             0.0
    9         5             5.0
    10       10             7.5
    11       15            10.0
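
    The same idea can be written with named intermediate values, which may be easier to read. This is a sketch under a few assumptions: the data is the question's sample, the names prev, cur, starts_run and run_id are illustrative, and transform is swapped in for apply so the result stays aligned to the original index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ColumnA": [5, 6, 7, 0, 0, 1, 2, 3, 0, 5, 10, 15]})

cur = df["ColumnA"]
prev = cur.shift()   # previous row's value; NaN for the first row

# A new run starts whenever the column switches between zero and non-zero.
starts_run = ((prev != 0) & (cur == 0)) | ((prev == 0) & (cur != 0))
run_id = starts_run.cumsum()

# Running mean within each run; runs of zeros average to 0 on their own.
df["CumulativeMean"] = cur.groupby(run_id).transform(lambda s: s.expanding().mean())
print(df)
```

    Because zeros form their own runs here, no masking or fillna step is needed.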
    

    I have broken the logic of the boolean masks inside the .groupby call into multiple columns that build up to the final column, abcd_cumsum. From there, ['ColumnA'].apply(lambda x: x.expanding().mean()) takes the mean of the group up to any given row in that group. For example, the second row (index 1) takes the mean of the first and second rows but excludes the third row.

    df['a'] = (df.shift()['ColumnA'] != 0)
    df['b'] = (df['ColumnA'] == 0)
    df['ab'] = (df['a'] & df['b'])
    df['c'] = (df.shift()['ColumnA'] == 0)
    df['d'] = (df['ColumnA'] != 0)
    df['cd'] = (df['c'] & df['d'])
    df['abcd'] = (df['ab'] | df['cd'])
    df['abcd_cumsum'] = (df['ab'] | df['cd']).cumsum()
    df['CumulativeMean'] = (df.groupby(df['abcd_cumsum'], group_keys=False)['ColumnA'].apply(lambda x: x.expanding().mean()))
    
    Out[7]: 
        ColumnA      a      b     ab      c      d     cd   abcd  abcd_cumsum  \
    0         5   True  False  False  False   True  False  False            0   
    1         6   True  False  False  False   True  False  False            0   
    2         7   True  False  False  False   True  False  False            0   
    3         0   True   True   True  False  False  False   True            1   
    4         0  False   True  False   True  False  False  False            1   
    5         1  False  False  False   True   True   True   True            2   
    6         2   True  False  False  False   True  False  False            2   
    7         3   True  False  False  False   True  False  False            2   
    8         0   True   True   True  False  False  False   True            3   
    9         5  False  False  False   True   True   True   True            4   
    10       10   True  False  False  False   True  False  False            4   
    11       15   True  False  False  False   True  False  False            4   
    
        CumulativeMean  
    0              5.0  
    1              5.5  
    2              6.0  
    3              0.0  
    4              0.0  
    5              1.0  
    6              1.5  
    7              2.0  
    8              0.0  
    9              5.0  
    10             7.5  
    11            10.0
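
    The expanding mean that both answers rely on can be checked on a single group in isolation. A small sketch, assuming the first group of the sample data (5, 6, 7); the names first_group, running_mean and manual are illustrative. It compares expanding().mean() with a cumulative sum divided by the row count so far.

```python
import pandas as pd

first_group = pd.Series([5, 6, 7])

# Mean of all values seen so far, row by row.
running_mean = first_group.expanding().mean()

# The same thing by hand: cumulative sum divided by the number of rows so far.
counts = pd.Series(range(1, len(first_group) + 1))
manual = first_group.cumsum() / counts

print(running_mean.tolist())  # [5.0, 5.5, 6.0]
print(manual.tolist())        # [5.0, 5.5, 6.0]
```

    In that sense expanding().mean() plays the role of a cumulative mean, much as cumsum() does for the sum.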