Compute lagged means per name and round in pandas

I need to compute lagged means per groups in my dataframe. This is how my df looks like:

  name value  round
0    a     5      3
1    b     4      3
2    c     3      2
3    d     1      2
4    a     2      1
5    c     1      1
0    c     1      3
1    d     4      3
2    b     3      2
3    a     1      2
4    b     5      1
5    d     2      1

I would like to compute lagged means for column value per name and round. That is, for name a in round 3 I need to have value_mean = 1.5 (because (1+2)/2). And of course, there will be nan values when round = 1.

I tried this:

df['value_mean'] = df.groupby('name').expanding().mean().groupby('name').shift(1)['value'].values

but it gives a nonsense:

  name value  round  value_mean
0    a     5      3         NaN
1    b     4      3         5.0
2    c     3      2         3.5
3    d     1      2         NaN
4    a     2      1         4.0
5    c     1      1         3.5
0    c     1      3         NaN
1    d     4      3         3.0
2    b     3      2         2.0
3    a     1      2         NaN
4    b     5      1         1.0
5    d     2      1         2.5

Any idea, how can I do this, please? I found this, but it seems not relevant for my problem: Calculate the mean value using two columns in pandas

1 answer

  • answered 2020-11-20 12:48 jottbe

    You can do that as follows

    # sort the values as they need to be counted
    df.sort_values(['name', 'round'], inplace=True)
    df.reset_index(drop=True, inplace=True)
    
    # create a grouper to calculate the running count
    # and running sum as the basis of the average
    grouper= df.groupby('name')
    ser_sum=   grouper['value'].cumsum()
    ser_count= grouper['value'].cumcount()+1
    ser_mean= ser_sum.div(ser_count)
    ser_same_name= df['name'] == df['name'].shift(1)
    # finally you just have to set the first entry
    # in each name-group to NaN (this usually would
    # set the entries for each name and round=1 to NaN)
    df['value_mean']= ser_mean.shift(1).where(ser_same_name, np.NaN)
    
    # if you want to see the intermediate products, 
    # you can uncomment the following lines
    #df['sum']= ser_sum
    #df['count']= ser_count
    df
    

    Output:

       name  value  round  value_mean
    0     a      2      1         NaN
    1     a      1      2         2.0
    2     a      5      3         1.5
    3     b      5      1         NaN
    4     b      3      2         5.0
    5     b      4      3         4.0
    6     c      1      1         NaN
    7     c      3      2         1.0
    8     c      1      3         2.0
    9     d      2      1         NaN
    10    d      1      2         2.0
    11    d      4      3         1.5