Counting consecutive elements in a dataframe and storing them in a new column

So I have this code:

import pandas as pd

id_1 = [0, 0, 0, 0, 0, 0, 2, 0, 4, 5, 6, 7, 1, 0, 5, 3]
exp_1 = [1, 2, 3, 4, 5, 6, 1, 7, 1, 1, 1, 1, 1, 8, 2, 1]

df = pd.DataFrame(list(zip(id_1, exp_1)), columns=['Patch', 'Exploit'])

# A new group starts whenever Patch differs from the previous row.
df = (
    df.groupby((df.Patch != df.Patch.shift(1)).cumsum())
    .agg({"Patch": ("first", "count")})
    .reset_index(drop=True)
)
print(df)

the output is:

   Patch      
   first count
0      0     6
1      2     1
2      0     1
3      4     1
4      5     1
5      6     1
6      7     1
7      1     1
8      0     1
9      5     1
10     3     1

I wanted to create a data frame with a new column called count that stores the number of consecutive appearances of each patch (id_1). However, the code above nests first and count under Patch (because a dictionary is passed to .agg), and I don't know how to manipulate only the values stored in the count column.

Suppose I want to remove all the 0s from id_1 and then count the consecutive appearances, or find the average of the count column only. How would I do that?

1 answer

  • answered 2021-05-14 20:44 Andrej Kesely

    If you want to remove all 0s from the Patch column, you can filter the dataframe just before .groupby. For example:

    df = (
        df[df.Patch != 0]
        .groupby((df.Patch != df.Patch.shift(1)).cumsum())
        .agg({"Patch": ("first", "count")})
        .reset_index(drop=True)
    )
    print(df)
    

    Prints:

      Patch      
      first count
    0     2     1
    1     4     1
    2     5     1
    3     6     1
    4     7     1
    5     1     1
    6     5     1
    7     3     1
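
Building on the answer above, here is a minimal sketch of how the count column could be reached on its own, for example to take its average. It makes two assumptions that are not stated in the thread: that flat column names are acceptable (obtained here with pandas named aggregation), and that the run boundaries should be computed after the zeros are dropped. The names nonzero, run_id, runs, two_level and mean_count are illustrative only.

```python
import pandas as pd

id_1 = [0, 0, 0, 0, 0, 0, 2, 0, 4, 5, 6, 7, 1, 0, 5, 3]
exp_1 = [1, 2, 3, 4, 5, 6, 1, 7, 1, 1, 1, 1, 1, 8, 2, 1]
df = pd.DataFrame({"Patch": id_1, "Exploit": exp_1})

# Option 1: keep the two-level header from the question and select
# the count column with a tuple key.
two_level = (
    df.groupby((df.Patch != df.Patch.shift()).cumsum())
    .agg({"Patch": ["first", "count"]})
    .reset_index(drop=True)
)
counts_with_zeros = two_level[("Patch", "count")]
print(counts_with_zeros.mean())

# Option 2 (assumption: zeros should be removed *before* runs are detected).
nonzero = df[df.Patch != 0]
run_id = (nonzero.Patch != nonzero.Patch.shift()).cumsum().rename("run")

# Named aggregation gives flat columns "Patch" and "count".
runs = (
    nonzero.groupby(run_id)
    .agg(Patch=("Patch", "first"), count=("Patch", "count"))
    .reset_index(drop=True)
)

# Use bracket access: runs.count would refer to the DataFrame.count method.
mean_count = runs["count"].mean()
print(runs)
print(mean_count)
```

For this particular data both ways of handling the zeros give the same runs. They would differ for a sequence such as 2, 0, 2: computing the run key on the filtered frame counts the two 2s as one run of length 2, while computing it on the unfiltered frame keeps them as two runs of length 1. Which behaviour is wanted depends on what "consecutive" should mean once the zeros are gone.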