Count rows across certain columns in a dataframe if they are greater than another value and groupby another column

I have a dataframe:

import pandas as pd

df = pd.DataFrame({
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
    'Line_Item': ['Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    '1Q16': [100, 120, 0, 200, 190, 210],
    '2Q16': [100, 0, 130, 200, 190, 210],
    '3Q16': [200, 250, 0, 120, 0, 190]})

For each "BU", I want to count how many rows in 1Q16, 2Q16 and 3Q16 have a value greater than zero. I was just shown that, without grouping, I can count them with:

cols = ['1Q16','2Q16','3Q16']
df[cols].gt(0).sum()

In addition, I want to group the counts by BU.

1 answer

  • answered 2021-05-03 17:49 RavinderSingh13

    With your shown samples, please try the following:

    cols = ['1Q16','2Q16','3Q16']
    df[cols].gt(0).groupby(df['BU']).sum()
    

    Output will be as follows:

           1Q16  2Q16  3Q16
    BU
    CRS     3.0   3.0   2.0
    Total   2.0   2.0   2.0
    

    Explanation: here is a step-by-step breakdown of the code above.

    • cols is a list of the column names we want to check.
    • df[cols].gt(0) returns a boolean DataFrame that is True wherever a value in those columns is greater than 0.
    • groupby(df['BU']) groups the rows of that boolean DataFrame by the values of the BU column.
    • sum() adds up the True values (each counts as 1) in every group, which gives the number of values greater than 0 per BU and column, not the sum of the values themselves.
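
    The steps above can be put together as a small self-contained sketch using the sample data from the question. The names mask and counts are only illustrative, and the astype(int) call is an optional addition (not part of the original answer) so that the counts are integers regardless of the pandas version.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
    'Line_Item': ['Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    '1Q16': [100, 120, 0, 200, 190, 210],
    '2Q16': [100, 0, 130, 200, 190, 210],
    '3Q16': [200, 250, 0, 120, 0, 190]})

cols = ['1Q16', '2Q16', '3Q16']

# Boolean DataFrame: True where the value is greater than 0
mask = df[cols].gt(0)

# Group the boolean rows by BU and count the True values per column;
# astype(int) is an optional extra step that forces integer counts
counts = mask.groupby(df['BU']).sum().astype(int)

print(counts)
```

    Passing df['BU'] as a Series to groupby works here because mask keeps the same index as df, so the rows line up even though mask itself has no BU column.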