Python: Count combinations of values within two columns and find max frequency of each combination

My pandas dataframe looks like this:

+-----+---------+-------+
| No. | Section | Group |
+-----+---------+-------+
| 123 |     222 |     1 |
| 234 |     222 |     1 |
| 345 |     222 |     1 |
| 456 |     222 |     3 |
| 567 |     241 |     1 |
| 678 |     241 |     2 |
| 789 |     241 |     2 |
| 890 |     241 |     3 |
+-----+---------+-------+

First, I need to add another column containing the frequency of each combination of Section and Group. It is important to keep all rows.

Desired output:

+-----+---------+-------+-------+
| No. | Section | Group | Count |
+-----+---------+-------+-------+
| 123 |     222 |     1 |     3 |
| 234 |     222 |     1 |     3 |
| 345 |     222 |     1 |     3 |
| 456 |     222 |     3 |     1 |
| 567 |     241 |     1 |     1 |
| 678 |     241 |     2 |     2 |
| 789 |     241 |     2 |     2 |
| 890 |     241 |     3 |     1 |
+-----+---------+-------+-------+

The second step is to mark the rows that have the highest Count within each Section, for example with a True/False column like this:

+-----+---------+-------+-------+-------+
| No. | Section | Group | Count |  Max  |
+-----+---------+-------+-------+-------+
| 123 |     222 |     1 |     3 | True  |
| 234 |     222 |     1 |     3 | True  |
| 345 |     222 |     1 |     3 | True  |
| 456 |     222 |     3 |     1 | False |
| 567 |     241 |     1 |     1 | False |
| 678 |     241 |     2 |     2 | True  |
| 789 |     241 |     2 |     2 | True  |
| 890 |     241 |     3 |     1 | False |
+-----+---------+-------+-------+-------+

The original DataFrame has a lot of rows, so I'm looking for an efficient way to do this; I can't think of one myself.

Thank you very much!

1 answer

  • answered 2018-03-13 22:04 W-B

    Look at groupby with transform: it returns a result aligned with the original index, so all rows are kept.

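    # Count: number of rows in each (Section, Group) combination, broadcast to every row
    df['Count'] = df.groupby(['Section', 'Group'])['Group'].transform('size')
    # Max: True where the row's Count equals the largest Count within its Section
    df['Max'] = df.groupby('Section')['Count'].transform('max') == df['Count']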
    df
    Out[508]: 
        No  Section  Group  Count    Max
    0  123      222      1      3   True
    1  234      222      1      3   True
    2  345      222      1      3   True
    3  456      222      3      1  False
    4  567      241      1      1  False
    5  678      241      2      2   True
    6  789      241      2      2   True
    7  890      241      3      1  False
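
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same transform approach. The DataFrame construction is an assumption on my part: it rebuilds the sample table from the question by hand and names the first column "No", as in the output above.

```python
import pandas as pd

# Sample data copied from the question (constructed by hand for illustration).
df = pd.DataFrame({
    "No": [123, 234, 345, 456, 567, 678, 789, 890],
    "Section": [222, 222, 222, 222, 241, 241, 241, 241],
    "Group": [1, 1, 1, 3, 1, 2, 2, 3],
})

# Step 1: size of each (Section, Group) group, broadcast back to every row.
df["Count"] = df.groupby(["Section", "Group"])["Group"].transform("size")

# Step 2: True where a row's Count equals the largest Count in its Section.
df["Max"] = df["Count"] == df.groupby("Section")["Count"].transform("max")

print(df)
```

Note that if two Groups tie for the highest Count within a Section, this comparison marks the rows of both Groups as True.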