How do I group my pandas columns to map and create a new column based on map values

I have a data frame of students id along with number of tests they have taken for each subject. I have to group it according to IDs with subjects and number of tests as map.

What I have:

Id     Subject     Number_of_Tests
101    Maths       6
101    Science     8
101    History     10
102    History     5
102    Maths       4
102    Science     7

What I want:

Id     Tests                                  Grade
101    {Maths:6, Science:8, History:10}     A
102    {History:5, Maths:5, Science:7}      B

Also, after grouping like this, I want to add one more column to this called "Grade", which is based on the newly created "Tests" Map Field. For example, if the number of tests in Maths is greater dhan 5 and if the number of tests in Science is greater than 5 and if the number of tests in History is greater than 5, then grade "A", else "B".

Could anyone please help me on this.

3 answers

  • answered 2020-09-14 06:03 David Erickson

    You originally asked your question with the output for Tests being a list. This is the code that would provide that output. I will try to create another version for dict later:

    1. The Tests column is pretty straight forward. First, join the relevant columns into a sting. Later, in the .groupby(), you will aggregate these values as a list.
    2. Create the Grade column by getting the .min() of each group and returning A or B. Later, you can merge this back into the new consolidated dataframe.

    df['Tests'] = df['Subject'] + ': ' + df['Number_of_Tests'].astype(str)
    df['Grade'] = (df.groupby(['Id'])['Number_of_Tests'].transform('min') > 5).replace([True,False], ['A','B'])
    df = pd.merge(df.groupby(['Id'])['Tests'].agg(list).reset_index(),
                  df[['Grade','Id']], on='Id').drop_duplicates(subset='Id')
    df
    Out[1]: 
        Id                                Tests Grade
    0  101  [Maths: 6, Science: 8, History: 10]     A
    3  102   [History: 5, Maths: 4, Science: 7]     B
    

  • answered 2020-09-14 06:15 Sociopath

    I think you need:

    # create a list of both columns on groupby
    new_df = df.groupby('Id', as_index=False).aggregate({"Subject": lambda x: x.to_list(), "Number_Of_Tests": lambda x: x.to_list()})
    
    # create a new column `Grade` based on condition
    new_df["Grade"] = ["A" if all(j>5 for j in i) else "B" for i in new_df["Number_Of_Tests"]]
    
    # create a column Tests using other 2 columns 
    new_df["Tests"] = [{k:v for k,v in zip(i,j)} for i,j in zip(new_df["Subject"], new_df["Number_Of_Tests"])]
    
    # drop unwanted columns
    new_df.drop(["Subject","Number_Of_Tests"], axis=1, inplace=True)
    

    Output:

       Id    Grade     Tests
    0  101     A     {'Maths': 6, 'Science': 8, 'History': 10}
    1  102     B     {'History': 5, 'Maths': 4, 'Science': 7}
    

  • answered 2020-09-14 06:15 Vishnudev

    Change the structure of your dataframe so that it will be easier for manipulation than to a format that is not easily manipulatable like dict.

    df = df.set_index(['Id', 'Subject']).unstack()
    
            Number_of_Tests              
    Subject         History Maths Science
    Id                                   
    101                  10     6       8
    102                   5     4       7
    

    Now Just add a column based on the required condition.

    df['Grade'] = np.where((df['Number_of_Tests'] > 5).all(axis=1), 'A', 'B')
    

    Output

            Number_of_Tests               Grade
    Subject         History Maths Science      
    Id                                         
    101                  10     6       8     A
    102                   5     4       7     B
    

    OR

    If you want the dict anyways,

    df['Tests'] = df.groupby('Id').apply(lambda x: dict(zip(x.Subject, x.Number_of_Tests)))