Grouping strings in a pandas DataFrame

I have the following dataframe with information from weather stations:

      import pandas as pd
      import numpy as np

      df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089', 
                                                  '2089', '2089', '8974'], 
                         'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                             'speedometer', 'Pluviometer', 'speedometer', 
                                             'Pluviometer']})

I would like to group the instruments from each of the weather stations.

I tried to use groupby together with the sum() function, as follows:

      df_New = df.groupby('Code Weather Station', as_index=False)['Instrumentation'].sum()

The strings are concatenated as expected. However, I would like spaces between the instruments.

      print(df_New)

      Code Weather Station  Instrumentation
            1024             Pluviometer-Analogspeedometerincidence-sun
            2089             speedometerPluviometerspeedometer
            8974             Pluviometer

I would like the output to be:

      Code Weather Station  Instrumentation
            1024             Pluviometer-Analog speedometer incidence-sun
            2089             speedometer Pluviometer speedometer
            8974             Pluviometer

Thank you.

2 answers

  • answered 2020-05-22 12:54 Partha Mandal

    Join the strings in each group, then call reset_index(), like this:

    df.groupby('Code Weather Station')['Instrumentation'].apply(lambda x: ' '.join(x)).reset_index()
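For reference, here is a minimal self-contained sketch of the same idea, assuming the dataframe from the question. It passes `' '.join` to `agg` with `as_index=False` instead of using a lambda with `apply`; the variable name `result` is just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Code Weather Station': ['1024', '1024', '1024', '2089',
                             '2089', '2089', '8974'],
    'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                        'speedometer', 'Pluviometer', 'speedometer',
                        'Pluviometer'],
})

# Join the instrument names of each station with a single space.
# as_index=False keeps 'Code Weather Station' as a regular column.
result = (
    df.groupby('Code Weather Station', as_index=False)['Instrumentation']
      .agg(' '.join)
)

print(result)
```

The string method `' '.join` accepts any iterable of strings, so it can be handed to `agg` directly without wrapping it in a function.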

  • answered 2020-05-22 12:56 tuhinsharma121

    You should generally avoid apply, as it can be inefficient. You can try this:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'Code Weather Station': ['1024', '1024', '1024', '2089', 
                                              '2089', '2089', '8974'], 
                     'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                                         'speedometer', 'Pluviometer', 'speedometer', 
                                         'Pluviometer']})
    
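    def process(x):
        # Join all instrument names of one station with single spaces
        return " ".join(x)
    
    # The (name, function) tuple names the output column. The result has
    # two-level columns, so the outer level is dropped afterwards.
    df_new = df.groupby('Code Weather Station').agg({
            'Instrumentation': [('Instrumentation', process)]
        })
    df_new.columns = df_new.columns.droplevel()
    df_new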
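If the droplevel step feels awkward, named aggregation (available in pandas 0.25 and later) may be a tidier way to get the same column name. This is only a sketch, assuming the dataframe from the question; `df_named` is a made-up variable name.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Code Weather Station': ['1024', '1024', '1024', '2089',
                             '2089', '2089', '8974'],
    'Instrumentation': ['Pluviometer-Analog', 'speedometer', 'incidence-sun',
                        'speedometer', 'Pluviometer', 'speedometer',
                        'Pluviometer'],
})

# Named aggregation: output_column=(input_column, function).
# This yields flat column names, so no droplevel() is needed.
df_named = (
    df.groupby('Code Weather Station')
      .agg(Instrumentation=('Instrumentation', ' '.join))
      .reset_index()  # turn the station code back into a regular column
)

print(df_named)
```

The final reset_index() is there so the station code appears as an ordinary column, as in the output requested in the question.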