How to determine column value based on lowest value from second column, grouped by first column?

I have the following DataFrame, with repeated values in the 'nearest_beacon' column but different distances in the 'vms_distance' column:

nearest_beacon  vms_distance associated
2890231      0.421313        vms
2890231      0.215785        vms
2890231      0.104256        vms*
4548780      0.486456        vms
4548780      0.468065        vms
4548780      0.337609        vms
4548780      0.363601        vms
4548780      0.210566        vms
4548780      0.197327        vms*
4548780      0.285390        vms
4548780      0.216443        vms
1221421      0.441454        vms
1221421      0.337533        vms*
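
To reproduce this, the sample above can be built roughly as follows. This is only a sketch: the original index is not shown, so a default integer index is assumed.

```python
import pandas as pd

# Sample data copied from the table above; a default RangeIndex is assumed.
df = pd.DataFrame({
    "nearest_beacon": [2890231] * 3 + [4548780] * 8 + [1221421] * 2,
    "vms_distance": [
        0.421313, 0.215785, 0.104256,
        0.486456, 0.468065, 0.337609, 0.363601,
        0.210566, 0.197327, 0.285390, 0.216443,
        0.441454, 0.337533,
    ],
    # Every row starts out labeled 'vms' (the scalar is broadcast to all rows).
    "associated": "vms",
})
```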

For each 'nearest_beacon' value, I want to find the one row (marked * above) with the minimum 'vms_distance', set its 'associated' value to 'vms', and set 'associated' to 'no_vms' for all other rows.

Expected Result:

nearest_beacon  vms_distance associated
2890231      0.421313        no_vms
2890231      0.215785        no_vms
2890231      0.104256        vms
4548780      0.486456        no_vms
4548780      0.468065        no_vms
4548780      0.337609        no_vms
4548780      0.363601        no_vms
4548780      0.210566        no_vms
4548780      0.197327        vms
4548780      0.285390        no_vms
4548780      0.216443        no_vms
1221421      0.441454        no_vms
1221421      0.337533        vms

2 answers

  • answered 2019-08-13 03:37 YO and BEN_W

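    Set every row to 'no_vms' first, then use groupby with idxmin to find the lowest-distance row in each group and assign 'vms' back via loc:

    df['associated'] = 'no_vms'
    df.loc[df.groupby('nearest_beacon').vms_distance.idxmin(), 'associated'] = 'vms'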
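
    To see why this works, it may help to look at what idxmin returns before assigning. The sketch below uses a small made-up frame with a non-default index (the index values 10 to 13 and the name min_rows are assumptions for illustration) to show that idxmin yields index labels, which is what loc expects. It assumes the index labels are unique.

```python
import pandas as pd

# Small hypothetical frame with a non-default index, to show that
# index labels (not positions) are what idxmin returns and loc consumes.
df = pd.DataFrame(
    {
        "nearest_beacon": [2890231, 2890231, 1221421, 1221421],
        "vms_distance": [0.421313, 0.104256, 0.441454, 0.337533],
        "associated": "vms",
    },
    index=[10, 11, 12, 13],
)

# For each group, idxmin() gives the index label of the row with the
# smallest vms_distance. The result is a Series keyed by nearest_beacon.
min_rows = df.groupby("nearest_beacon")["vms_distance"].idxmin()

# Reset every row, then mark only the per-group minimum rows.
df["associated"] = "no_vms"
df.loc[min_rows, "associated"] = "vms"
```

    If several rows in a group tie for the minimum, idxmin returns only the first of them, so exactly one row per group ends up labeled 'vms'.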
    

  • answered 2019-08-13 04:25 moys

    Try this

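    # First, change the lowest-distance row in each group to 'VMS'
    # (uppercase, so it can be told apart from the existing 'vms' values)
    df.loc[df.groupby('nearest_beacon').vms_distance.idxmin(),'associated']='VMS'
    
    # Then change the remaining rows to 'No_vms'
    df.loc[df['associated'] != 'VMS', 'associated'] = 'No_vms'

    # Finally, lowercase the labels to match the expected 'vms' / 'no_vms'
    df['associated'] = df['associated'].str.lower()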
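
    As a possible one-step alternative (not from either answer above), the group minimum can be broadcast back with transform and compared row by row. This is a sketch that assumes NumPy is available; the name group_min and the reduced sample frame are illustrative only.

```python
import numpy as np
import pandas as pd

# Reduced sample frame for illustration, using values from the question.
df = pd.DataFrame({
    "nearest_beacon": [2890231, 2890231, 2890231, 1221421, 1221421],
    "vms_distance": [0.421313, 0.215785, 0.104256, 0.441454, 0.337533],
    "associated": "vms",
})

# Broadcast each group's minimum distance back onto every row of that group.
group_min = df.groupby("nearest_beacon")["vms_distance"].transform("min")

# Rows equal to their group's minimum get 'vms'; all others get 'no_vms'.
df["associated"] = np.where(df["vms_distance"] == group_min, "vms", "no_vms")
```

    Unlike idxmin, this labels every row that ties for the group minimum as 'vms', so it only gives exactly one 'vms' per group when the minimum is unique.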