Sample dataframe using conditions for each row

I'm trying to sample SKUs from a dataframe within a category, above a certain price, then combine the samples into a string and assign that string to the row. I can do it for one case at a time, but I want to find a way to apply it row by row across the dataframe.

Selecting a single sample works:

df.loc[(df['price'] > 20) & (df['catStr'] == 'ME'), 'SKU'].sample(n=3)
153    MEMR1055
145    MEMR1048
168    MEMR1064

Joining the result into a string works too:

",".join(df.loc[(df['price'] > 20) & (df['catStr'] == 'ME'), 'SKU'].sample(n=3))
'MEMR1048,MEMR1057,MEMR1051'
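The two one-off steps above can be wrapped in a small function. This is a minimal sketch: `sample_skus`, its parameters and the toy data are hypothetical; only the column names `price`, `catStr` and `SKU` come from the question. It combines both conditions into one boolean mask and caps the sample size, because `Series.sample(n=...)` raises a `ValueError` when asked for more rows than exist (without replacement).

```python
import pandas as pd

def sample_skus(df: pd.DataFrame, min_price: float, cat: str, n: int = 3, seed=None) -> str:
    """Return up to n SKUs from category `cat` priced above `min_price`, comma-joined."""
    # One boolean mask instead of chaining two .loc calls
    mask = (df['price'] > min_price) & (df['catStr'] == cat)
    candidates = df.loc[mask, 'SKU']
    # Never request more rows than are available
    k = min(n, len(candidates))
    # random_state makes the draw reproducible when a seed is given
    return ",".join(candidates.sample(n=k, random_state=seed))

# Toy data, made up for illustration
df = pd.DataFrame({
    'price':  [10.0, 25.0, 30.0, 40.0, 50.0, 60.0],
    'catStr': ['ME', 'ME', 'ME', 'ME', 'IT', 'ME'],
    'SKU':    ['MEMR1001', 'MEMR1002', 'MEMR1003', 'MEMR1004', 'ITMR1001', 'MEMR1005'],
})

print(sample_skus(df, 20, 'ME', seed=0))
```

A fixed `seed` is handy while testing; leave it as `None` to get a fresh sample on every call.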

I also tried groupby, and writing a function to use with apply, but couldn't get either approach to work.

Instead of the fixed 20, I want to use each row's own 'price', and instead of 'ME', each row's own 'catStr'. Is there a way to iterate over the dataframe and take a sample for each row?

Edit: expected output. Each row should get its own sample, based on that row's price and category:

price  catStr  SKU       Result
125.0  IT      ITMR1012  ITMR1024,ITMR1012,ITMR1015
130.0  CU      CUMR1018  CUMR1009,CUMR1003,CUMR1002
100.0  MX      MXMR1007  MXMR1006,MXMR1012,MXMR1016
225.0  ME      MEMR1059  MEMR1018,MEMR1022,MEMR1062
125.0  IT      ITMR1008  ITMR1022,ITMR1010,ITMR1055
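A minimal sketch of the row-wise version, assuming it is acceptable to filter the whole frame once per row (fine for small data, quadratic in the number of rows for large data). `sample_for_row` and the toy data are hypothetical. `DataFrame.apply` with `axis=1` passes each row to the function, which filters on that row's own `price` and `catStr`; the extra `data=df` keyword is forwarded to the function by `apply`.

```python
import pandas as pd

# Toy data, made up for illustration; only the column names come from the question
df = pd.DataFrame({
    'price':  [100.0, 125.0, 150.0, 175.0, 200.0, 120.0, 140.0],
    'catStr': ['IT', 'IT', 'IT', 'IT', 'IT', 'CU', 'CU'],
    'SKU':    ['ITMR1001', 'ITMR1002', 'ITMR1003', 'ITMR1004', 'ITMR1005',
               'CUMR1001', 'CUMR1002'],
})

def sample_for_row(row, data, n=3, seed=None):
    # Candidates: same category as this row, priced strictly above this row's price
    mask = (data['price'] > row['price']) & (data['catStr'] == row['catStr'])
    candidates = data.loc[mask, 'SKU']
    # Take at most n, so rows near the top of a category don't raise ValueError
    return ",".join(candidates.sample(n=min(n, len(candidates)), random_state=seed))

# axis=1 calls sample_for_row once per row (each row arrives as a Series)
df['Result'] = df.apply(sample_for_row, axis=1, data=df)
print(df)
```

The most expensive item in each category gets an empty string. The expected output above lists ITMR1012 inside its own Result, which a strict `>` can never produce; if a row's own SKU should be eligible, use `>=` in the mask instead.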

1 answer

  • answered 2019-12-09 09:25 jezrael

    I think you need a lambda function with GroupBy.apply:

    # one comma-joined sample of 3 SKUs per category, among rows priced above 20
    df.loc[df['price'] > 20, :].groupby('catStr')['SKU'].apply(lambda x: ",".join(x.sample(n=3)))
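The answer's expression returns one string per category (a Series indexed by `catStr`), not a column. A minimal sketch of attaching it to the rows with `Series.map`, assuming the same sample for every row of a category is acceptable; the toy data are hypothetical, and `min(3, len(x))` is added here so a category with fewer than three matching SKUs does not raise `ValueError`.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({
    'price':  [25.0, 30.0, 40.0, 60.0, 10.0, 50.0, 70.0],
    'catStr': ['ME', 'ME', 'ME', 'ME', 'ME', 'IT', 'IT'],
    'SKU':    ['MEMR1001', 'MEMR1002', 'MEMR1003', 'MEMR1004', 'MEMR1005',
               'ITMR1001', 'ITMR1002'],
})

# One sample per category, as in the answer, with the sample size capped per group
per_cat = (
    df.loc[df['price'] > 20]
      .groupby('catStr')['SKU']
      .apply(lambda x: ",".join(x.sample(n=min(3, len(x)))))
)

# Broadcast each category's string to its rows; a category with no SKU above 20 gets NaN
df['Result'] = df['catStr'].map(per_cat)
print(df)
```

Because the threshold is the fixed 20 and sampling happens once per category, every row in a category shares one Result. A different sample per row, keyed on each row's own price, needs a row-wise function instead.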