Pandas: make an aggregated array/list into a dictionary

I have the following data structure:

import pandas as pd
import json

df = pd.DataFrame({'g1': ['ABC', 'ABC', 'XYZ', 'XYZ'], 'g2': ['DEF', 'GHI', 'RST', 'UVW']})
print(df)

>>     g1   g2
0  ABC  DEF
1  ABC  GHI
2  XYZ  RST
3  XYZ  UVW

I'm trying to write JSON files with the following structure:

$ cat ABC.json

> {
    "DEF" : true, 
    "GHI" : true
  }  

and

$ cat XYZ.json

> {
    "RST" : true, 
    "UVW" : true
  }  

So far I've been able to create the aggregation:

print(df.groupby('g1', as_index=True)['g2'].aggregate(lambda x: set(x)))

>> g1
ABC    {'GHI', 'DEF'}
XYZ    {'RST', 'UVW'}

and dump it to JSON (aggregating to a plain list here so it serializes):

dd = json.loads(df.groupby('g1')['g2'].aggregate(list).to_json())
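
For the example frame this should give a plain dict keyed by group:

print(dd)
# {'ABC': ['DEF', 'GHI'], 'XYZ': ['RST', 'UVW']}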

and then write each entry to its own file:

for k, v in dd.items():
    with open(k + '.json', 'w') as fp:
        json.dump(v, fp)

but mapping each aggregated list to a dict in a pandas-y way is still eluding me; roughly, I imagine something like the sketch below. I'll post my python (non-pandas) answer as a reference.
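
A sketch of that idea, applying a dict comprehension over the aggregated Series (expected output shown in the comments):

agg = df.groupby('g1')['g2'].aggregate(list)
mapped = agg.apply(lambda vals: {v: True for v in vals})
print(mapped)
# g1
# ABC    {'DEF': True, 'GHI': True}
# XYZ    {'RST': True, 'UVW': True}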

2 answers

  • answered 2018-10-22 14:56 philshem

    A non-pandas (and not very pythonic) way to solve this is to loop over each key (k), then loop over each element of its array (v), creating a dictionary (vd) entry for every member of the array. It works, that's all I can say.

    for k, v in dd.items():
        # start a fresh dict for each key so entries don't carry over into the next file
        vd = dict()
        for x in v:
            vd[x] = True

        with open(k + '.json', 'w') as fp:
            json.dump(vd, fp)
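
    The dictionary-building step can also be written as a dict comprehension, which keeps the same behaviour in a more compact form:

    for k, v in dd.items():
        with open(k + '.json', 'w') as fp:
            json.dump({x: True for x in v}, fp)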
    

  • answered 2018-10-22 15:07 Daniel Mesejo

    You could do something like this:

    import json
    import pandas as pd
    
    df = pd.DataFrame({'g1': ['ABC', 'ABC', 'XYZ', 'XYZ'], 'g2': ['DEF', 'GHI', 'RST', 'UVW']})
    
    # groupby('g1') yields one sub-frame per key; dict.fromkeys maps every g2 value in it to True
    for name, group in df.groupby('g1'):
        with open('{}.json'.format(name), 'w') as out:
            json.dump(dict.fromkeys(group['g2'].values, True), out)
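
    With the example DataFrame this should write ABC.json and XYZ.json with the contents shown in the question. A quick way to check (assuming the files land in the working directory):

    with open('ABC.json') as fp:
        print(json.load(fp))  # expected: {'DEF': True, 'GHI': True}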