Generating Dictionary of DF's by Slicing DF based on Common Element

I have a data frame where the column names share a common element, other columns have been generated with a suffix to this common element. I have a list of these elements that is around ~100 entries. I'd like to iteratively slice the large df using this list, transform the sub-df's by grouping and eventually concatenate them back together.

I was thinking of using a dictionary approach-- using the list as keys, and then defining the columns sharing this element as values. I am not sure how to implement this. I have copied a simplified version to illustrate what I'd like to scale up. In reality there'd be around 100 keys each with 20 associated columns.

   A A_1 A_2 A_3  B B_1 B_2 B_3
0  1   e   f   g  1   x   y   z
1  2   e   f   g  2   x   y   z
2  3   e   f   g  3   x   y   z
3  3   e   f   g  3   x   y   z
4  3   e   f   g  4   x   y   z
5  3   e   f   g  4   x   y   z

df_list = ['A','B']

df_A = df[df.columns[df.columns.to_series().str.contains('A')]]

df_B = df[df.columns[df.columns.to_series().str.contains('B')]]

calc_A = df_A.groupby(['A']).head(1)
print(calc_A)

   A A_1 A_2 A_3
0  1   e   f   g
1  2   e   f   g
2  3   e   f   g


calc_B = df_B.groupby(['B']).head(1)
print(calc_B)

   B B_1 B_2 B_3
0  1   x   y   z
1  2   x   y   z
2  3   x   y   z
4  4   x   y   z

Please advise how to structure this dictionary, iterating through the list to slice the df and assign columns sharing the key as values for the new sub-df. Thank you.

1 answer

  • answered 2017-11-15 00:02 cᴏʟᴅsᴘᴇᴇᴅ

    IIUC, you can group on column prefixes, and then initialise a dictionary:

    d = {}
    for i, g in df.groupby(by=lambda x: x.split('_')[0], axis=1):
        d[i] = g.groupby(i).head(1)
    

    You could also do this using a dict comprehension:

    d = {
            i : g.groupby(i).head(1) 
            for (i, g) in df.groupby(by=lambda x: x.split('_')[0], axis=1)
    }
    
    for k, v in d.items():
        print(v, '\n')
    
       A A_1 A_2 A_3
    0  1   e   f   g
    1  2   e   f   g
    2  3   e   f   g 
    
       B B_1 B_2 B_3
    0  1   x   y   z
    1  2   x   y   z
    2  3   x   y   z
    4  4   x   y   z 
    
    d.keys()
    dict_keys(['A', 'B'])