Generating Dictionary of DF's by Slicing DF based on Common Element
I have a data frame where the column names share a common element, other columns have been generated with a suffix to this common element. I have a list of these elements that is around ~100 entries. I'd like to iteratively slice the large df using this list, transform the subdf's by grouping and eventually concatenate them back together.
I was thinking of using a dictionary approach using the list as keys, and then defining the columns sharing this element as values. I am not sure how to implement this. I have copied a simplified version to illustrate what I'd like to scale up. In reality there'd be around 100 keys each with 20 associated columns.
A A_1 A_2 A_3 B B_1 B_2 B_3
0 1 e f g 1 x y z
1 2 e f g 2 x y z
2 3 e f g 3 x y z
3 3 e f g 3 x y z
4 3 e f g 4 x y z
5 3 e f g 4 x y z
df_list = ['A','B']
df_A = df[df.columns[df.columns.to_series().str.contains('A')]]
df_B = df[df.columns[df.columns.to_series().str.contains('B')]]
calc_A = df_A.groupby(['A']).head(1)
print(calc_A)
A A_1 A_2 A_3
0 1 e f g
1 2 e f g
2 3 e f g
calc_B = df_B.groupby(['B']).head(1)
print(calc_B)
B B_1 B_2 B_3
0 1 x y z
1 2 x y z
2 3 x y z
4 4 x y z
Please advise how to structure this dictionary, iterating through the list to slice the df and assign columns sharing the key as values for the new subdf. Thank you.
1 answer

IIUC, you can group on column prefixes, and then initialise a dictionary:
d = {} for i, g in df.groupby(by=lambda x: x.split('_')[0], axis=1): d[i] = g.groupby(i).head(1)
You could also do this using a dict comprehension:
d = { i : g.groupby(i).head(1) for (i, g) in df.groupby(by=lambda x: x.split('_')[0], axis=1) } for k, v in d.items(): print(v, '\n') A A_1 A_2 A_3 0 1 e f g 1 2 e f g 2 3 e f g B B_1 B_2 B_3 0 1 x y z 1 2 x y z 2 3 x y z 4 4 x y z d.keys() dict_keys(['A', 'B'])