Add sum columns back to a pandas DataFrame in a method chain
I am trying to add a set of column sums back to my pandas DataFrame with assign(), but I am not sure how to do it when there is more than one column. What is the best way to do this, or any other way, in a chained style, given that I have other steps earlier in the chain? This is my attempt so far:
data2.assign(data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum'))
data2 currently looks like this:

                   rate_water  rate_fire  rate_wood
id     date
apple  2019-01-01   -0.500000  -0.500000   0.000000
orange 2019-01-01   -0.636364  -0.963636   3.000000
melon  2019-01-01   -0.333333   5.666667  27.888889
apple  2020-01-01   -0.100000   7.900000  76.000000
orange 2020-01-01    0.363636  -0.963636  26.500000
melon  2020-01-01    0.166667   6.166667  27.235043
apple  2021-01-01    0.328571  26.261702  84.220779
orange 2021-01-01    0.363636  28.036364  28.683673
melon  2021-01-01    0.611111  39.944444  27.679487
Reproducible example:
import pandas as pd
from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})
Expected output:

                   rate_water  rate_fire  rate_wood  sum_water  sum_fire    sum_wood
id     date
apple  2019-01-01   -0.500000  -0.500000   0.000000  -1.469697   4.20303   30.888889
orange 2019-01-01   -0.636364  -0.963636   3.000000  -1.469697   4.20303   30.888889
melon  2019-01-01   -0.333333   5.666667  27.888889  -1.469697   4.20303   30.888889
apple  2020-01-01   -0.100000   7.900000  76.000000   0.430303  13.10303  129.735043
orange 2020-01-01    0.363636  -0.963636  26.500000   0.430303  13.10303  129.735043
melon  2020-01-01    0.166667   6.166667  27.235043   0.430303  13.10303  129.735043
apple  2021-01-01    0.328571  26.261702  84.220779   1.303319  94.24251  140.583940
orange 2021-01-01    0.363636  28.036364  28.683673   1.303319  94.24251  140.583940
melon  2021-01-01    0.611111  39.944444  27.679487   1.303319  94.24251  140.583940
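For reference, a non-chained way to get the sum columns in the expected output above would be a plain loop (a minimal sketch outside my pipeline, assuming the index levels are named id and date as in the tables above):

for rate_name in ['rate_water', 'rate_fire', 'rate_wood']:
    # per-date group total of each rate column, broadcast back to every row
    data2[rate_name.replace('rate', 'sum')] = data2.groupby(level='date')[rate_name].transform('sum')

But that breaks the chain, which is what I would like to avoid.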
2 answers
-
answered 2022-05-04 09:43
jezrael
Use a dictionary comprehension to build a dict of Series and add it to the DataFrame by unpacking it with **:

# data2 is built exactly as in the question's reproducible example; name the index levels first:
data2.index.names = ['id', 'date']

cols = ['rate_water', 'rate_fire', 'rate_wood']
data2 = data2.assign(**{rate_name.replace('rate', 'sum'):
                            data2[rate_name].abs() / data2.groupby(level='date')[rate_name].transform('sum')
                        for rate_name in cols})
print(data2)

                   rate_water  rate_fire  rate_wood  sum_water  sum_fire  sum_wood
id     date
apple  2019-01-01   -0.500000  -0.500000   0.000000  -0.340206  0.118962  0.000000
orange 2019-01-01   -0.636364  -0.963636   3.000000  -0.432990  0.229272  0.097122
melon  2019-01-01   -0.333333   5.666667  27.888889  -0.226804  1.348234  0.902878
apple  2020-01-01   -0.100000   7.900000  76.000000   0.232394  0.602914  0.585809
orange 2020-01-01    0.363636  -0.963636  26.500000   0.845070  0.073543  0.204262
melon  2020-01-01    0.166667   6.166667  27.235043   0.387324  0.470629  0.209928
apple  2021-01-01    0.328571  26.261702  84.220779   0.252104  0.278661  0.599078
orange 2021-01-01    0.363636  28.036364  28.683673   0.279008  0.297492  0.204032
melon  2021-01-01    0.611111  39.944444  27.679487   0.468888  0.423847  0.196889
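To see what gets unpacked, the comprehension on its own just builds a plain dict of Series keyed by the new column names (a small sketch of the intermediate step, equivalent to the one-liner above):

new_cols = {rate_name.replace('rate', 'sum'):
                data2[rate_name].abs() / data2.groupby(level='date')[rate_name].transform('sum')
            for rate_name in cols}
print(list(new_cols))              # ['sum_water', 'sum_fire', 'sum_wood']
data2 = data2.assign(**new_cols)   # ** spreads the dict into keyword arguments of assign()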
Another way is to process all columns together:
cols = ['rate_water', 'rate_fire', 'rate_wood']
data2 = data2.join(data2[cols].abs()
                              .div(data2.groupby(level='date')[cols].transform('sum'))
                              .rename(columns=lambda x: x.replace('rate', 'sum')))
Or do the same with assign:

cols = ['rate_water', 'rate_fire', 'rate_wood']
data2 = data2.assign(**data2[cols].abs()
                                  .div(data2.groupby(level='date')[cols].transform('sum'))
                                  .rename(columns=lambda x: x.replace('rate', 'sum')))
print(data2)

The output is the same as above.
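If the aim is to keep this inside a longer method chain (the question mentions earlier steps), one possible variant, a minimal sketch building on the approach above rather than a separate recipe, is to wrap the logic in pipe() so the intermediate result never needs its own name:

cols = ['rate_water', 'rate_fire', 'rate_wood']
out = (data2                              # ...earlier chained steps would sit here...
       .pipe(lambda df: df.assign(**df[cols].abs()
                                            .div(df.groupby(level='date')[cols].transform('sum'))
                                            .rename(columns=lambda c: c.replace('rate', 'sum')))))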
-
answered 2022-05-04 09:52
sammywemmy
One option is with the assign method and unpacking:
data2.assign(**data2
             .abs()
             .div(data2.groupby('date').transform('sum'))
             .rename(columns=lambda col: col.removeprefix('rate_'))
             .add_prefix('sum_'))

                   rate_water  rate_fire  rate_wood  sum_water  sum_fire  sum_wood
id     date
apple  2019-01-01   -0.500000  -0.500000   0.000000  -0.340206  0.118962  0.000000
orange 2019-01-01   -0.636364  -0.963636   3.000000  -0.432990  0.229272  0.097122
melon  2019-01-01   -0.333333   5.666667  27.888889  -0.226804  1.348234  0.902878
apple  2020-01-01   -0.100000   7.900000  76.000000   0.232394  0.602914  0.585809
orange 2020-01-01    0.363636  -0.963636  26.500000   0.845070  0.073543  0.204262
melon  2020-01-01    0.166667   6.166667  27.235043   0.387324  0.470629  0.209928
apple  2021-01-01    0.328571  26.261702  84.220779   0.252104  0.278661  0.599078
orange 2021-01-01    0.363636  28.036364  28.683673   0.279008  0.297492  0.204032
melon  2021-01-01    0.611111  39.944444  27.679487   0.468888  0.423847  0.196889
Another option would be to concatenate along axis=1:
pd.concat([data2,
           data2.abs()
                .div(data2.groupby('date').transform('sum'))
                .rename(columns=lambda col: col.removeprefix('rate_'))
                .add_prefix('sum_')],
          axis=1)

The output is the same as above.
I feel, though, that it would probably be cleaner to create temporary variables instead, especially if this is going into production code.
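For instance, a temporary-variable version might look like this (a sketch using the same rename scheme as above):

# compute the per-date shares once, under a readable name, then attach them
shares = (data2.abs()
               .div(data2.groupby('date').transform('sum'))
               .rename(columns=lambda col: col.removeprefix('rate_'))
               .add_prefix('sum_'))
result = pd.concat([data2, shares], axis=1)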