I want to calculate the percentage, but all I am getting is the sum in a pandas DataFrame
I want to calculate the percentage, but all I am getting is the sum (that is, the count). Please help me get percentage values in the cells rather than counts in a pandas DataFrame.
Code:
# keep only the three data-related job titles
ds_data = data[(data.JobTitle == 'Data Analyst') | (data.JobTitle == 'Data Engineer') | (data.JobTitle == 'Data Scientist')]
agg_func = {'Education': {
    'Masters': lambda x: sum(i == 'Masters' for i in x),
    'Bachelor': lambda x: sum(i == 'Bachelors (4 years)' for i in x),
    'None': lambda x: sum(i == 'None (no degree completed)' for i in x),
    'Doctorates': lambda x: sum(i == 'Doctorate/PhD' for i in x),
    'Associates': lambda x: sum(i == 'Associates (2 years)' for i in x)}}
function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
function.columns = function.columns.droplevel(0)
function
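A minimal sketch of the same per-job-title count for pandas 1.0 and later, where nested dict renaming inside `.agg` was removed and raises a `SpecificationError`. It assumes named aggregation as the replacement; the toy DataFrame is made up for illustration, and `isin` stands in for the chain of `|` comparisons.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame `data`
data = pd.DataFrame({
    "JobTitle": ["Data Analyst", "Data Analyst", "Data Scientist",
                 "Data Scientist", "Data Scientist", "Data Engineer", "Manager"],
    "Education": ["Masters", "Bachelors (4 years)", "Doctorate/PhD",
                  "Masters", "Masters", "Bachelors (4 years)", "Masters"],
})

# Equivalent to the three == comparisons joined with |
ds_data = data[data.JobTitle.isin(["Data Analyst", "Data Engineer", "Data Scientist"])]

# Named aggregation: new_column=(input_column, function)
counts = ds_data.groupby("JobTitle").agg(
    Masters=("Education", lambda x: (x == "Masters").sum()),
    Bachelor=("Education", lambda x: (x == "Bachelors (4 years)").sum()),
    Doctorates=("Education", lambda x: (x == "Doctorate/PhD").sum()),
).reset_index()
print(counts)
```

This still produces counts, which is exactly the behaviour the question describes: summing a boolean comparison counts the matches.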

I've taken the liberty of defining a function to hold the math, since that is cleaner than copy/pasting the same expression for every degree.
To get a percentage, you need to divide the count by the total number of rows in each group, i.e. the length of the Series passed to the lambda, and multiply by 100.
def calc_percentage(data, degree):
    # share of entries in this group equal to `degree`, as a percentage
    return (sum(i == degree for i in data) / len(data)) * 100

agg_func = {
    'Education': {
        'Masters': lambda x: calc_percentage(x, 'Masters'),
        'Bachelor': lambda x: calc_percentage(x, 'Bachelors (4 years)'),
        'None': lambda x: calc_percentage(x, 'None (no degree completed)'),
        'Doctorates': lambda x: calc_percentage(x, 'Doctorate/PhD'),
        'Associates': lambda x: calc_percentage(x, 'Associates (2 years)')
    }
}
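A runnable sketch of this answer's idea, assuming made-up toy data and pandas named aggregation in place of the deprecated nested dict. Here `calc_percentage` is written as `(data == degree).mean() * 100`, which is equivalent to the generator sum divided by `len(data)`, because the mean of a boolean Series is the fraction of `True` values.

```python
import pandas as pd

def calc_percentage(data, degree):
    # Percentage of entries in the Series `data` equal to `degree`;
    # the mean of a boolean Series is the fraction of True values
    return (data == degree).mean() * 100

# Hypothetical toy data standing in for the survey DataFrame `data`
data = pd.DataFrame({
    "JobTitle": ["Data Analyst", "Data Analyst", "Data Scientist",
                 "Data Scientist", "Data Scientist", "Data Engineer", "Manager"],
    "Education": ["Masters", "Bachelors (4 years)", "Doctorate/PhD",
                  "Masters", "Masters", "Bachelors (4 years)", "Masters"],
})
ds_data = data[data.JobTitle.isin(["Data Analyst", "Data Engineer", "Data Scientist"])]

# Each lambda receives the Education values of one job title,
# so the denominator is that job title's row count
pct = ds_data.groupby("JobTitle").agg(
    Masters=("Education", lambda x: calc_percentage(x, "Masters")),
    Bachelor=("Education", lambda x: calc_percentage(x, "Bachelors (4 years)")),
    Doctorates=("Education", lambda x: calc_percentage(x, "Doctorate/PhD")),
)
print(pct)
```

Each row sums to 100 only when every education level present in the data has its own column, as it does in this toy example.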

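If we keep the dict renaming (which is deprecated), we can compute the total number of rows in ds_data once and then divide by it in the lambda functions. Note that this expresses each count as a percentage of all rows in ds_data, not of the rows for that job title: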
ds_data = data[(data.JobTitle == 'Data Analyst') | (data.JobTitle == 'Data Engineer') | (data.JobTitle == 'Data Scientist')]
# total number of rows across all three job titles
ds_data_nrows = ds_data.shape[0]
agg_func = {'Education': {
    'Masters': lambda x: (sum(i == 'Masters' for i in x) / ds_data_nrows) * 100,
    'Bachelor': lambda x: (sum(i == 'Bachelors (4 years)' for i in x) / ds_data_nrows) * 100,
    'None': lambda x: (sum(i == 'None (no degree completed)' for i in x) / ds_data_nrows) * 100,
    'Doctorates': lambda x: (sum(i == 'Doctorate/PhD' for i in x) / ds_data_nrows) * 100,
    'Associates': lambda x: (sum(i == 'Associates (2 years)' for i in x) / ds_data_nrows) * 100}}
function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
function.columns = function.columns.droplevel(0)
function
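A different technique, `pd.crosstab` with its `normalize` argument, makes the two kinds of percentage explicit. In this sketch on made-up toy data, `normalize="all"` divides by every row in ds_data (what this answer computes), while `normalize="index"` divides by each job title's row count (what the first answer computes).

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame `data`
data = pd.DataFrame({
    "JobTitle": ["Data Analyst", "Data Analyst", "Data Scientist",
                 "Data Scientist", "Data Scientist", "Data Engineer", "Manager"],
    "Education": ["Masters", "Bachelors (4 years)", "Doctorate/PhD",
                  "Masters", "Masters", "Bachelors (4 years)", "Masters"],
})
ds_data = data[data.JobTitle.isin(["Data Analyst", "Data Engineer", "Data Scientist"])]

# Percent of ALL rows in ds_data: the whole table sums to 100
overall = pd.crosstab(ds_data.JobTitle, ds_data.Education, normalize="all") * 100

# Percent within each job title: every row sums to 100
per_group = pd.crosstab(ds_data.JobTitle, ds_data.Education, normalize="index") * 100

print(overall)
print(per_group)
```

crosstab builds one column per education value it finds, so there is no need to list the degrees by hand.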