I want to calculate the percentage, but all I am getting is the sum in a pandas DataFrame

I want to calculate the percentage, but all I am getting is the sum. Please help me get percentage values in the cells instead of counts in a pandas DataFrame.

Code:

ds_data = data[(data.JobTitle == 'Data Analyst') | (data.JobTitle == 'Data Engineer')
               | (data.JobTitle == 'Data Scientist')]
agg_func = {'Education': {
    'Masters': lambda x: sum(i == 'Masters' for i in x),
    'Bachelor': lambda x: sum(i == 'Bachelors (4 years)' for i in x),
    'None': lambda x: sum(i == 'None (no degree completed)' for i in x),
    'Doctorates': lambda x: sum(i == 'Doctorate/PhD' for i in x),
    'Associates': lambda x: sum(i == 'Associates (2 years)' for i in x)}}
function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
function.columns = function.columns.droplevel(0)
function
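The counts come from the generator expressions: `i == 'Masters'` yields a boolean, and `sum` treats `True` as 1, so each lambda returns the number of matching rows. Dividing that count by the number of values in the group gives a fraction, which for a boolean Series is simply its mean. A minimal sketch on a small Series whose values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: Education values for one JobTitle group
x = pd.Series(['Masters', 'Bachelors (4 years)', 'Masters', 'Doctorate/PhD'])

# Summing booleans counts the matches (True counts as 1)
count = sum(i == 'Masters' for i in x)

# Dividing by the group size turns the count into a percentage
pct_from_count = count / len(x) * 100

# Vectorized equivalent: the mean of a boolean Series is the share of True values
pct_from_mean = (x == 'Masters').mean() * 100

print(count, pct_from_count, pct_from_mean)  # 2 50.0 50.0
```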

2 answers

  • answered 2020-01-14 01:42 Jonhasacat

    I've taken the liberty of defining a function to hold the math, since that is cleaner than copy-pasting the same expression five times.

    To get a percentage, divide each count by the total number of values in the group, i.e. the length of the Series passed to the lambda, then multiply by 100.

    def calc_percentage(x, degree):
        # Percentage of values in x (one JobTitle group) equal to degree
        return (sum(i == degree for i in x) / len(x)) * 100
    
    agg_func = {
        'Education': {
            'Masters': lambda x: calc_percentage(x, 'Masters'),
            'Bachelor': lambda x: calc_percentage(x, 'Bachelors (4 years)'),
            'None': lambda x: calc_percentage(x, 'None (no degree completed)'),
            'Doctorates': lambda x: calc_percentage(x, 'Doctorate/PhD'),
            'Associates': lambda x: calc_percentage(x, 'Associates (2 years)')
        }
    }
    
    function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
    function.columns = function.columns.droplevel(0)
    function
    

  • answered 2020-01-14 01:42 datapug

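    If we keep the dict-based renaming (which is deprecated), we can compute the total number of rows and then use it in the lambda functions to get the percentage. Note that this divides by the row count of the whole filtered frame, so each value is a share of all three job titles combined rather than a share within one job title: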

    ds_data = data[(data.JobTitle == 'Data Analyst') | (data.JobTitle == 'Data Engineer') 
                   | (data.JobTitle == 'Data Scientist')]
    ds_data_nrows = ds_data.shape[0]
    agg_func = {'Education': {
        'Masters': lambda x: (sum(i == 'Masters' for i in x) / ds_data_nrows) * 100,
        'Bachelor': lambda x: (sum(i == 'Bachelors (4 years)' for i in x) / ds_data_nrows) * 100,
        'None': lambda x: (sum(i == 'None (no degree completed)' for i in x) / ds_data_nrows) * 100,
        'Doctorates': lambda x: (sum(i == 'Doctorate/PhD' for i in x) / ds_data_nrows) * 100,
        'Associates': lambda x: (sum(i == 'Associates (2 years)' for i in x) / ds_data_nrows) * 100}}
    function = ds_data.groupby(['JobTitle']).agg(agg_func).reset_index()
    function.columns = function.columns.droplevel(0)
    function
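The two answers normalize differently: the first divides by the size of each `JobTitle` group, while the second divides by the row count of the whole filtered frame. As an alternative that neither answer uses, `pd.crosstab` with its `normalize` parameter can produce either table without writing one lambda per degree. A minimal sketch, assuming a `data` frame with `JobTitle` and `Education` columns; the sample rows below are made up for illustration:

```python
import pandas as pd

# Made-up sample rows; the real `data` frame comes from the question
data = pd.DataFrame({
    'JobTitle': ['Data Analyst', 'Data Analyst', 'Data Engineer',
                 'Data Scientist', 'Data Scientist', 'Web Developer'],
    'Education': ['Masters', 'Bachelors (4 years)', 'Masters',
                  'Doctorate/PhD', 'Masters', 'None (no degree completed)'],
})

# Same filter as the question, written with isin()
titles = ['Data Analyst', 'Data Engineer', 'Data Scientist']
ds_data = data[data.JobTitle.isin(titles)]

# Percentage within each job title (like dividing by len(x) per group)
per_title = pd.crosstab(ds_data.JobTitle, ds_data.Education, normalize='index') * 100

# Percentage of all filtered rows (like dividing by ds_data_nrows)
overall = pd.crosstab(ds_data.JobTitle, ds_data.Education, normalize='all') * 100

print(per_title.round(1))
print(overall.round(1))
```

Unlike the lambda approach, `crosstab` creates a column for every education value present (not only the five listed), and rows with a missing `Education` value are not counted, so the denominators can differ from `len(x)` or `ds_data.shape[0]` when the data contains NaNs.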