Parsing a Dataframe multiple ways

I'm trying to blow out the below dataset, so that every email address is a line item, including all the columns.

    PrimaryEmail     ShipToEmail       SoldToEmail    City
0            None   Maine
1            None              None  Jersey
2           None    Blah

The results should be:

    PrimaryEmail     ShipToEmail       SoldToEmail    City             Role             Email                  UserID
0            None   Maine  Primary Contact     jack@jill.com__user
1            None              None  Jersey  Primary Contact     jill@jack.com__user
2           None    Blah  Ship-To Contact    frank@tony.com__user
3            None   Maine  Sold-To Contact  bridge@white.com__user
4           None    Blah  Sold-To Contact   listen@what.com__user

I've created the function below which works great for a small set of data (like above), however when I get into 250K+ lines it will do the first function call fine, i.e.:

user_modifications(df.copy(deep=True),'PrimaryEmail','Primary Contact')

However, the rest of the function calls, for the other roles, are returning line items where the email address for that function are blank.

import pandas as pd

def user_modifications(__dataframe,__emailcolumn, __Role):
    __df = __dataframe[__dataframe[__emailcolumn].notnull()].copy(deep=True)
    __df['Role'] = __Role
    __df['Email'] = __df[__emailcolumn]
    __df['UserID'] = __df[__emailcolumn] + '__user'

def main():
    df=pd.DataFrame({'PrimaryEmail':['','',None],'ShipToEmail':[None,None,''],'SoldToEmail':['',None,''], 'City':['Maine','Jersey','Blah']})
    df_users = pd.concat([user_modifications(df.copy(deep=True),'PrimaryEmail','Primary Contact'),user_modifications(df.copy(deep=True),'ShipToEmail','Ship-To Contact'),user_modifications(df.copy(deep=True),'SoldToEmail','Sold-To Contact')], ignore_index=True)


Thanks for the help.

2 answers

  • answered 2020-09-14 05:36 Shaido

    It looks like you are doing some additional copying of the dataframe. You can try as simple as possible, split the dataframe depending on the values in the first 3 columns, add the Role and Email columns and the concatenate everything together. Finally, the UserID column can be added.

    primary_emails = df.loc[df['PrimaryEmail'].notnull()].copy()
    primary_emails['Role'] = 'Primary Contact'
    primary_emails['Email'] = df['PrimaryEmail']
    ship_emails = df.loc[df['ShipToEmail'].notnull()].copy()
    ship_emails['Role'] = 'Ship-To Contact'
    ship_emails['Email'] = df['ShipToEmail']
    sold_emails = df.loc[df['SoldToEmail'].notnull()].copy()
    sold_emails['Role'] = 'Sold-To Contact'
    sold_emails['Email'] = df['SoldToEmail']
    df = pd.concat([primary_emails, ship_emails, sold_emails], axis=0, ignore_index=True)
    df['UserId'] = df['Email'] + '__user'

    This will give the wanted resulting dataframe.

  • answered 2020-09-14 05:51 Chris

    One way using pandas.DataFrame.filter, melt and merge:

    df = df["None")]
    df2 = df.filter(like="Email").reset_index().melt(id_vars="index", 
    new_df = df.merge(df2, left_index=True, right_on="index").drop("index", 1)


        PrimaryEmail     ShipToEmail       SoldToEmail    City          Role  \
    0             NaN   Maine  PrimaryEmail   
    6             NaN   Maine   SoldToEmail   
    1             NaN               NaN  Jersey  PrimaryEmail   
    5            NaN    Blah   ShipToEmail   
    8            NaN    Blah   SoldToEmail   

    Then modify the new columns as desired:

    new_df["Role"] = new_df["Role"].str.replace("Email", " Contact")
    new_df["UserID"] = new_df["Email"] + "__user"


        PrimaryEmail     ShipToEmail       SoldToEmail    City             Role  \
    0             NaN   Maine  Primary Contact   
    6             NaN   Maine   SoldTo Contact   
    1             NaN               NaN  Jersey  Primary Contact   
    5            NaN    Blah   ShipTo Contact   
    8            NaN    Blah   SoldTo Contact   
                  Email                  UserID  
    0     jack@jill.com__user  
    6  bridge@white.com__user  
    1     jill@jack.com__user  
    5    frank@tony.com__user  
    8   listen@what.com__user