Creating dummy variables for multiple categorical variables in Python

patient_dummies = pd.get_dummies(df['PatientSerial'], prefix='Serial_', drop_first = True)
df = pd.concat([df, patient_dummies], axis = 1)
df.drop(['PatientSerial'], inplace = True, axis = 1)


machine_dummies = pd.get_dummies(df['MachineID'], drop_first = True)
df = pd.concat([df, machine_dummies], axis = 1)
df.drop(['MachineID'], inplace = True, axis = 1)

I have two columns in dataframe df that I want to convert into unordered categorical (dummy) variables. Instead of doing each one separately, is there a more efficient way to accomplish this? I was thinking of the following:

patient_dummies = pd.get_dummies(df['PatientSerial'], prefix='Serial_', drop_first = True)
machine_dummies = pd.get_dummies(df['MachineID'], drop_first = True)
df = pd.concat([df, patient_dummies + machine_dummies], axis = 1)
df.drop(['PatientSerial','MachineID'], inplace = True, axis = 1)

But this didn't work: every entry in the new columns came out as NaN instead of 0 or 1.

1 answer

  • answered 2018-02-13 03:00 Brad Solomon

    Yes: pandas.get_dummies() accepts a columns argument. Pass it the names of the columns you want encoded, and it returns the entire DataFrame with those columns replaced by their dummy columns.

    df = pd.get_dummies(df, columns=['PatientSerial', 'MachineID'], drop_first=True)
    

    For example:

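    import numpy as np
    import pandas as pd

    np.random.seed(444)
    v = np.random.choice([0, 1, 2], size=(2, 10))
    # other_col is uninitialized filler (np.empty_like), so its values are arbitrary
    df = pd.DataFrame({'other_col': np.empty_like(v[0]),
                       'PatientSerial': v[0],
                       'MachineID': v[1]})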
    
    pd.get_dummies(df, columns=['PatientSerial', 'MachineID'],
                   drop_first=True, prefix=['Serial', 'MachineID'])
    
       other_col  Serial_1  Serial_2  MachineID_1  MachineID_2
    0          2         0         0            0            1
    1          1         0         0            0            1
    2          2         0         0            0            0
    3          2         1         0            1            0
    4          2         0         1            0            0
    5          2         1         0            0            1
    6          2         0         1            0            0
    7          2         1         0            0            1
    8          2         1         0            0            0
    9          2         1         0            0            1
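
As for why the attempt in the question produced NaN: `patient_dummies + machine_dummies` is element-wise addition, not concatenation. pandas aligns the two frames on row and column labels before adding, and since the dummy frames have no column names in common, every cell has a missing operand. A minimal sketch of that behavior and of the `pd.concat` fix follows. The sample values and the `reading` column are made up for illustration, and `dtype=int` is passed only so the dummies are 0/1 regardless of pandas version.

```python
import pandas as pd

# Hypothetical sample data; only the column names PatientSerial and
# MachineID come from the question.
df = pd.DataFrame({
    "PatientSerial": ["a", "b", "c", "a"],
    "MachineID": ["m1", "m2", "m1", "m3"],
    "reading": [0.1, 0.2, 0.3, 0.4],
})

patient_dummies = pd.get_dummies(df["PatientSerial"], prefix="Serial",
                                 drop_first=True, dtype=int)
machine_dummies = pd.get_dummies(df["MachineID"], drop_first=True, dtype=int)

# `+` adds element-wise after aligning on labels. The frames share no
# column names, so the result is the union of columns, all NaN.
summed = patient_dummies + machine_dummies
print(summed.isna().all().all())  # True

# Passing all three frames to pd.concat places them side by side instead.
result = pd.concat([df, patient_dummies, machine_dummies], axis=1)
result = result.drop(columns=["PatientSerial", "MachineID"])
print(result.columns.tolist())
# ['reading', 'Serial_b', 'Serial_c', 'm2', 'm3']
```

Two side notes. With the default `prefix_sep='_'`, `prefix='Serial_'` as written in the question yields names with a double underscore such as `Serial__1`, so `prefix='Serial'` is probably what was intended. Also, pandas 2.0 and later return boolean dummy columns by default, so reproducing the 0/1 output shown above on a recent version requires `dtype=int`.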