Using StandardScaler on specific column in Pipeline and concatenate to original data

I have a dataframe with 4 numeric columns, and I am trying to scale only one of them using StandardScaler in a Pipeline. I used the code below to scale and transform the column.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

num_feat = ['Quantity']
num_trans = Pipeline([('scale', StandardScaler())])

preprocessor = ColumnTransformer(transformers = ['num', num_trans, num_feat])

pipe = Pipeline([('preproc', preprocessor),
                ('rf', RandomForestRegressor(random_state = 0))])

After doing this I am splitting my data and training my model as below.

y = df1['target']
x = df1.drop(['target','ID'], axis = 1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)
pipe.fit(x_train, y_train)

This gives me the error `ValueError: not enough values to unpack (expected 3, got 1)`. I understand this could be because of the other 3 numeric columns in my dataframe. So how do I concatenate the scaled data to the rest of my dataframe and train my model on the whole data? Or is there a better way to do this?

1 answer

  • answered 2020-11-20 12:00 sowmyaiyer

    Please add parentheses when initialising the transformer; the `transformers` argument expects a list of `(name, transformer, columns)` tuples. Adding `remainder='passthrough'` keeps the other columns in the output, so you don't need to concatenate the scaled column back yourself.

    preprocessor = ColumnTransformer(transformers = [('num', num_trans, num_feat)],remainder='passthrough')