SVM+HOG object detector
I ran into a problem when training an SVM+HOG object detector, and this is what I did.
I put all the features in a list called `features` and used:

```python
X_scaler = StandardScaler().fit(features)
scaled_X = X_scaler.transform(features)
rand_state = np.random.randint(0, 100)
X_train, X_test, y_train, y_test = train_test_split(
    scaled_X, labels, test_size=0.3, random_state=rand_state)
```
According to the documentation for sklearn.preprocessing.StandardScaler, the transformation performed by StandardScaler() is based on the mean and standard deviation of all the training samples. So here is the question: if I want to test my trained SVM on only one newly seen sample, how can I apply the StandardScaler()? I cannot calculate a mean and standard deviation from just one sample.
From my understanding, if I want to test the SVM on new data (not X_test), I need to follow the same procedure as in training. Therefore, I extracted HOG features from multiple newly seen samples, appended them to another list called test_feature, and then ran:

```python
X_scaler = StandardScaler().fit(test_feature)
scaled_X = X_scaler.transform(test_feature)
```
This seems to work and the SVM produces correct output, but when len(test_feature) == 1, no matter whether I use StandardScaler() to transform test_feature or directly call y_pred = clf.predict(np.array(test_feature)), the outputs are all garbage.
Any comments?
2 answers

Simply call transform(). As you said, the standard scaler uses the training set's mean and standard deviation to transform the data. The same mean and standard deviation that were already computed from the training set will be applied to the new data. There is no need to recalculate these parameters.
```python
from sklearn.preprocessing import StandardScaler

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()

# calling fit will calculate the mean and std
print(scaler.fit(data))

# print out the calculated mean, for example
print(scaler.mean_)

# transform a new data point
print(scaler.transform([[2, 2]]))
```
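Applied to the asker's setting, this can be sketched end to end. The toy feature vectors and the LinearSVC classifier below are made-up stand-ins for the real HOG features and trained SVM; the point is only that the scaler is fitted once on the training data and then reused for a single new sample:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# hypothetical feature vectors standing in for HOG features
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

# fit the scaler on the training data only
scaler = StandardScaler().fit(X_train)
clf = LinearSVC().fit(scaler.transform(X_train), y_train)

# a single new sample: reuse the fitted scaler, do NOT refit it
new_sample = np.array([[0.95, 1.05]])  # shape (1, n_features)
pred = clf.predict(scaler.transform(new_sample))
print(pred)
```

Because `scaler` keeps the training mean and standard deviation in `mean_` and `scale_`, transforming a single row is well defined even though no statistics could be computed from that row alone.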

You need to fit your StandardScaler() on the training data only, otherwise your means and variances will be biased because they were computed using test data. Once this transformer is fitted, you can call transform() on your test data and on new samples, which scales them according to the means and variances calculated from the training set. You should:

1. train_test_split()
2. fit() your StandardScaler() using the train data
3. fit() your model using the transformed train set
4. transform() your test data
5. predict() on the transformed test data
As follows:
```python
X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), labels, test_size=0.3, random_state=rand_state)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the train set only
clf.fit(X_train, y_train)

X_test = scaler.transform(X_test)        # reuse the train statistics
clf.predict(X_test)
```
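To cover the original question of scoring one newly seen sample: reuse the scaler fitted on the training split rather than fitting a new one. A runnable sketch of the whole procedure, where the feature matrix, labels, and SVC classifier are made-up stand-ins for the real HOG pipeline:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# hypothetical stand-in for the HOG feature matrix and labels:
# two well-separated Gaussian clusters, one per class
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 1.0, (50, 8)),
                      rng.normal(3.0, 1.0, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the train split only
X_test = scaler.transform(X_test)        # reuse the train statistics

clf = SVC().fit(X_train, y_train)

# one newly seen sample: transform with the already-fitted scaler
one_sample = rng.normal(3.0, 1.0, (1, 8))
one_pred = clf.predict(scaler.transform(one_sample))
print(one_pred)
```

This is exactly why fitting a fresh StandardScaler() on a single test sample produced garbage in the question: with one row, the sample's own mean is subtracted, so every feature collapses toward zero instead of being scaled by the training statistics.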