SVM+HOG object detector
I ran into a problem when training an SVM+HOG object detector, and this is what I did.
I put all the features into a list called test_feature and used

X_scaler = StandardScaler().fit(features)
scaled_X = X_scaler.transform(features)

rand_state = np.random.randint(0, 100)
X_train, X_test, y_train, y_test = train_test_split(
    scaled_X, labels, test_size=0.3, random_state=rand_state)
As I understand sklearn.preprocessing.StandardScaler, the transformation performed by
StandardScaler() is based on the mean and standard deviation of all the training samples. So here comes the question: if I want to test my trained SVM on only one newly seen sample, how can I apply
StandardScaler()? I cannot calculate a mean and standard deviation from just one sample.
From my understanding, if I want to test the SVM on new data (not
X_test), I need to follow the same procedure as in training. Therefore, I extracted HOG features from multiple newly seen samples, appended them to another list called test_feature, and used

X_scaler = StandardScaler().fit(test_feature)
scaled_X = X_scaler.transform(test_feature)
This seems to work properly and the SVM produces correct output, but when
len(test_feature) == 1, the outputs are all garbage, whether I use
StandardScaler() to transform
test_feature or call
y_pred = clf.predict(np.array(test_feature)) directly.
Simply call transform. As you said, the standard scaler uses the training set's mean and standard deviation to transform the data. The same mean and standard deviation that were already computed from the training set will be applied to the new data. There is no need to recalculate these parameters.
from sklearn.preprocessing import StandardScaler

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()

# calling fit will calculate the mean and std
print(scaler.fit(data))

# print out the calculated mean, for example
print(scaler.mean_)

# transform a new data point
print(scaler.transform([[2, 2]]))
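To make the single-sample case concrete, here is a short sketch (the data is the same toy array as above) showing that transform on one new sample just applies the stored training mean and scale, equivalent to computing (x - mean_) / scale_ by hand:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler().fit(data)  # mean_ = [0.5, 0.5], scale_ = [0.5, 0.5]

# A single new sample: transform() reuses the stored training statistics,
# so nothing needs to be computed from the one sample itself.
single = np.array([[2, 2]])
scaled = scaler.transform(single)

# Equivalent manual computation using the fitted parameters
manual = (single - scaler.mean_) / scaler.scale_
print(np.allclose(scaled, manual))  # True
```

Note that the sample is passed with shape (1, n_features); scikit-learn transformers and estimators expect 2-D input even for a single sample.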
You need to fit your
StandardScaler() on the training data only; otherwise your means and variances will be biased, because they were computed using test data. Once this transformer is fitted, you can call
transform() on your test data and on new samples, which will scale them according to the stored means and variances.
1. fit() your StandardScaler() using the train data
2. fit() your model using the transformed train set
3. transform() your test data
4. predict() on the transformed test data
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), labels, test_size=0.3, random_state=rand_state)

# fit the scaler on the training data only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

clf = SVC()
clf.fit(X_train, y_train)

# apply the same (already fitted) scaler to the test data
X_test = scaler.transform(X_test)
y_pred = clf.predict(X_test)
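The steps above can also be bundled into a scikit-learn Pipeline, which fits the scaler on the training data and then automatically applies the same learned mean/std inside predict(), even for a single sample. A minimal sketch, using synthetic data in place of the HOG feature matrix (the data, LinearSVC, and split parameters here are assumptions, not the original poster's setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Toy data standing in for the HOG feature matrix; substitute your own
# `features` and `labels` arrays.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# The pipeline fits StandardScaler on the training data only, then
# reuses the stored statistics in every later transform/predict call.
model = make_pipeline(StandardScaler(), LinearSVC())
model.fit(X_train, y_train)

# Predicting one newly seen sample: note the 2-D shape (1, n_features)
one_sample = X_test[:1]
pred = model.predict(one_sample)
print(pred)
```

This removes the risk of forgetting to transform new samples, since the scaler and classifier travel together as one estimator.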