SVM+HOG object detector

I ran into a problem when training an SVM+HOG object detector, and this is what I did. I put all the features in a list called features and used:

X_scaler = StandardScaler().fit(features)
scaled_X = X_scaler.transform(features)
rand_state = np.random.randint(0, 100)
X_train, X_test, y_train, y_test = train_test_split(np.array(features), labels, test_size=0.3, random_state=rand_state)

According to the sklearn.preprocessing.StandardScaler documentation, the transformation in StandardScaler() is based on the mean and standard deviation of all the training samples. So here is my question: if I want to test my trained SVM on only one newly seen sample, how can I apply the StandardScaler()? I cannot calculate a mean and standard deviation from just one sample.

From my understanding, if I want to test the SVM on new data (not X_test), I need to follow the same procedure as in training. So I extracted HOG features from multiple newly seen samples, appended them to another list called test_feature, and then ran

X_scaler = StandardScaler().fit(test_feature)
scaled_X = X_scaler.transform(test_feature)

This seems to work and the SVM produces correct output, but when len(test_feature) == 1, whether I use StandardScaler() to transform test_feature or directly call y_pred = clf.predict(np.array(test_feature)), the output is all garbage.

Any comments?

2 answers

  • answered 2021-06-23 06:32 jodumagpi

    Just call transform. As you said, the standard scaler uses the training set's mean and standard deviation to transform the data. The same mean and standard deviation that were already computed from the training set will be applied to the new data. There is no need to recalculate these parameters.

    from sklearn.preprocessing import StandardScaler
    data = [[0, 0], [0, 0], [1, 1], [1, 1]]
    scaler = StandardScaler()
    # calling fit will calculate the mean and std
    print(scaler.fit(data))
    # print out the calculated mean, for example
    print(scaler.mean_)
    # transform a new data point
    print(scaler.transform([[2, 2]]))
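
    To see why re-fitting the scaler on a single sample produced garbage in the question: with one row, each feature's mean equals the value itself and its standard deviation is zero (sklearn substitutes 1 for a zero scale), so the transformed vector is all zeros no matter what the input was. A minimal sketch:

    ```python
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # A single "new" feature vector, fitted on itself (the mistake)
    sample = np.array([[5.0, 3.0, 8.0]])
    scaler = StandardScaler().fit(sample)

    # mean_ is the sample itself; the zero std is replaced by 1,
    # so (x - mean_) / scale_ collapses every feature to 0
    scaled = scaler.transform(sample)
    print(scaled)  # [[0. 0. 0.]]
    ```

    The classifier then sees the same all-zero vector for every input, which is why the predictions look random.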
    

  • answered 2021-06-23 06:33 Antoine Dubuis

    You need to fit your StandardScaler() on the training data only; otherwise your means and variances will be biased because they are computed using test data. Once the transformer is fitted, you can call transform() on your test data and on new samples, which will scale them according to the stored means and variances.

    You should:

    1. train_test_split()
    2. fit() your StandardScaler() using train data
    3. fit() your model using the transformed train set
    4. transform() your test data
    5. predict() transformed test data

    As follows:

    X_train, X_test, y_train, y_test = train_test_split(np.array(features), labels, test_size=0.3, random_state=rand_state)

    scaler = StandardScaler()

    # fit the scaler on the training data only, then train on the scaled features
    X_train = scaler.fit_transform(X_train)
    clf.fit(X_train, y_train)

    # reuse the same fitted scaler on the test data before predicting
    X_test = scaler.transform(X_test)
    clf.predict(X_test)
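
    The steps above can be sketched end to end, including the original question's single-sample case. This is a runnable toy version using synthetic stand-ins for the HOG features (the data, LinearSVC choice, and dimensions are assumptions for illustration, not the asker's actual setup):

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    # Synthetic stand-in for HOG features: two classes with shifted means
    rng = np.random.RandomState(0)
    features = np.vstack([rng.normal(0, 1, (100, 8)),
                          rng.normal(3, 1, (100, 8))])
    labels = np.hstack([np.zeros(100), np.ones(100)])

    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, random_state=42)

    scaler = StandardScaler().fit(X_train)   # statistics from training data only
    clf = LinearSVC().fit(scaler.transform(X_train), y_train)

    # One newly seen sample: reshape to (1, n_features) and reuse the SAME
    # fitted scaler -- never fit a new scaler on the single sample
    new_sample = rng.normal(3, 1, 8)
    pred = clf.predict(scaler.transform(new_sample.reshape(1, -1)))
    ```

    The key point is that the scaler is fitted exactly once, on X_train, and every later input (X_test or a single sample) only goes through transform().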