Support vector machine bias computed from the dual problem solution is far too large
I am trying to implement a Gaussian SVM.
I have set up the required matrices for the dual problem, and solved it using a quadratic solver:
y is the target vector here.
ker_mat is an m by m matrix where ker_mat[i,j] is the kernel function of data samples i and j.
I have also added an intercept column to the data samples and standard scaled the remaining columns.
You can skip the code, as it is most likely not the cause of the problem, but I embedded it here just in case.

    import numpy as np
    import cvxopt.solvers

    # targets in {-1, +1}: +1 for the current class c, -1 for everything else
    ytrgt = -np.ones((y.shape[0], 1))
    ytrgt[y == c] = 1

    # quadratic term: P[i, j] = t_i * t_j * K(x_i, x_j)
    trgt_map = ytrgt @ ytrgt.T
    P = np.multiply(trgt_map, ker_mat)
    q = -np.ones((y.shape[0], 1))

    # box constraints 0 <= a_i <= C, written as G a <= h
    h = np.hstack([np.zeros_like(y).reshape(-1,),
                   np.full((y.shape[0],), self.C)]).reshape(-1, 1)
    G = np.zeros((2*y.shape[0], y.shape[0]))
    np.fill_diagonal(G, -1)
    np.fill_diagonal(G[y.shape[0]:, :], 1)

    # equality constraint: sum_i a_i * t_i = 0
    A = ytrgt.T

    cvxopt.solvers.options['show_progress'] = False
    a_hat = cvxopt.solvers.qp(
        q = cvxopt.matrix(q),  # to subtract all a values
        P = cvxopt.matrix(P),
        h = cvxopt.matrix(h),
        G = cvxopt.matrix(G),
        b = cvxopt.matrix(np.zeros((1, 1))),
        A = cvxopt.matrix(A)
    )

The solver seems to work. If I use its result alone, I get satisfactory accuracy. However, an SVM also has an intercept (bias) term, and when I add it the results get much worse. This is the code I use to calculate the bias term for a given SVM:

    # calculate bias for the class
    n_s = 0
    s = 0
    for i, (a_i, t_i) in enumerate(zip(np.matrix(a_hat["x"]), ytrgt)):
        if a_i <= self.support_tolerance:
            continue
        n_s += 1
        for j, (a_j, t_j) in enumerate(zip(np.matrix(a_hat["x"]), ytrgt)):
            if a_j <= self.support_tolerance:
                continue
            s += t_i -a_j*t_j*ker_mat[i,j]
    bias = s / n_s

I had to introduce the support_tolerance variable: any multiplier returned by the QP solver that is smaller than this threshold is treated as zero, so the corresponding sample is not counted as a support vector.
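As a cross-check for the loop above, here is a minimal vectorised sketch of the usual textbook bias, the mean over support vectors i of t_i minus the sum over support vectors j of a_j*t_j*K[i,j]. The function name `bias_from_dual` and the default tolerance are made up for illustration; it assumes `alpha` is the flattened solver output, `t` holds the ±1 targets, and `ker_mat` is the full m by m kernel matrix.

```python
import numpy as np

def bias_from_dual(alpha, t, ker_mat, tol=1e-6):
    """Mean over support vectors i of  t_i - sum_j alpha_j * t_j * K[i, j].

    `tol` plays the role of support_tolerance (value here is an assumption).
    """
    alpha = np.asarray(alpha, dtype=float).ravel()
    t = np.asarray(t, dtype=float).ravel()
    K = np.asarray(ker_mat, dtype=float)

    sv = alpha > tol                      # boolean mask of support vectors
    if not sv.any():
        raise ValueError("no multipliers above the tolerance")

    # kernel block restricted to support vectors (rows i, columns j)
    K_sv = K[np.ix_(sv, sv)]
    # inner sum over j for every support vector i
    scores = K_sv @ (alpha[sv] * t[sv])
    # one residual per support vector i, then the average
    return float(np.mean(t[sv] - scores))
```

With cvxopt output this could be called as `bias_from_dual(np.array(a_hat["x"]), ytrgt, ker_mat, self.support_tolerance)`. Note that each target `t_i` enters exactly once per support vector here, which is worth comparing against how often it is accumulated in the nested loop.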
Now, to the problem:
The bias is usually far larger in magnitude than the prediction result without it. For example, for a sample clearly in the target class, the sum of a[i]*t[i]*kernel(Xi, Xsample) over all support vectors is around 1.3, while the bias for the whole class is around -40. This makes every sample look like it belongs to the negative class (even when I classify the dataset on which the SVM was trained, every sample comes out negative).
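To make the quantities being compared concrete, here is a small sketch of the prediction score described above: the kernel-weighted sum over support vectors, plus an optional bias. The names `gaussian_kernel` and `decision_function` and the `gamma` parameterisation exp(-gamma * ||x - z||^2) are assumptions for illustration; the kernel width actually used in my implementation may be defined differently.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    """Pairwise Gaussian kernel exp(-gamma * ||x - z||^2) (assumed form)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    # squared Euclidean distances via the expansion ||x||^2 + ||z||^2 - 2 x.z
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def decision_function(X_sv, alpha_sv, t_sv, X_new, bias=0.0, gamma=1.0):
    """Score = sum_i alpha_i * t_i * k(x_i, x_new) + bias, one value per row of X_new."""
    alpha_sv = np.asarray(alpha_sv, dtype=float).ravel()
    t_sv = np.asarray(t_sv, dtype=float).ravel()
    K = gaussian_kernel(X_new, X_sv, gamma)      # shape (n_new, n_sv)
    return K @ (alpha_sv * t_sv) + bias
```

Since every Gaussian kernel value lies in (0, 1], the kernel sum is bounded in magnitude by the sum of the multipliers, which gives a rough scale against which a bias of -40 can be judged.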
Because of this, two things concern me:
Should I even add a bias if I have already added an intercept term (a column of ones) to the data itself, and the kernel is Gaussian?
I know that the derivation results in the bias being divided by the number of support vectors, but wouldn't it make more sense to divide by the square of it? I mean the sum is proportional to the square of the number of the support vectors, yet we only divide by a linear factor.
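For reference, the bias formula I am referring to, as I understand it from the standard derivation, can be written as follows. Here S is the set of support vectors and N_S its size (notation introduced just for this sketch); a, t and k are the multipliers, targets and kernel from above.

```latex
b \;=\; \frac{1}{N_S} \sum_{i \in S} \Bigl( t_i \;-\; \sum_{j \in S} a_j \, t_j \, k(x_i, x_j) \Bigr)
```

Written this way, the outer sum has N_S terms, each a single residual for support vector i, and t_i appears once per i, outside the inner sum over j. The inner sum is one prediction score, not N_S separate contributions to be averaged.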
Even if the bias is not required in this case, why does it explode like this? If it were not needed, I would expect it to come out as zero (or close to it) so that it would not affect the result, yet here it clearly does.