MLP performing worse than SGD in supervised classification
The learning dataset I'm using is a grayscale image that was flattened so that each pixel represents an individual sample. A second image will be classified pixel by pixel after training the stochastic gradient descent (SGD) and multilayer perceptron (MLP) classifiers on the former.
The problem I have is that SGD performs far better than the MLP, even when I keep the default parameters provided by scikit-learn in both cases. Below is the code for both (note that, because the training dataset is in the order of millions of samples, I had to employ partial_fit() to train the MLP in chunks, while this was not necessary for the SGD):
```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier

def batcherator(data, target, chunksize):
    for i in range(0, len(data), chunksize):
        yield data[i:i+chunksize], target[i:i+chunksize]

def classify():
    if algorithm == 'sgd':
        classifier = SGDClassifier(verbose=True)
        classifier.fit(training.data, training.target)
    elif algorithm == 'mlp':
        classifier = MLPClassifier(verbose=True)
        gen = batcherator(training.data, training.target, 1000)
        for chunk_data, chunk_target in gen:
            classifier.partial_fit(chunk_data, chunk_target,
                                   classes=np.array([0, 1]))
```
My question is: which parameters should I adjust in the MLP classifier to make its results comparable to those obtained with the SGD?
I've tried increasing the number of neurons in the hidden layer using hidden_layer_sizes, but I didn't see any improvement. There was no improvement either when I changed the activation function of the hidden layer from the default relu to logistic via the activation parameter.
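For what it's worth, two things that often close this gap are scaling the pixel intensities (the MLP is far more sensitive to unscaled inputs than SGDClassifier's linear model) and shuffling between passes so each chunk is not a biased slice of the image. A minimal sketch of that chunked training loop, using synthetic stand-in data (the blob dataset, chunk size, and epoch count below are assumptions, not your setup):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in data (an assumption, not your image): two separable clusters,
# one row per "pixel" sample.
X, y = make_blobs(n_samples=2000, centers=2, n_features=4, random_state=0)

# Scaling matters far more for the MLP than for SGDClassifier's linear model.
X = StandardScaler().fit_transform(X)

clf = MLPClassifier(hidden_layer_sizes=(50,), random_state=0)
classes = np.unique(y)

# Shuffle between passes so each chunk is not a biased slice of the image,
# and make several passes: partial_fit does only one pass per call.
rng = np.random.RandomState(0)
for epoch in range(20):
    order = rng.permutation(len(X))
    for start in range(0, len(X), 500):
        idx = order[start:start + 500]
        clf.partial_fit(X[idx], y[idx], classes=classes)

print(round(clf.score(X, y), 2))
```

Since partial_fit makes only a single pass over each chunk, several shuffled passes over the whole set usually matter as much as any architecture change.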
See also questions close to this topic

RandomForestClassifier instance not fitted yet. Call 'fit' with appropriate arguments before using this method
I am trying to train a decision tree model, save it, and then reload it when I need it later. However, I keep getting the following error:
This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
Here is my code:
```python
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.20, random_state=4)

names = ["Decision Tree", "Random Forest", "Neural Net"]
classifiers = [DecisionTreeClassifier(), RandomForestClassifier(), MLPClassifier()]

score = 0
for name, clf in zip(names, classifiers):
    if name == "Decision Tree":
        clf = DecisionTreeClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid=param_grid_DT)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf
    elif name == "Random Forest":
        clf = RandomForestClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid_RF)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf
    elif name == "Neural Net":
        clf = MLPClassifier()
        clf.fit(X_train, y_train_TF)
        y_pred = clf.predict(X_test)
        current_score = accuracy_score(y_test_TF, y_pred)
        if current_score > score:
            score = current_score
            best_clf = clf

pkl_filename = "pickle_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(best_clf, file)

from sklearn.externals import joblib
# Save to file in the current working directory
joblib_file = "joblib_model.pkl"
joblib.dump(best_clf, joblib_file)

print("best classifier: ", best_clf, " Accuracy= ", score)
```
Here is how I load the model and test it:
```python
# First method
with open(pkl_filename, 'rb') as h:
    loaded_model = pickle.load(h)

# Second method
joblib_model = joblib.load(joblib_file)
```
As you can see, I have tried two ways of saving it, but neither has worked.
Here is how I tested:
```python
print(loaded_model.predict(test))
print(joblib_model.predict(test))
```
You can clearly see that the models are actually fitted, and if I try any other model, such as an SVM or logistic regression, the method works just fine.
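A detail worth checking here: inside each grid-search branch, best_clf = clf stores the unfitted template estimator, while GridSearchCV fits internal clones; the fitted model lives in grid_search.best_estimator_. A minimal pickle round-trip sketch on iris (stand-in data) showing that a fitted estimator survives serialization:

```python
import pickle
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0)
grid_search = GridSearchCV(clf, param_grid={'max_depth': [2, 3]})
grid_search.fit(X, y)

# `clf` is still the unfitted template: GridSearchCV fits internal clones.
# The fitted model lives in grid_search.best_estimator_.
blob = pickle.dumps(grid_search.best_estimator_)
restored = pickle.loads(blob)
print(restored.predict(X[:3]))
```

Saving grid_search.best_estimator_ (rather than clf) is what makes the reloaded model usable for predict.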

Handling outliers for e-commerce data
I have read about deleting the data points that are outliers, and I also know about winsorizing the data to change the upper and lower inner fences.
Is there any better solution for handling outliers in an e-commerce dataset? Thank you :)
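One common middle ground for e-commerce amounts is capping at the Tukey (inner) fences rather than deleting rows, which keeps every order while bounding its influence. A small numpy sketch with made-up order values:

```python
import numpy as np

# Hypothetical order values with two extreme outliers.
order_values = np.array([12.0, 15.0, 14.0, 13.0, 16.0,
                         15.0, 14.0, 900.0, 13.0, 850.0])

q1, q3 = np.percentile(order_values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Cap (winsorize to the inner fences) instead of dropping rows,
# so the sample size is preserved.
capped = np.clip(order_values, lower, upper)
print(capped.max() <= upper)
```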

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() when converting a map object to list
I have title_result, which is a list of 1000 movie titles.
I have a function which returns similar movies based on a model. The function looks like this:
```python
def get_recommendations(title):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:31]
    movie_indices = [i[0] for i in sim_scores]
    return title, str(titles.iloc[movie_indices].reset_index(drop=True).tolist())
```
The output of the function is as follows when 'Twilight Saga: Breaking Dawn - 1' is passed as the input parameter:
('Twilight Saga: Breaking Dawn - 1', "['Twilight Saga: Breaking Dawn - 2', 'Kedi Billa Killadi Ranga', 'Anjali', 'Half Girlfriend']")
I've used a map function to get results for the rest of the titles:
```python
result = list(map(get_recommendations, title_result))
result_df = pd.DataFrame(result, columns=['title', 'similar_movies'])
```
When I pass the list range title_result[:20] I get the output perfectly, but when I pass the whole list, or a list range such as [25:30], I get:
`ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`
I think certain values raise an exception when the map object is converted to a list, which I'm unable to solve.
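For illustration, this ValueError typically appears when a NumPy array ends up in a boolean context (for example, if some title lookup returns an array rather than a scalar inside the function). A minimal reproduction:

```python
import numpy as np

scores = np.array([0.9, 0.1])

# Comparing an array yields an element-wise boolean array, whose
# truth value inside `if` is ambiguous and raises ValueError:
try:
    if scores > 0.5:
        pass
except ValueError as e:
    print(type(e).__name__)

# Explicit reductions are unambiguous:
print((scores > 0.5).any(), (scores > 0.5).all())
```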

All coefficients turn to zero in logistic regression using scikit-learn
I am working on logistic regression using scikit-learn in Python. I have a data file that can be downloaded via the following link.
Below is my code for the machine learning part.
```python
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score
import pandas as pd

scaler = StandardScaler()
data = pd.read_csv('data.csv')
dataX = data.drop('outcome', axis=1).values.astype(float)
X = scaler.fit_transform(dataX)
dataY = data[['outcome']]
Y = dataY.values
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=33)
lasso = Lasso(alpha=.3)
lasso.fit(X_train, y_train)
print("MC learning completed")
print(lasso.score(X_train, y_train))
print(lasso.score(X_test, y_test))
print(lasso.coef_)
```
When I print the coefficients, they all turn out to be zero. Can anyone advise me on that?
Let me explain my objective a little. The problem seems to be a classification problem, as we can only see 0 or 1 in Y_train and Y_test. As a simple example, 0 can be considered a miss and 1 a score. What I am trying to do is compute the probability of scoring for each event when a shot takes place.
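A hedged observation on that goal: Lasso is a regression model, and with standardized features and 0/1 targets, alpha=0.3 can easily shrink every coefficient to zero; the classification analogue that gives probabilities is logistic regression, optionally L1-penalized (where C plays roughly the inverse role of alpha). A sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Stand-in for the shot data: binary outcome, a few informative features.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=33)
X = StandardScaler().fit_transform(X)

# L1-penalised logistic regression: sparse coefficients, and scoring
# probabilities per event via predict_proba.
clf = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
clf.fit(X, y)

print(clf.coef_)                 # some (not necessarily all) may be zero
print(clf.predict_proba(X[:3]))  # P(miss), P(score) for each shot
```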
Thanks in advance.
Regards,
Zep

Prune unnecessary leaves in sklearn DecisionTreeClassifier
I use sklearn.tree.DecisionTreeClassifier to build a decision tree. With the optimal parameter settings, I get a tree that has unnecessary leaves (see the example picture below; I do not need probabilities, so the leaf nodes marked in red are an unnecessary split).
Is there any third-party library for pruning these unnecessary nodes? Or a code snippet? I could write one, but I can't really imagine that I am the first person with this problem...
Code to replicate:
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target
mdl = DecisionTreeClassifier(max_leaf_nodes=8)
mdl.fit(X, y)
```
PS: I have tried multiple keyword searches and am somewhat surprised to find nothing. Is there really no post-pruning in general in sklearn?
PPS: In response to the possible duplicate: while the suggested question might help me when coding the pruning algorithm myself, it answers a different question. I want to get rid of leaves that do not change the final decision, while the other question asks for a minimum threshold for splitting nodes.
PPPS: The tree shown is an example to illustrate my problem. I am aware that the parameter settings used to create the tree are suboptimal. I am not asking about optimizing this specific tree; I need to do post-pruning to get rid of leaves that might be helpful if one needs class probabilities, but are not helpful if one is only interested in the most likely class.
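For what it's worth, one can collapse such same-decision subtrees by editing the fitted tree_ arrays in place. This is a sketch that relies on sklearn's internal children_left/children_right representation, where -1 marks a leaf (newer sklearn versions also offer cost-complexity pruning via ccp_alpha, which solves a related but different problem):

```python
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

TREE_LEAF = -1  # sklearn's sentinel for "no child"

def prune_same_class(tree, node=0):
    """Collapse any subtree whose leaves all predict the same class.
    Returns the set of classes predicted under `node`."""
    left, right = tree.children_left[node], tree.children_right[node]
    if left == TREE_LEAF:  # already a leaf
        return {int(np.argmax(tree.value[node]))}
    classes = prune_same_class(tree, left) | prune_same_class(tree, right)
    if len(classes) == 1:
        # All leaves below agree, so make this node a leaf in place.
        tree.children_left[node] = TREE_LEAF
        tree.children_right[node] = TREE_LEAF
    return classes

iris = datasets.load_iris()
X, y = iris.data, iris.target
mdl = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0)
mdl.fit(X, y)

before = mdl.predict(X)
prune_same_class(mdl.tree_)
after = mdl.predict(X)
print((before == after).all())
```

Because a node is only collapsed when every leaf below it predicts the same class, the pruned tree's predictions are unchanged by construction.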

What is the best neural network configuration to solve the XOR dataset on Tensorflow Playground?
Hi All,
I have toyed with two network configurations on the TensorFlow Playground:
- 1 hidden layer with 2 nodes
- 2 hidden layers with 2 nodes each
I find that the second option (2 hidden layers) gives better classification accuracy. However, I find that the network does not converge to a solution every time.
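In principle, one hidden layer with 2 units already has the capacity to represent XOR; whether training converges depends on initialization and learning rate. Hand-set weights (an illustration, not trained values) make the capacity claim concrete:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked weights showing that one hidden layer with 2 ReLU units
# is enough for XOR; gradient descent just may not find them every run.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])
b2 = 0.0

def xor_net(x):
    h = relu(x @ W1 + b1)   # h = (x1+x2, relu(x1+x2-1))
    return h @ W2 + b2

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x, dtype=float)))
```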
Thanks, Sau

How do epochs work in this tensorflow example
The following is a snippet of code used in training a simple neural network.
```python
for epoch in range(hm_epochs):
    epoch_loss = 0
    for _ in range(int(mnist.train.num_examples / batch_size)):
        epoch_x, epoch_y = mnist.train.next_batch(batch_size)
        _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
        epoch_loss += c
    print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)
```
This isn't the full code, but from what I see, the inner loop trains using all the training data (split into batches) and optimizes using an optimization algorithm. With the inner loop running once, the accuracy of the algorithm is 90%, but when it runs 10 times (hm_epochs=10) it's correct 95% of the time. That doesn't make any sense to me: how does training with the same data multiple times (which is what happens when the outer loop runs) make it any more accurate?
I am new to tensorflow.
This is not my code; it comes from this series: https://pythonprogramming.net/tensorflow-neural-network-session-machine-learning-tutorial/
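To illustrate why repeated passes help: a single epoch takes only limited-size gradient steps, so the weights have not yet reached the minimum for that same data; further epochs keep descending the same loss surface. A tiny full-batch sketch on synthetic data (assumed, not the MNIST code above):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.randn(100)

w = np.zeros(3)
lr = 0.1
losses = []
for epoch in range(10):
    # One full pass over the SAME data per "epoch".
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad
    losses.append(float(np.mean((X @ w - y) ** 2)))

# The loss keeps dropping across epochs even though the data never changes.
print(losses[0] > losses[-1])
```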

Why Does My Neural Network Always Predict The Same Output? (synaptic.js)
Hello, this question is related to Synaptic.js.
I'm trying to use a neural network to read handwritten digits from images. The image size is 120x90 and it contains only black and white pixels. I'm using an HTML5 canvas to transform the images to a matrix before feeding it into a NN with 10800 inputs, one hidden layer of 30 neurons, and 3 outputs.
```javascript
var myNetwork = new synaptic.Architect.Perceptron(10800, 30, 3);
var learningRate = .3;
```
The code to transform Image to Matrix :
```javascript
var matrix = [];
for (i = 0; i < img.height; i++) {
    for (j = 0; j < img.width; j++) {
        var rgba = ctx.getImageData(j, i, 1, 1);
        var avg = (rgba["data"][0] + rgba["data"][1] + rgba["data"][2]) / 3;
        var indice;
        if (avg < 150)
            indice = 1;
        else
            indice = 0;
        matrix.push(indice);
    }
}
```
The code above gives the result I expected: basically, it transforms 'white' pixels into 0 and 'black' pixels into 1. Attached is an example matrix for an image containing '1' (I resized the actual matrix because it's too big): matrix illustration
When I feed the NN two images, each containing a '1' or '2' handwritten digit, it categorizes them just fine. The same happens when I feed it five more images containing '1' or '2' handwritten digits; the NN is still able to categorize them well.
The code I use to train NN :
```javascript
myNetwork.propagate(learningRate, [1, 0]); // Train NN to learn '1'
myNetwork.propagate(learningRate, [0, 1]); // Train NN to learn '2'
```
However, when I start to feed '3' into the network, it starts to categorize every image input (including '1' and '2') into the third category. Basically, when the output size is more than 2, the NN always predicts the same output, which is the last category I trained it to learn.
```javascript
myNetwork.propagate(learningRate, [0, 0, 1]); // Train NN to learn '3'
```
I have more than 75 images per number and I have tried feeding them all, but the NN always categorizes them as '3'. What am I doing wrong here? Any help is appreciated. Thanks!

Split data set and split ratio to get constant accuracy in Python
```python
def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]
```
Using this function, I get a different accuracy every time I run the classifier; this is caused by the random function. So how can I split my data to get a constant accuracy? This line is the main problem:
```python
index = random.randrange(len(copy))
```
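One option is to give the split its own seeded random generator, so the same permutation (and hence the same accuracy) is produced on every run. A sketch (the seed value 4 below is an arbitrary assumption):

```python
import random

def splitDataset(dataset, splitRatio, seed=4):
    rng = random.Random(seed)  # private, seeded RNG: same split every run
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = rng.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

a = splitDataset(list(range(10)), 0.7)
b = splitDataset(list(range(10)), 0.7)
print(a == b)
```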

Calculate Local Binary Pattern with radius
I'm confused about the local binary pattern that uses a radius. I have read the journal article by Ojala et al., where they threshold each neighborhood against the center pixel:

LBP_{P,R} = sum_{p=0}^{P-1} s(g_p - g_c) * 2^p,  where s(x) = 1 if x >= 0, else 0

with:
R = radius
P = number of neighborhood pixels
g_p = neighborhood pixel
g_c = center pixel

For example, with R=2 and P=16 (p from 0 to 15), if s(g_15 - g_c) = 1 and all the others are 0, the LBP value is LBP = 1 * 2^15 = 32768. So how does this big value turn into a pixel value?
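On the closing question: the sum of s(g_p - g_c) * 2^p is just an integer code. For P=8 it fits the 0-255 range of an 8-bit image, but for P=16 it runs up to 65535, so the codes are normally not written back as pixels at all; instead one builds a histogram of codes (often after mapping to uniform patterns) over each region and uses that as the feature. A small numpy sketch for P=8, R=1, using the square 8-neighborhood rather than the interpolated circle for simplicity:

```python
import numpy as np

def lbp_8_1(img):
    """Basic LBP with P=8, R=1 on the interior pixels of a 2-D array."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gc = img[i, j]
            code = 0
            for p, (di, dj) in enumerate(offsets):
                # s(g_p - g_c) contributes bit p of the code
                code += (1 if img[i + di, j + dj] >= gc else 0) << p
            out[i - 1, j - 1] = code
    return out

img = np.arange(25).reshape(5, 5).astype(float)
codes = lbp_8_1(img)
print(codes)  # every interior pixel sees the same gradient pattern here
```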

sklearn adapt pipeline for multiple features
This code works when x is a vector (i.e., a single feature). How do I adapt it when x is an array (two or more features)? I want tfidf to vectorize each column and use both features in the classification model.
The code runs as written but seems to ignore the extra features.
```python
def grid_search(train_x, train_y):
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import MultinomialNB

    parms = {
        'tfidf__max_df': (0.25, 0.5, 0.75),
        'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
        'clf__estimator__alpha': (1e-2, 1e-3)
    }
    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(stop_words=stop_words)),
        ('clf', OneVsRestClassifier(MultinomialNB(fit_prior=True,
                                                  class_prior=None)))
    ])
    gs1 = GridSearchCV(pipeline, parms, cv=2, n_jobs=1, verbose=0)
    gs1.fit(train_x, train_y)
    return gs1.best_estimator_

classifier = grid_search(train_x, y_train)
```
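One way to vectorize each text column separately is a ColumnTransformer with one TfidfVectorizer per column; passing a scalar column name (not a list) hands each vectorizer the 1-D series of strings it expects. A sketch with a hypothetical two-column frame (the column names and data below are assumptions):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical two-text-column frame standing in for train_x.
df = pd.DataFrame({
    'title': ['cheap flights now', 'best pizza dough',
              'flight deals today', 'pizza oven tips'],
    'body':  ['book airline tickets', 'knead the dough well',
              'airline sale fares', 'bake at high heat'],
})
y = ['travel', 'food', 'travel', 'food']

# One vectorizer per column; a scalar column name gives each
# TfidfVectorizer the 1-D series of strings it expects.
features = ColumnTransformer([
    ('tfidf_title', TfidfVectorizer(), 'title'),
    ('tfidf_body', TfidfVectorizer(), 'body'),
])

pipeline = Pipeline([('features', features), ('clf', MultinomialNB())])
pipeline.fit(df, y)
print(list(pipeline.predict(df)))
```

In a grid search over this pipeline, the parameter names become e.g. features__tfidf_title__max_df instead of tfidf__max_df.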

What is wrong in my code that the error keeps increasing with every iteration of gradient descent?
The code below reads a CSV (the Andrew Ng ML course ex1 multivariate linear regression exercise data file) and then attempts to fit a linear model to the dataset using the learning rate alpha = 0.01. Gradient descent is to make decrements to the parameters (the theta vector) 400 times (the alpha and num_of_iterations values were given in the problem statement). I tried a vectorized implementation to obtain the optimum values of the parameters, but the descent is not converging; the error keeps increasing.
# Imports
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```
# Model Preparation
## Gradient descent
```python
def gradient_descent(m, theta, alpha, num_of_iterations, X, Y):
    # print(m, theta, alpha, num_of_iterations)
    for i in range(num_of_iterations):
        htheta_vector = np.dot(X, theta)
        # print(X.shape, theta.shape, htheta_vector.shape)
        error_vector = htheta_vector - Y
        # each element in gradient_vector corresponds to one theta
        gradient_vector = (1/m) * (np.dot(X.T, error_vector))
        theta = theta - alpha * gradient_vector
    return theta
```
# Main
```python
def main():
    df = pd.read_csv('data2.csv', header=None)  # loading data
    data = df.values  # converting dataframe to numpy array
    X = data[:, 0:2]
    # print(X.shape)
    Y = data[:, 1]
    m = (X.shape)[0]  # number of training examples
    Y = Y.reshape(m, 1)
    ones = np.ones(shape=(m, 1))
    X_with_bias = np.concatenate([ones, X], axis=1)
    theta = np.zeros(shape=(3, 1))  # two features, so three parameters
    alpha = 0.001
    num_of_iterations = 400
    # calling gradient descent
    theta = gradient_descent(m, theta, alpha, num_of_iterations, X_with_bias, Y)
    # print('Parameters learned: ' + str(theta))

if __name__ == '__main__':
    main()
```
The error:
/home/krishthorcode/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:8: RuntimeWarning: invalid value encountered in subtract
Error values for different iterations:
Iteration 1 [[399900.] [329900.] [369000.] [232000.] [539900.] [299900.] [314900.] [198999.] [212000.] [242500.] [239999.] [347000.] [329999.] [699900.] [259900.] [449900.] [299900.] [199900.] [499998.] [599000.] [252900.] [255000.] [242900.] [259900.] [573900.] [249900.] [464500.] [469000.] [475000.] [299900.] [349900.] [169900.] [314900.] [579900.] [285900.] [249900.] [229900.] [345000.] [549000.] [287000.] [368500.] [329900.] [314000.] [299000.] [179900.] [299900.] [239500.]]
Iteration 2 [[1.60749981e+09] [1.22240841e+09] [1.83373661e+09] [1.08189071e+09] [2.29209231e+09] [1.51666004e+09] [1.17198560e+09] [1.09033113e+09] [1.05440030e+09] [1.14148964e+09] [1.48233053e+09] [1.52807496e+09] [1.44402895e+09] [3.42143452e+09] [9.68760976e+08] [1.75723592e+09] [1.00845873e+09] [9.44366284e+08] [1.99332644e+09] [2.31572369e+09] [1.35010833e+09] [1.44257442e+09] [1.22555224e+09] [1.49912323e+09] [2.97220331e+09] [8.40383843e+08] [1.11375611e+09] [1.92992696e+09] [1.68078878e+09] [2.01492327e+09] [1.40503327e+09] [7.64040689e+08] [1.55867654e+09] [2.39674784e+09] [1.38370165e+09] [1.09792232e+09] [9.46628911e+08] [1.62895368e+09] [3.22059730e+09] [1.65193796e+09] [1.27127807e+09] [1.70997383e+09] [1.96141565e+09] [9.16755655e+08] [6.50928858e+08] [1.41502023e+09] [9.19107783e+08]]
Iteration 3 [[7.42664624e+12] [5.64764378e+12] [8.47145714e+12] [4.99816153e+12] [1.05893224e+13] [7.00660901e+12] [5.41467917e+12] [5.03699402e+12] [4.87109500e+12] [5.27348843e+12] [6.84776945e+12] [7.05955046e+12] [6.67127611e+12] [1.58063228e+13] [4.47576119e+12] [8.11848565e+12] [4.65930400e+12] [4.36280860e+12] [9.20918360e+12] [1.06987452e+13] [6.23711474e+12] [6.66421140e+12] [5.66176276e+12] [6.92542434e+12] [1.37308096e+13] [3.88276038e+12] [5.14641706e+12] [8.91620784e+12] [7.76550392e+12] [9.30801176e+12] [6.49125293e+12] [3.52977344e+12] [7.20074619e+12] [1.10728954e+13] [6.39242960e+12] [5.07229174e+12] [4.37339793e+12] [7.52548475e+12] [1.48779889e+13] [7.63137769e+12] [5.87354379e+12] [7.89963490e+12] [9.06093321e+12] [4.23573710e+12] [3.00737309e+12] [6.53715005e+12] [4.24632634e+12]]
Iteration 4 [[3.43099835e+16] [2.60912608e+16] [3.91368523e+16] [2.30907512e+16] [4.89210695e+16] [3.23694753e+16] [2.50149995e+16] [2.32701516e+16] [2.25037231e+16] [2.43627199e+16] [3.16356608e+16] [3.26140566e+16] [3.08202877e+16] [7.30228235e+16] [2.06773403e+16] [3.75061770e+16] [2.15252802e+16] [2.01555166e+16] [4.25450367e+16] [4.94265862e+16] [2.88145280e+16] [3.07876502e+16] [2.61564888e+16] [3.19944145e+16] [6.34342666e+16] [1.79377661e+16] [2.37756683e+16] [4.11915330e+16] [3.58754545e+16] [4.30016088e+16] [2.99886077e+16] [1.63070200e+16] [3.32663597e+16] [5.11551035e+16] [2.95320591e+16] [2.34332215e+16] [2.02044376e+16] [3.47666027e+16] [6.87340617e+16] [3.52558124e+16] [2.71348846e+16] [3.64951201e+16] [4.18601431e+16] [1.95684650e+16] [1.38936092e+16] [3.02006457e+16] [1.96173860e+16]]
Iteration 5 [[1.58506940e+20] [1.20537683e+20] [1.80806345e+20] [1.06675782e+20] [2.26007951e+20] [1.49542086e+20] [1.15565519e+20] [1.07504585e+20] [1.03963801e+20] [1.12552086e+20] [1.46151974e+20] [1.50672014e+20] [1.42385073e+20] [3.37354413e+20] [9.55261885e+19] [1.73272871e+20] [9.94435428e+19] [9.31154420e+19] [1.96551642e+20] [2.28343362e+20] [1.33118767e+20] [1.42234293e+20] [1.20839027e+20] [1.47809362e+20] [2.93056729e+20] [8.28697695e+19] [1.09839996e+20] [1.90298660e+20] [1.65739180e+20] [1.98660937e+20] [1.38542837e+20] [7.53359691e+19] [1.53685556e+20] [2.36328850e+20] [1.36433652e+20] [1.08257943e+20] [9.33414495e+19] [1.60616452e+20] [3.17540981e+20] [1.62876527e+20] [1.25359067e+20] [1.68601941e+20] [1.93387537e+20] [9.04033523e+19] [6.41863754e+19] [1.39522421e+20] [9.06293597e+19]]
Iteration 83 [[1.09904300e+306] [8.35774743e+305] [1.25366087e+306] [7.39660179e+305] [1.56707622e+306] [1.03688320e+306] [8.01299137e+305] [7.45406868e+305] [7.20856058e+305] [7.80404831e+305] [1.01337710e+306] [1.04471781e+306] [9.87258464e+305] [2.33912159e+306] [6.62352000e+305] [1.20142586e+306] [6.89513844e+305] [6.45636555e+305] [1.36283437e+306] [1.58326931e+306] [9.23008472e+305] [9.86212994e+305] [8.37864174e+305] [1.02486897e+306] [2.03197378e+306] [5.74595914e+305] [7.61599955e+305] [1.31947793e+306] [1.14918934e+306] [1.37745963e+306] [9.60617469e+305] [5.22358639e+305] [1.06561287e+306] [1.63863846e+306] [9.45992963e+305] [7.50630445e+305] [6.47203628e+305] [1.11366977e+306] [2.20174077e+306] [1.12934050e+306] [8.69204879e+305] [1.16903893e+306] [1.34089535e+306] [6.26831680e+305] [4.45050460e+305] [9.67409627e+305] [6.28398753e+305]]
Iteration 84 [[inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf] [inf]]
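Two things stand out as likely culprits: the features (house size vs. number of rooms) live on wildly different scales, which makes the given alpha far too large for gradient descent and produces exactly this blow-up to inf; and Y = data[:, 1] overlaps X = data[:, 0:2], whereas the target is presumably column 2. A sketch on made-up data showing that mean normalization lets the same kind of alpha converge:

```python
import numpy as np

rng = np.random.RandomState(0)
# Made-up stand-in for the ex1 data: two features on very different scales.
size = 500.0 + 2000.0 * rng.rand(47)            # ~1e3, like square footage
rooms = rng.randint(1, 6, size=47).astype(float)
X = np.column_stack([size, rooms])
Y = (100.0 * size + 5000.0 * rooms).reshape(-1, 1)

def run_gd(X, Y, alpha, iters):
    m = X.shape[0]
    Xb = np.concatenate([np.ones((m, 1)), X], axis=1)  # bias column
    theta = np.zeros((Xb.shape[1], 1))
    for _ in range(iters):
        error = Xb @ theta - Y
        theta -= alpha * (Xb.T @ error) / m            # same update rule
    return float(np.mean((Xb @ theta - Y) ** 2))       # final cost

# Mean normalization keeps all features near unit scale, so an ordinary
# alpha descends steadily instead of blowing up to inf.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)
cost_1 = run_gd(Xn, Y, alpha=0.1, iters=1)
cost_400 = run_gd(Xn, Y, alpha=0.1, iters=400)
print(cost_400 < cost_1)
```

Without the normalization, the curvature along the size axis is about a million times larger than along the rooms axis, so any alpha small enough for one direction is unstable for the other.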
Implementing Adam in Pytorch
I'm trying to implement Adam myself for learning purposes.
Here is my Adam implementation:
```python
class ADAMOptimizer(Optimizer):
    """Implements the ADAM algorithm, as a preceding step."""
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.99), eps=1e-8, weight_decay=0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super(ADAMOptimizer, self).__init__(params, defaults)

    def step(self):
        """Performs a single optimization step."""
        loss = None
        for group in self.param_groups:
            # print(group.keys())
            # print(self.param_groups[0]['params'][0].size()), first param (W) size: torch.Size([10, 784])
            # print(self.param_groups[0]['params'][1].size()), second param (b) size: torch.Size([10])
            for p in group['params']:
                grad = p.grad.data
                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Momentum (exponential MA of gradients)
                    state['exp_avg'] = torch.zeros_like(p.data)
                    # print(p.data.size())
                    # RMSProp component (exponential MA of squared gradients); denominator.
                    state['exp_avg_sq'] = torch.zeros_like(p.data)

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                b1, b2 = group['betas']
                state['step'] += 1

                # L2 penalty. Gotta add to gradient as well.
                if group['weight_decay'] != 0:
                    grad = grad.add(group['weight_decay'], p.data)

                # Momentum
                exp_avg = torch.mul(exp_avg, b1) + (1 - b1) * grad
                # RMS
                exp_avg_sq = torch.mul(exp_avg_sq, b2) + (1 - b2) * (grad * grad)

                denom = exp_avg_sq.sqrt() + group['eps']

                bias_correction1 = 1 / (1 - b1 ** state['step'])
                bias_correction2 = 1 / (1 - b2 ** state['step'])

                adapted_learning_rate = group['lr'] * bias_correction1 / math.sqrt(bias_correction2)

                p.data = p.data - adapted_learning_rate * exp_avg / denom

                if state['step'] % 10000 == 0:
                    print("group:", group)
                    print("p:", p)
                    print("p.data:", p.data)  # W = p.data

        return loss
```
I think I implemented everything correctly; however, the loss graph of my implementation is very spiky compared to that of torch.optim.Adam.
My ADAM implementation's loss graph (below)
torch.optim.Adam's loss graph (below)
If someone could tell me what I am doing wrong, I'd be very grateful.
For the full code, including data and graphs (super easy to run): https://github.com/byorxyz/AMS_pytorch/blob/master/AdamFails_1dConvex.ipynb
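One thing that stands out, hedged as a guess: exp_avg and exp_avg_sq are rebound to newly created tensors, so state['exp_avg'] itself never accumulates across steps (torch's own implementation mutates the state buffers in place, e.g. exp_avg.mul_(b1).add_(...)). A numpy reference sketch where the moment buffers are explicitly carried forward between steps:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # The caller must keep the returned m and v: the moment estimates
    # accumulate across steps rather than restarting from zeros.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimise f(w) = ||w||^2 from a fixed start.
w = np.array([5.0, -3.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(np.linalg.norm(w) < 0.5)
```

If the buffers were reset each step, the denominator would collapse to roughly |grad| * sqrt(1 - b2), making the effective step size erratic, which matches the spiky loss curve.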

How to calculate partial derivatives of error function with respect to values in matrix
I am building a project that is a basic neural network that takes in a 2x2 image, with the goal of classifying the image as either a forward slash (1-class) or back slash (0-class) shape. The input data is a flat numpy array; 1 represents a black pixel and 0 represents a white pixel.
0-class: [1, 0, 0, 1]
1-class: [0, 1, 1, 0]
If I start my filter as a random 4x1 matrix, how can I use gradient descent to arrive at either perfect matrix [1, -1, -1, 1] or [-1, 1, 1, -1] to classify the data points?
Side note: even when multiplied with the "perfect" answer matrix and then summed, the label outputs would be 2 and -2. Would my data labels need to be 2 and -2? What if I want my classes labeled as 0 and 1?
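On the side note: the targets need not be 2 and -2; adding a bias term (or feeding the dot product through a sigmoid) lets the labels stay 0 and 1. A sketch fitting the 4-weight "filter" plus a bias with plain gradient descent on these two points:

```python
import numpy as np

# The two training points with 0/1 labels; a bias term absorbs the
# offset, so no -2/+2 targets are needed.
X = np.array([[1., 0., 0., 1.],   # back slash    -> class 0
              [0., 1., 1., 0.]])  # forward slash -> class 1
y = np.array([0., 1.])

w = np.zeros(4)
b = 0.0
lr = 0.5
for _ in range(200):
    pred = X @ w + b                    # linear "filter" response
    grad_w = X.T @ (pred - y) / len(y)  # d(MSE)/dw
    grad_b = np.mean(pred - y)          # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(np.round(X @ w + b))
```

The learned w ends up proportional to the hand-derived filter; the bias simply shifts the -2/+2 responses onto the 0/1 labels.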