MLP performing worse than SGD in supervised classification
The training dataset I'm using is a grayscale image that was flattened so that each pixel represents an individual sample. A second image will be classified pixel by pixel after training the stochastic gradient descent (SGD) and multilayer perceptron (MLP) classifiers on the former one.
The problem I have is that SGD is performing far better than the MLP, even when I keep the default parameters provided by scikit-learn in both cases. Below is the code for both (note that because the training dataset is on the order of millions of samples, I had to use partial_fit() to train the MLP in chunks, while this was not necessary for SGD):
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier

def batcherator(data, target, chunksize):
    for i in range(0, len(data), chunksize):
        yield data[i:i+chunksize], target[i:i+chunksize]

def classify():
    if algorithm == 'sgd':
        classifier = SGDClassifier(verbose=True)
        classifier.fit(training.data, training.target)
    elif algorithm == 'mlp':
        classifier = MLPClassifier(verbose=True)
        gen = batcherator(training.data, training.target, 1000)
        for chunk_data, chunk_target in gen:
            classifier.partial_fit(chunk_data, chunk_target,
                                   classes=np.array([0, 1]))
My question is: which parameters should I adjust in the MLP classifier to make its results comparable to those obtained with SGD?
I've tried increasing the number of neurons in the hidden layer using hidden_layer_sizes, but I didn't see any improvement. There was no improvement either when I changed the hidden layer's activation function from the default relu to logistic via the activation parameter.
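For what it's worth, one difference between the two setups above is that a single pass of partial_fit sees each sample only once, whereas fit iterates many times; making several passes over the chunks, and scaling the features first, is a sketch worth trying. Everything below uses synthetic stand-in data; the chunk size of 1000 mirrors the question, and the hyperparameter choices are assumptions, not a definitive fix:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

def batcherator(data, target, chunksize):
    for i in range(0, len(data), chunksize):
        yield data[i:i+chunksize], target[i:i+chunksize]

# Synthetic stand-in for the flattened grayscale image: two noisy pixel classes.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.3, 0.1, (2000, 1)),
               rng.normal(0.7, 0.1, (2000, 1))])
y = np.array([0] * 2000 + [1] * 2000)

scaler = StandardScaler().fit(X)   # feature scaling often matters a lot for MLPs
clf = MLPClassifier(hidden_layer_sizes=(50,), random_state=0)

for epoch in range(20):            # several passes over the chunks, not just one
    for chunk_X, chunk_y in batcherator(scaler.transform(X), y, 1000):
        clf.partial_fit(chunk_X, chunk_y, classes=np.array([0, 1]))

print(round(clf.score(scaler.transform(X), y), 2))
```

With only one pass, the same model typically scores noticeably worse, which may account for part of the gap against SGD.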
See also questions close to this topic

Image classification features
I'm new to ML and trying to solve an image classification problem. I have a dataset of about 2 million photos from Flickr with 10 labels: music, food, wedding, nature, etc. Unfortunately I don't have the computational power to use all the photos, so I decided to use 30k photos for training and 10k photos for testing. I'm willing to reduce the number of classes (my classes have a broad spectrum), keeping the same number of photos, if it would improve my classification rate. Which algorithm would be best for feature extraction? Should I opt for an SVM or a CNN? Thank you for your answer.

How to plot a box-whisker plot in matplotlib.pyplot
How can I plot 2 columns in a box-whisker plot using matplotlib.pyplot?
When I use the syntax below to plot 2 columns of a dataframe with variable values, I see output with only 2 values on the x axis:
plt.boxplot([df['zipcode'],df['price']])
and if I change the syntax to plt.boxplot(df['zipcode'], df['price']), it results in a ValueError:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
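For reference, a minimal sketch with made-up column values: the second positional argument of plt.boxplot is the notch flag, not another data column, which is why passing a Series there trips over its ambiguous truth value; passing one list of columns draws one box per column:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the dataframe in the question.
df = pd.DataFrame({"zipcode": [98001, 98002, 98003, 98004],
                   "price": [250000, 310000, 280000, 450000]})

# One list argument -> one box per column; labels name the x ticks.
result = plt.boxplot([df["zipcode"], df["price"]], labels=["zipcode", "price"])
print(len(result["boxes"]))  # two box artists, one per column
```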

How can I create ordinal labels in scikit-learn or PySpark? And how can I train a model on data that has ordinal labels?
I'm working on a problem where I'm trying to bin my target variable into classes, e.g. $0-5000, $5001-10000, $10001-15000, etc.
As is apparent, the labels have a certain order: the $0-5000 class has the foremost rank, followed by $5001-10000 and $10001-15000.
a) Can anyone please tell me how I can induce or maintain order in my target variable's labels in scikit-learn or PySpark?
b) Once I have the ordinal labels, how should I do the model training? Are there any specific things I should be aware of during modelling, or can I just proceed as one would in a non-ordinal classification problem?
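For part (a), one sketch in scikit-learn is to pass the bin names to OrdinalEncoder in their intended order, so the encoded integers respect the ranking; the bin labels below mirror the ones in the question:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

bins = ["$0-5000", "$5001-10000", "$10001-15000"]  # rank order, lowest first
enc = OrdinalEncoder(categories=[bins])

y = np.array([["$5001-10000"], ["$0-5000"], ["$10001-15000"]])
print(enc.fit_transform(y).ravel())  # encodes by rank: [1. 0. 2.]
```

With rank-respecting integer labels, one common route for part (b) is ordinal regression; treating the problem as plain multiclass classification also works but discards the ordering information.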

The f1_score calculated from sklearn.metrics.f1_score is incorrect?
I think the f1_score calculated from sklearn.metrics.f1_score is incorrect, as discussed here.
Does anyone agree?

Scikit-Learn: TimeSeriesSplit Yields a Different Number Than n_splits
I'm using TimeSeriesSplit to perform cross-validation for a time-series regression problem. I'm doing this:

tscv = TimeSeriesSplit(n_splits=77)
sum(1 for i in tscv.split(x))  # returns 77
However, when I do this:
rmses = []
iteration = 1
for train_index, test_index in tscv.split(x):
    print("ITERATION:", iteration)
    iteration += 1
    x_train, x_test, y_train, y_test = x[train_index], x[test_index], y[train_index], y[test_index]
    model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=100)
    rmses.append(model.evaluate(x_test, y_test))
my code results in rmses with 117 elements. I'm not sure how this is possible given that I only go through 77 splits. If it's helpful, my rmses contains 84 valid np.float64s and the rest are nan. Any idea where I'm going wrong?
Thanks!
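As a sanity check, here is a minimal sketch (on synthetic data) confirming that TimeSeriesSplit yields exactly n_splits folds per call to split(); if rmses ends up longer than that, the likely culprits are the loop body appending more than once per fold, or the rmses list being reused across multiple runs:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

x = np.arange(20).reshape(-1, 1)   # synthetic stand-in for the time series
tscv = TimeSeriesSplit(n_splits=5)

folds = list(tscv.split(x))
print(len(folds))            # exactly n_splits folds per call
print(tscv.get_n_splits())   # same number, without iterating
```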

Structuring & Testing a Neural Network for Prediction
I am a neural network enthusiast. Still learning, please forgive my ignorance.
Currently, I am developing a project where I am trying to forecast NHL hockey game outcomes.
So far, I have 66 input variables describing various team statistics, one hidden layer, and 4 output variables: predicted team goals (0-9), outright winner (expressed in binary as a 1 or 2), expected goal differential (-9 to +9) and total game goals (0-15). I have approximately 1200 rows of game data that I am using for training.
This is all within MS Excel, which is the platform I've built my NN inside.
My questions are:
- How many neurons should there be in the hidden layer?
- How can I test and optimize the number of neurons for the best predictive ability within Excel?
- How does one decide the best activation function to use? (Currently using a sigmoid function.)
- What would the best way to structure the outputs be? (1 for a win, 0 for a loss? Perhaps 1 for a win, -1 for a loss?)
I am having trouble figuring out how to optimize the NN for best results. Could someone provide assistance on how to test for optimization in Excel (correlation between predicted and actual outputs? regression between them? standard deviation? something entirely different?)?
Any help or feedback would be greatly appreciated!

How does batching work in a seq2seq model in pytorch?
I am trying to implement a seq2seq model in PyTorch and I am having some problems with batching. For example, I have a batch of data whose dimensions are
[batch_size, sequence_lengths, encoding_dimension]
where the sequence lengths are different for each example in the batch.
Now, I managed to do the encoding part by padding each element in the batch to the length of the longest sequence.
This way if I give as input to my net a batch with the same shape as said, I get the following outputs:
output, of shape
[batch_size, sequence_lengths, hidden_layer_dimension]
hidden state, of shape
[batch_size, hidden_layer_dimension]
cell state, of shape
[batch_size, hidden_layer_dimension]
Now, from the output, I take for each sequence the last relevant element, that is, the element along the sequence_lengths dimension corresponding to the last non-padded element of the sequence. Thus the final output I get is of shape [batch_size, hidden_layer_dimension].
But now I have the problem of decoding from this vector. How do I handle decoding of sequences of different lengths in the same batch? I tried to google it and found this, but they don't seem to address the problem. I thought of going element by element for the whole batch, but then I have the problem of passing the initial hidden states, given that the ones from the encoder will be of shape [batch_size, hidden_layer_dimension], while the ones from the decoder will be of shape [1, hidden_layer_dimension].
Am I missing something? Thanks for the help!
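Not an answer to the decoding question itself, but the "last relevant element" step described above can be sketched with plain advanced indexing; numpy is used here for illustration, and the same gather works on PyTorch tensors:

```python
import numpy as np

batch_size, max_len, hidden = 3, 4, 2

# Made-up encoder output: shape [batch_size, sequence_lengths, hidden].
output = np.arange(batch_size * max_len * hidden, dtype=float).reshape(
    batch_size, max_len, hidden)
lengths = np.array([4, 2, 3])  # true (unpadded) length of each sequence

# Pick, for each sequence, the output at its last non-padded timestep.
last_relevant = output[np.arange(batch_size), lengths - 1]
print(last_relevant.shape)  # (batch_size, hidden)
```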

I'm having trouble with my TensorFlow code; its accuracy and cost aren't improving
class neural_net_for_noobs:
    '''
    A neural network class that is created with the number of units in each
    layer [inputLayer <hidden layers> outputLayer]
    '''
    weights = []
    biases = []
    seed = 5
    dev = 0.1
    train_x = []
    val_x = []
    train_y = []
    val_y = []
    train = []
    label = []
    rng = np.random.RandomState(seed)

    def init_encoder(self):
        '''Function to initialise the encoder based on the class labels
        available in the whole data set'''
        self.onehot_encoder = OneHotEncoder(sparse=False)
        self.onehot_encoded = self.onehot_encoder.fit(self.label)

    def __init__(self, num_units):
        '''The constructor takes the number of units in each layer as
        described above. For example [10 5 2] would mean a neural network
        with 10 input features, a hidden layer with 5 units and 2 units in
        the output layer.'''
        self.num_units = num_units
        self.num_input = num_units[0]
        self.num_output = num_units[-1]
        self.num_hidden = num_units[1:-1]
        self.train_data = tf.placeholder(tf.float32, [None, self.num_input])
        self.train_label = tf.placeholder(tf.float32, [None, self.num_output])
        self.initialize_weights_biases(num_units)
        self.layers()
        self.cost_function()
        self.optimizer_function()
        self.init_function()

    def dense_to_one_hot(self, labels_dense, num_classes=10):
        """Convert class labels from scalars to one-hot vectors."""
        temp = self.onehot_encoded.transform(labels_dense.reshape(-1, 1))
        return temp

    def batch_creator(self, batch_size, dataset_length, dataset_name):
        """Create a batch with random samples and return it in the appropriate format."""
        batch_mask = self.rng.choice(dataset_length, batch_size)
        batch_x = eval('self.' + dataset_name + '_x')[[batch_mask]].reshape(-1, self.num_input)
        if dataset_name == 'train':
            batch_y = self.train_y[batch_mask]
            batch_y = self.dense_to_one_hot(batch_y, num_classes=max(self.train_y))
        return batch_x, batch_y

    def initialize_weights_biases(self, num_units):
        '''Initializes weights and biases to random values.'''
        for i in range(len(self.num_units) - 1):
            self.weights.append(tf.Variable(tf.random_normal(
                [self.num_units[i], self.num_units[i + 1]],
                seed=self.seed, stddev=self.dev)))
        for i in range(len(self.num_units) - 1):
            self.biases.append(tf.Variable(tf.random_normal(
                [self.num_units[i + 1]], seed=self.seed, stddev=self.dev)))

    def layers(self):
        '''Create the layers and hold them in a list.'''
        self.layer = []
        print(len(self.weights), len(self.biases))
        for i in range(len(self.num_units) - 1):
            if i == 0:
                self.layer.append(tf.add(tf.matmul(self.train_data, self.weights[i]), self.biases[i]))
                self.layer[-1] = tf.nn.relu(self.layer[-1])
            elif i == len(self.num_units) - 2:
                self.layer.append(tf.add(tf.matmul(self.layer[i - 1], self.weights[i]), self.biases[i]))
                self.layer[-1] = tf.nn.softmax(self.layer[-1])
                break
            else:
                self.layer.append(tf.add(tf.matmul(self.layer[i - 1], self.weights[i]), self.biases[i]))
                self.layer[-1] = tf.nn.relu(self.layer[-1])

    def cost_function(self):
        '''Defining the cost function.'''
        self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            labels=self.train_label, logits=self.layer[-1]))

    def optimizer_function(self, learning_rate=0.01):
        '''Defining the optimizer.'''
        self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        self.optimizer = self.optimizer.minimize(self.cost)

    def init_function(self):
        '''Defining the init op to be run inside the session.'''
        self.init = tf.global_variables_initializer()

    def load_train(self, train_x, train_y):
        '''Loads training data onto the variables.'''
        self.train_x = train_x
        self.train_y = train_y

    def load_val(self, val_x, val_y):
        '''Loads validation data onto the variables.'''
        self.val_x = val_x
        self.val_y = val_y

    def load_data(self, train, label):
        '''Loads the complete data, i.e. train + test. This part is kind of
        inefficient and will be taken out.'''
        self.train = train
        self.label = label.reshape(-1, 1)
        self.init_encoder()

    def start_session(self, epochs=10, batch_size=512):
        '''Training the model happens here.'''
        with tf.Session() as sess:
            sess.run(self.init)
            pred_temp = tf.equal(tf.argmax(self.layer[-1], 1), tf.argmax(self.train_label, 1))
            total_batch = int(self.train_x.shape[0] / batch_size)
            for epoch in range(epochs):
                avg_cost = 0
                for i in range(total_batch):
                    batch_x, batch_y = self.batch_creator(batch_size, self.train_x.shape[0], 'train')
                    _, c = sess.run([self.optimizer, self.cost],
                                    feed_dict={self.train_data: batch_x, self.train_label: batch_y})
                    avg_cost += float(c) / float(total_batch)
                print("Epoch", (epoch + 1), "cost =", "{:.5f}".format(avg_cost))
                # predictions on the validation set
                accuracy = tf.reduce_mean(tf.cast(pred_temp, "float"))
                print("Validation accuracy:",
                      accuracy.eval({self.train_data: self.val_x.reshape(-1, self.num_input),
                                     self.train_label: self.dense_to_one_hot(self.val_y)}))
                print("Weights:")
                print(sess.run(self.weights[-1]))
                print("Bias:")
                print(sess.run(self.biases[-1]))
            print("\nTraining Complete!")
            accuracy = tf.reduce_mean(tf.cast(pred_temp, "float"))
            print("Validation accuracy:",
                  accuracy.eval({self.train_data: self.val_x.reshape(-1, self.num_input),
                                 self.train_label: self.dense_to_one_hot(self.val_y)}))
            print("Train accuracy:",
                  accuracy.eval({self.train_data: self.train_x.reshape(-1, self.num_input),
                                 self.train_label: self.dense_to_one_hot(self.train_y)}))

    def predict_for(self, features):
        '''Predicting for the given set of features happens here.'''
        pred = None
        with tf.Session() as sess:
            sess.run(self.init)
            predict = tf.argmax(self.layer[-1], 1)
            pred = predict.eval({self.train_data: features.reshape(-1, self.num_input)})
        return pred
That's the class I created so that I can easily modify the hyperparameters related to the number of layers and units. However, my model does not train: the results stay pretty much the same after every epoch for the few datasets I tested my model on (namely MNIST and a stripped-down version of Titanic).
It would be incredibly helpful if one of you could help me find the mistake I made.
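One detail worth checking in code like the above: tf.nn.softmax_cross_entropy_with_logits expects raw logits, so applying tf.nn.softmax to the last layer before passing it in effectively applies softmax twice and distorts the loss. A small numpy sketch of the effect, using a made-up logit vector:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy_from_logits(logits, label_idx):
    # what a softmax cross-entropy-with-logits op computes internally
    return -np.log(softmax(logits)[label_idx])

logits = np.array([4.0, 1.0, -2.0])
correct = cross_entropy_from_logits(logits, 0)
double = cross_entropy_from_logits(softmax(logits), 0)  # softmax applied twice

print(round(correct, 4), round(double, 4))  # the two losses differ noticeably
```

Passing already-softmaxed values compresses everything into [0, 1], so the "logits" barely differ and gradients become tiny, which matches the symptom of cost not improving.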

How to find the score based on Neural networks classifier in MATLAB?
Can anyone tell me how to get the score of a test sample using a neural network classifier?
Thanks.

python SVM: execution error
I am using Python 3.6 on Windows, and I am learning SVM prediction in Python. I have the code below; however, after running and checking it thoroughly I still get the following error:

File "C:\Users\Lawrence\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 614, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape ()

The original Python code is below:

import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR

input_file = r"C:\Users\Lawrence\Desktop\traffic_data.txt"

# Reading the data
X = []
count = 0
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = line[:-1].split(',')
        X.append(data)
X = np.array(X)

# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])
X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)

# Build SVR
params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2}
regressor = SVR(**params)
regressor.fit(X, y)

# Cross validation
import sklearn.metrics as sm
y_pred = regressor.predict(X)
print("Mean absolute error =", round(sm.mean_absolute_error(y, y_pred), 2))

# Testing encoding on single data instance
input_data = ['Tuesday', '13:35', 'San Francisco', 'yes']
input_data_encoded = [-1] * len(input_data)
count = 0
for i, item in enumerate(input_data):
    if item.isdigit():
        input_data_encoded[i] = int(input_data[i])
    else:
        input_data_encoded[i] = int(label_encoder[count].transform(input_data[i]))
        count = count + 1
input_data_encoded = np.array(input_data_encoded)

# Predict and print output for a particular datapoint
print("Predicted traffic:", int(regressor.predict(input_data_encoded)[0]))
The input file data (traffic_data.txt) as below:
Tuesday,00:00,San Francisco,no,3 Tuesday,00:05,San Francisco,no,8 Tuesday,00:10,San Francisco,no,10 Tuesday,00:15,San Francisco,no,6 Tuesday,00:20,San Francisco,no,1 Tuesday,00:25,San Francisco,no,4 Tuesday,00:30,San Francisco,no,9 Tuesday,00:35,San Francisco,no,4 Tuesday,00:40,San Francisco,no,6 Tuesday,00:45,San Francisco,no,13 Tuesday,00:50,San Francisco,no,5 Tuesday,00:55,San Francisco,no,5 Tuesday,01:00,San Francisco,no,4 Tuesday,01:05,San Francisco,no,7 Tuesday,01:10,San Francisco,no,5 Tuesday,01:15,San Francisco,no,4 Tuesday,01:20,San Francisco,no,5 Tuesday,01:25,San Francisco,no,1 Tuesday,01:30,San Francisco,no,8 Tuesday,01:35,San Francisco,no,2 Tuesday,01:40,San Francisco,no,3 Tuesday,01:45,San Francisco,no,0 Tuesday,01:50,San Francisco,no,2 Tuesday,01:55,San Francisco,no,1 Tuesday,02:00,San Francisco,no,1 Tuesday,02:05,San Francisco,no,0 Tuesday,02:10,San Francisco,no,2 Tuesday,02:15,San Francisco,no,1 Tuesday,02:20,San Francisco,no,2 Tuesday,02:25,San Francisco,no,4 Tuesday,02:30,San Francisco,no,0 Tuesday,02:35,San Francisco,no,0 Tuesday,02:40,San Francisco,no,0 Tuesday,02:45,San Francisco,no,3 Tuesday,02:50,San Francisco,no,1 Tuesday,02:55,San Francisco,no,0 Tuesday,03:00,San Francisco,no,3 Tuesday,03:05,San Francisco,no,0 Tuesday,03:10,San Francisco,no,3 Tuesday,03:15,San Francisco,no,0 Tuesday,03:20,San Francisco,no,0 Tuesday,03:25,San Francisco,no,2 Tuesday,03:30,San Francisco,no,1 Tuesday,03:35,San Francisco,no,1 Tuesday,03:40,San Francisco,no,1 Tuesday,03:45,San Francisco,no,1 Tuesday,03:50,San Francisco,no,0 Tuesday,03:55,San Francisco,no,3 Tuesday,04:00,San Francisco,no,1 Tuesday,04:05,San Francisco,no,2 Tuesday,04:10,San Francisco,no,1 Tuesday,04:15,San Francisco,no,1 Tuesday,04:20,San Francisco,no,2 Tuesday,04:25,San Francisco,no,1 Tuesday,04:30,San Francisco,no,2 Tuesday,04:35,San Francisco,no,2 Tuesday,04:40,San Francisco,no,5 Tuesday,04:45,San Francisco,no,2 Tuesday,04:50,San Francisco,no,5 Tuesday,04:55,San Francisco,no,4 Tuesday,05:00,San 
Francisco,no,6 Tuesday,05:05,San Francisco,no,5 Tuesday,05:10,San Francisco,no,5 Tuesday,05:15,San Francisco,no,7 Tuesday,05:20,San Francisco,no,4 Tuesday,05:25,San Francisco,no,5 Tuesday,05:30,San Francisco,no,12 Tuesday,05:35,San Francisco,no,12 Tuesday,05:40,San Francisco,no,11 Tuesday,05:45,San Francisco,no,12 Tuesday,05:50,San Francisco,no,11 Tuesday,05:55,San Francisco,no,13 Tuesday,06:00,San Francisco,no,19 Tuesday,06:05,San Francisco,no,16 Tuesday,06:10,San Francisco,no,19 Tuesday,06:15,San Francisco,no,15 Tuesday,06:20,San Francisco,no,8 Tuesday,06:25,San Francisco,no,14 Tuesday,06:30,San Francisco,no,30 Tuesday,06:35,San Francisco,no,35 Tuesday,06:40,San Francisco,no,20 Tuesday,06:45,San Francisco,no,27 Tuesday,06:50,San Francisco,no,33 Tuesday,06:55,San Francisco,no,24 Tuesday,07:00,San Francisco,no,39 Tuesday,07:05,San Francisco,no,42 Tuesday,07:10,San Francisco,no,36 Tuesday,07:15,San Francisco,no,50 Tuesday,07:20,San Francisco,no,42 Tuesday,07:25,San Francisco,no,38 Tuesday,07:30,San Francisco,no,38 Tuesday,07:35,San Francisco,no,40 Tuesday,07:40,San Francisco,no,49 Tuesday,07:45,San Francisco,no,39 Tuesday,07:50,San Francisco,no,43 Tuesday,07:55,San Francisco,no,44 Tuesday,08:00,San Francisco,no,40 Tuesday,08:05,San Francisco,no,22 Tuesday,08:10,San Francisco,no,25 Tuesday,08:15,San Francisco,no,42 Tuesday,08:20,San Francisco,no,37 Tuesday,08:25,San Francisco,no,36 Tuesday,08:30,San Francisco,no,34 Tuesday,08:35,San Francisco,no,41 Tuesday,08:40,San Francisco,no,37 Tuesday,08:45,San Francisco,no,36 Tuesday,08:50,San Francisco,no,40 Tuesday,08:55,San Francisco,no,37 Tuesday,09:00,San Francisco,no,41 Tuesday,09:05,San Francisco,no,38 Tuesday,09:10,San Francisco,no,36 Tuesday,09:15,San Francisco,no,44 Tuesday,09:20,San Francisco,no,33 Tuesday,09:25,San Francisco,no,30 Tuesday,09:30,San Francisco,no,41 Tuesday,09:35,San Francisco,no,36 Tuesday,09:40,San Francisco,no,35 Tuesday,09:45,San Francisco,no,36 Tuesday,09:50,San Francisco,no,35 Tuesday,09:55,San 
Francisco,no,42 Tuesday,10:00,San Francisco,no,31 Tuesday,10:05,San Francisco,no,25 Tuesday,10:10,San Francisco,no,28 Tuesday,10:15,San Francisco,no,27 Tuesday,10:20,San Francisco,no,23 Tuesday,10:25,San Francisco,no,25
Hopefully somebody can solve this problem.
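One hedged observation: scikit-learn's LabelEncoder.transform expects an array-like, and "bad input shape ()" from column_or_1d is what a bare scalar produces on some scikit-learn versions; the code above calls transform(input_data[i]) with a single string. A minimal sketch of the fix, with made-up category values:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder().fit(["Tuesday", "Wednesday"])

# Passing a bare string can raise ValueError("bad input shape ()") on some
# scikit-learn versions; wrapping it in a list works everywhere.
encoded = le.transform(["Tuesday"])
print(int(encoded[0]))  # index of "Tuesday" among the sorted classes
```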

Realtime image classification using Deep Learning
I am working on an interesting project to classify real-time images taken from an infrared camera. I want to use Deep Learning technologies, but I do not have much experience with image classification using DL.
That is why I cannot decide how, and which, DL technologies I should apply. So could you please tell me which DL algorithm I should use?
My real-time image classification should work as follows.
I am trying to develop a solution that will classify time series image data to detect abnormality or anomaly of a running machine in a factory.
In this case, an infrared camera will take 2 or 3 pictures every second of a running machine, continuously. As I am planning to use Deep Learning, I think I should take thousands of pictures of the machine's normal running state and label them 1, meaning the normal state.
I also need to take many pictures of the machine's abnormal running state and label them 0, meaning anomaly. In this way, I will train my system with thousands of images of the machine's normal and abnormal states. After that, I will take real-time infrared images of the running machine, and these real-time images will be classified (1 or 0) based on the training images.
On the net, I found that there are many image classification algorithms and open-source tools. But I did not find exactly what kind of algorithm or tool I should use to classify time-series image data.
If you have any idea, please let me know.
I am a little confused. Should I use only a CNN, or should I use a CNN plus an LSTM, or something else? Please tell me what you think.
Thank you very much for your kind help.

How to find the best weights for my agent player
Firstly, I'd like to apologize for my bad English, but I'll try to explain my problem the best way possible. Thank you for reading!
I'm creating an agent to play a board game. The agent is very simple, using only alpha-beta search with iterative deepening. The main problem is my heuristic evaluation function. Considering available moves, opponent moves and center distance as features, I want to find a good set of weights for them. And that is exactly the problem: how to find these optimal weights. A key point I'm stuck on is: which features should I weight down, and which should get increased weight?
I know this leads to a reinforcement learning problem, or to minimizing a cost function (games lost), but I don't know where to start. I appreciate any help!
Thank you again for your attention!
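One simple place to start, before full reinforcement learning, is local search over the weight vector: evaluate a candidate by playing games and keep perturbations that win more. A sketch under that assumption; win_rate below is a hypothetical placeholder for "play N games against a fixed opponent with these evaluation weights", not a real objective:

```python
import random

def win_rate(weights):
    # Hypothetical stand-in objective: in practice, play N games with these
    # evaluation weights and return the fraction won.
    target = [1.0, -0.5, 0.25]  # pretend these are the ideal weights
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def hill_climb(weights, steps=200, scale=0.1, seed=0):
    rng = random.Random(seed)
    best, best_score = list(weights), win_rate(weights)
    for _ in range(steps):
        cand = [w + rng.gauss(0, scale) for w in best]  # small random tweak
        score = win_rate(cand)
        if score > best_score:  # keep only improvements
            best, best_score = cand, score
    return best

tuned = hill_climb([0.0, 0.0, 0.0])
print([round(w, 1) for w in tuned])
```

Because game outcomes are noisy, each candidate should be evaluated over many games; more principled alternatives include cross-entropy method, CMA-ES, or self-play tuning.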

How to find accuracy for logistic regression and gradient descent with training and validation data sets?
I am trying to implement logistic regression with gradient descent on the notMNIST dataset. Below is my code thus far, which parses the data and plots the accuracy against the epochs. I have done my training in 7 mini-batches of 500 samples each. There are a total of 5000 iterations and therefore 5000/7 epochs. My goal is to find the accuracy after each epoch and plot it against the epoch number, and to do the same with the average loss at each epoch, using the validation points.
This is the loss function I am implementing.
However, for some reason, when I try to calculate accuracy I always get 100%, which doesn't make sense, since I find the weights on the training set and then use them on the validation set, so the algorithm cannot be correct 100% of the time. Also, when I plot the losses I get a linear function, which doesn't make any sense either.
Does anyone have ideas about what I am doing wrong? Any help would be appreciated!
# implement logistic regression
# logistic regression prediction function is y = sigmoid(W^T x + b)
# train the logistic regression model using SGD and mini-batch size B = 500
# on the two-class notMNIST dataset
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

############## Constants ##################################
BATCH_SIZE = 500
NUM_BATCHES = 7
NUM_ITERATIONS = 5000
LEARNING_RATE = [0.005]  # 0.001, 0.0001
PIXEL_SIZE = 784  # 28x28
NUM_TRAINING_POINTS = 3500
NUM_VALID_POINTS = 100

############### Extracting data ############################
with np.load("notMNIST.npz") as data:
    Data, Target = data["images"], data["labels"]
    posClass = 2
    negClass = 9
    dataIndx = (Target == posClass) + (Target == negClass)
    Data = Data[dataIndx] / 255.
    Target = Target[dataIndx].reshape(-1, 1)
    Target[Target == posClass] = 1
    Target[Target == negClass] = 0
    np.random.seed(521)
    randIndx = np.arange(len(Data))
    np.random.shuffle(randIndx)
    Data, Target = Data[randIndx], Target[randIndx]
    trainData, trainTarget = Data[:3500], Target[:3500]
    validData, validTarget = Data[3500:3600], Target[3500:3600]
    testData, testTarget = Data[3600:], Target[3600:]

################ Manipulating Data ##########################
trainX = np.reshape(trainData, (NUM_TRAINING_POINTS, PIXEL_SIZE))
validX = np.reshape(validData, (NUM_VALID_POINTS, PIXEL_SIZE))
batchesX = np.array(np.split(trainX, NUM_BATCHES))
batchesY = np.array(np.split(trainTarget, NUM_BATCHES))

################ Defining variables ########################
loss_Values = [[0 for x in range(NUM_BATCHES)] for y in range(715)]
lr = dict()
epoch_list = []
mean_list = []
accuracy_list = []

x = tf.placeholder(tf.float32, [PIXEL_SIZE, None], name="input_points")  # 784 dimensions (28x28 pixels)
W = tf.Variable(tf.truncated_normal(shape=[PIXEL_SIZE, 1], stddev=0.5), name='weights')
b = tf.Variable(0.0, name='bias')
y = tf.placeholder(tf.float32, [None, 1], name="target_labels")  # target labels
lambda_ = 0.01

############## Calculations ###############################
# weight_squared_sum = tf.matmul(tf.transpose(W), W)  # find the square of the weight vector
# calculating the bias term
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    weight = W.eval()
    weight_squared_sum = np.linalg.norm(weight)

loss_W = lambda_ / 2 * weight_squared_sum  # find the loss
y_hat = tf.add(tf.matmul(tf.transpose(W), x), b)  # based on the sigmoid equation
y_hat = tf.transpose(y_hat)
# sigmoid_cross_entropy_with_logits takes in the actual y and the predicted y
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_hat, labels=y)
total_loss = tf.add(tf.reduce_mean(cross_entropy, 0), loss_W)

############# Training ######################################
epoch = 0
with tf.Session() as sess:
    epoch = 0
    tf.global_variables_initializer().run()
    for learning_rate in LEARNING_RATE:
        # change the learning rate each time
        train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(total_loss)
        for i in range(NUM_BATCHES * NUM_ITERATIONS):
            sess.run(train_step, feed_dict={x: np.transpose(batchesX[i % NUM_BATCHES]),
                                            y: batchesY[i % NUM_BATCHES]})
            print("i: ", i)
            print("LOSS:")
            print(sess.run(total_loss, feed_dict={x: np.transpose(batchesX[i % NUM_BATCHES]),
                                                  y: batchesY[i % NUM_BATCHES]}))
            if i % NUM_BATCHES == 0:  # every time we reach 0, a new epoch has started
                loss_Values[epoch][i % NUM_BATCHES] = sess.run(
                    cross_entropy, feed_dict={x: np.transpose(batchesX[i % NUM_BATCHES]),
                                              y: batchesY[i % NUM_BATCHES]})
                correct_prediction = tf.equal(y, y_hat)
                accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
                accuracy_val = sess.run(accuracy, feed_dict={x: np.transpose(validX), y: validTarget})
                print("Accuracy: ", accuracy_val)
                accuracy_list.append(accuracy_val)
                epoch = epoch + 1
        lr[learning_rate] = loss_Values
print("Final value")

# for plotting purposes
N = len(loss_Values)
for epoch in range(N):  # find average over all input points in one epoch
    epoch_list.append(epoch)
    row = np.array(loss_Values[epoch])
    mean = np.add.reduce(row) / 3500
    mean_list.append(mean)

epoch_list = np.array(epoch_list)
mean_list = np.array(epoch_list)
accuracy_list = np.array(epoch_list)

plt.figure()
plt.plot(epoch_list, accuracy_list, '', label='Average loss')
plt.show()
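A hedged note on the accuracy step: tf.equal(y, y_hat) in the code above compares 0/1 labels against raw logits, which is rarely what's wanted; classification accuracy usually thresholds the sigmoid of the logits first. A numpy sketch of the difference, with made-up scores:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.3, -1.1, 0.4, -3.0])  # raw scores, not probabilities
labels = np.array([1.0, 0.0, 1.0, 0.0])

# Questionable: raw logits almost never equal the 0/1 labels exactly.
naive_acc = np.mean(logits == labels)

# Better: threshold the sigmoid at 0.5 to get hard 0/1 predictions.
preds = (sigmoid(logits) > 0.5).astype(float)
acc = np.mean(preds == labels)

print(naive_acc, acc)  # → 0.0 1.0
```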

Implementing a naive gradient descent check using numpy
I've been trying to solve the assignment published by Stanford that asks us to implement a naive gradient check as one of the helpers for implementing a neural network from scratch. While my solution does work, comparing it to the solutions available online I found a sharp discrepancy in how it is implemented. I am pretty new to numpy, and I've been breaking my head over why the online solutions use a roundabout method. Could someone point me towards what I'm missing?
This is the standard implementation found online: https://github.com/kingtaurus/cs224d/blob/10ad33f6bafeeaacae456fc48ef530edbfe5444a/assignment1/solutions/assignment1_solutions.tex#L598
My implementation looks like this:

def gradcheck_naive(f, x):
    rndstate = random.getstate()
    random.setstate(rndstate)
    fx, grad = f(x)  # Evaluate function value at original point
    h = 1e-4         # Do not change this!

    # Iterate over all indexes ix in x to check the gradient.
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index

        random.setstate(rndstate)
        forwards, grad_for = f(x[ix] + h)
        random.setstate(rndstate)
        backwards, bac_for = f(x[ix] - h)
        numgrad = (forwards - backwards) / (2 * h)

        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
        if reldiff > 1e-5:
            print "Gradient check failed."
            print "First gradient error found at index %s" % str(ix)
            print "Your gradient: %f \t Numerical gradient: %f" % (grad[ix], numgrad)
            return

        it.iternext()  # Step to next dimension

    print "Gradient check passed!"
This works and is so much cleaner than the git solution. So why does the git solution save the old x[ix] and call the function on the entire x, even though x[ix] was the only element modified? Am I missing something?
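For comparison, here is a minimal sketch of the perturb-in-place style used by the linked solution: f generally takes the whole array (its output can depend on every element, not just x[ix]), so the check modifies x[ix], re-evaluates f on all of x, and then restores the saved old value. The quadratic f below is a made-up test function:

```python
import numpy as np

def gradcheck_inplace(f, x, h=1e-4, tol=1e-5):
    """Central-difference check of the gradient returned by f(x) -> (fx, grad)."""
    fx, grad = f(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]          # save, perturb, evaluate on the WHOLE x, restore
        x[ix] = old + h
        f_plus, _ = f(x)
        x[ix] = old - h
        f_minus, _ = f(x)
        x[ix] = old
        numgrad = (f_plus - f_minus) / (2 * h)
        if abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix])) > tol:
            return False
        it.iternext()
    return True

quad = lambda x: (np.sum(x ** 2), 2 * x)  # f(x) = sum(x^2), grad = 2x
print(gradcheck_inplace(quad, np.array([1.0, -2.0, 3.0])))  # → True
```

Calling f(x[ix] + h) instead, as in the question's version, hands f a scalar rather than the full array, so it only happens to work for functions that act elementwise.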