scipy.optimize.minimize cost function of neural networks fails
I have written the cost function of a neural network for an image classification problem. To optimize it I used scipy.optimize.minimize with solver='TNC'. There is no runtime error, but after the optimization completes, the result says success: False.
Here is my cost function:
    def costandGrad(nn_param, X, Y_matrix, lam, i, j):
        Theta1, Theta2 = reshape(nn_param, i, j)
        Theta1 = np.matrix(Theta1)
        Theta2 = np.matrix(Theta2)

        # PART 1: IMPLEMENTING FEEDFORWARD PROPAGATION ALGORITHM
        a1 = X
        Z2 = X * np.transpose(Theta1)
        a2 = sigmoid(Z2)
        a2 = np.hstack((np.ones([len(a2), 1]), a2))
        Z3 = a2 * np.transpose(Theta2)
        a3 = sigmoid(Z3)
        Hyp = a3

        # PART 2: IMPLEMENTING COST WITH REGULARISATION
        first_term = np.trace(np.transpose(Y_matrix) * np.log(Hyp))
        second_term = np.trace((1 - np.transpose(Y_matrix)) * np.log(1 - Hyp))
        JU = first_term + second_term
        sq1 = np.square(Theta1[:, 1:])
        sq2 = np.square(Theta2[:, 1:])
        regularization = (lam / (2 * m)) * (sum(np.transpose(sum(sq1))) + sum(np.transpose(sum(sq2))))
        J = JU + regularization

        # PART 3: BACKPROPAGATION ALGORITHM
        D3 = Hyp - Y_matrix
        D2 = D3 * Theta2[:, 1:]
        D2 = np.multiply(D2, sigmoidGradient(Z2))
        Delta1 = np.transpose(D2) * X
        Delta2 = np.transpose(D3) * a2
        Theta1_Grad = 1 / m * Delta1
        Theta2_Grad = 1 / m * Delta2
        nn_parameters = np.concatenate((np.array(Theta1).ravel(), np.array(Theta2).ravel()))
        return J, nn_parameters
Here is the call and the result:
fim = sc.optimize.minimize(fun=costandGrad, x0=nn_params, args=(X_train, Y_matrix, 3,i,j), method='TNC', jac=True, options={'maxiter': 2000})
fun: matrix([[227085.02475254]])
jac: array([ 0.08245473, 0.00159231, 0.47998975, ..., 1.12524447,
1.30664152, 1.14908335])
message: 'Linear search failed'
nfev: 50
nit: 0
status: 4
success: False
x: array([ 0.08245473, 0.00159231, 0.47998975, ..., 1.12524447,
1.30664152, 1.14908335])
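For reference, when jac=True, scipy.optimize.minimize expects fun to return a plain scalar cost together with a flat 1-D gradient array of the same length as x0; returning a np.matrix for the cost, or something other than the gradient, may break TNC's line search. A minimal sketch with a toy quadratic (the function here is illustrative, not the network above):

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(w):
    # Cost must be a plain scalar, not an np.matrix; the gradient must be
    # a flat 1-D array with the same length as w.
    cost = float(np.sum(w ** 2))
    grad = 2.0 * w
    return cost, grad

w0 = np.array([3.0, -4.0])
res = minimize(fun=cost_and_grad, x0=w0, method='TNC', jac=True)
```

With this return convention, res.success is True and res.x converges to the minimizer.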
See also questions close to this topic

Connecting to API issues
I am new to Python and am trying to use it to connect to the Americommerce API. They have an example on git, but it doesn't seem to work. I put the URL in STORE_DOMAIN, the key in ACCESS_TOKEN, and the app id in STORE_ID.
The only one that seems to matter is the web URL. If I misspell it I get connection errors; however, for the key and id I can enter whatever I want and get the same results. Here is the script:
    #!/usr/bin/env python3
    # The 'requests' module is available via pip: "pip install requests"
    # You can find more documentation about requests at http://docs.python-requests.org/en/latest/
    import requests
    import json
    import locale
    import sys

    STORE_DOMAIN = "http://www.example.com"
    ACCESS_TOKEN = "key"
    STORE_ID = "app_id"  # This should reflect your store's ID

    locale.setlocale(locale.LC_ALL, '')

    # Searches for and returns the customer that matches the info passed in;
    # if no customer is found a new one is created and returned
    def get_customer(firstName, lastName, email):
        # setup headers
        headers = {
            'X-AC-Auth-Token': ACCESS_TOKEN,
            'Content-Type': 'application/json'
        }
        # build API call to check if customer already exists (using email)
        uri = '{}/api/v1/customers?email={}'.format(STORE_DOMAIN, email)
        # include verify=False if using dev certificate
        # r = requests.get(uri, headers=headers, verify=False)
        r = requests.get(uri, headers=headers)
        # see if a customer was found
        customer = r.json()
        if customer['total_count'] > 0:
            return customer['customers'][0]
        # no customer found, so lets create a new one
        data = {
            'last_name': 'doe',
            'first_name': 'john',
            'email': 'johndoe@email.com',
            'store_id': 4
        }
        # build API call to post a new customer
        uri = '{}/api/v1/customers'.format(STORE_DOMAIN)
        # include verify=False if using dev certificate
        # r = requests.post(uri, headers=headers, data=json.dumps(data), verify=False)
        r = requests.post(uri, headers=headers, data=json.dumps(data))
        # return newly created customer
        return r.json()

    if __name__ == '__main__':
        customer = get_customer('John', 'Doe', 'JohnDoe@email.com')
        first_name = customer['first_name'].encode('utf8')
        last_name = ',{}\n'.format(customer['last_name']).encode('utf8')
        email = customer['email'].encode('utf8')
        sys.stdout.buffer.write(first_name + last_name + email)
I get this result when I run this:
    Traceback (most recent call last):
      File "./getStuff.py", line 57, in <module>
        customer = get_customer('John', 'Doe', 'JohnDoe@email.com')
      File "./getStuff.py", line 35, in get_customer
        if (customer['total_count'] > 0):
    KeyError: 'total_count'
Please, a little help here would be appreciated. I've been stuck on this for a bit now.
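A KeyError like this usually means the response body is not the expected search result (for example, an error payload returned when the token is rejected), so 'total_count' is simply absent. A small sketch of a defensive lookup using dict.get; the payloads below are made-up illustrations, not real API responses:

```python
def customer_count(payload):
    # Error bodies may not contain 'total_count'; default to 0 instead
    # of raising a KeyError.
    return payload.get('total_count', 0)

found = {'total_count': 1, 'customers': [{'first_name': 'John'}]}
error = {'error': 'invalid access token'}  # hypothetical error body
```

Checking r.status_code (or calling r.raise_for_status()) before r.json() would also surface an authentication failure earlier and more clearly.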

Where to put tab completion configuration in python?
python tab completion Mac OSX 10.7 (Lion)
The above link shows that the following code can be used for autocompletion in python.
    import readline
    import rlcompleter

    if 'libedit' in readline.__doc__:
        readline.parse_and_bind("bind ^I rl_complete")
    else:
        readline.parse_and_bind("tab: complete")
But I don't see where to put it so that it is loaded at startup. I tried ~/.pythonrc, but it did not work. Does anybody know the current way to load such a configuration automatically for an interactive Python session?
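For what it's worth, the interpreter only runs a startup file if the PYTHONSTARTUP environment variable points at it; ~/.pythonrc has no special meaning on its own. A sketch, assuming a bash login shell (adjust the rc file for your shell):

```shell
# Add this line to your shell startup file (assumption: ~/.bash_profile
# for a bash login shell on macOS) so every interactive python session
# picks up the readline setup shown above.
export PYTHONSTARTUP="$HOME/.pythonrc"
```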

Changing a predictor input to Linear Discriminant Analysis (LDA) and comparing the output
I've been running Linear Discriminant Analysis (LDA) through python (using the method from Sebastian Raschka).
My data is related to galaxies: I've been trying to classify galaxies as either non-merging or merging based upon a set of predictors (9 total). I've been varying the measurement of one of the predictors and re-running LDA.
Is this a statistically legitimate thing to do? I was also wondering how to compare the runs of LDA and answer the question: are they different if I change the way I measure this one predictor?
My first discriminant component (PCA1 or LDA1 in this case) has a much larger eigenvalue than all the others in every run, so one thing I've been trying is comparing eigenvector 1 between the different runs of LDA. The ordering of the predictors changes, as do their relative values. I should also mention that I normalized all the predictors prior to LDA.
So is it legitimate to directly compare the eigenvector 1 values for each predictor, and if they change in value or order, does that indicate that the different measurement of the predictor is significant? How much of a change would be necessary to call this 'significant'?
Here is an example of eigenvectors of two runs where I change the 2nd to last input predictor:
    Eigenvector 1 for Model 1:
    [[-0.092 ]
     [-0.0066]
     [ 0.4605]
     [ 0.7649]
     [-0.3991]
     [ 0.1067]
     [ 0.0826]
     [-0.126 ]
     [ 0.0314]]
    Eigenvalue 1: 2.29e+00

    Eigenvector 1 for Model 2:
    [[ 0.0824]
     [-0.0012]
     [-0.3643]
     [-0.7967]
     [ 0.4032]
     [-0.0449]
     [-0.0769]
     [-0.0297]
     [ 0.2331]]
    Eigenvalue 1: 2.70e+00
Thank you for your input!
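One sign-invariant way to quantify how different two leading eigenvectors are is the absolute cosine similarity (eigenvectors are only defined up to sign, so the absolute value matters). A sketch with made-up placeholder loadings, not the values from the runs above:

```python
import numpy as np

def cosine_similarity(v1, v2):
    # abs() makes the comparison insensitive to an arbitrary sign flip
    # of either eigenvector.
    return abs(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

v1 = np.array([0.1, -0.4, 0.8, 0.4])   # placeholder loading vector
v2 = np.array([0.1, 0.4, -0.8, -0.4])  # placeholder loading vector
sim = cosine_similarity(v1, v2)
```

To decide what counts as "significant", one common approach is to bootstrap the training set, refit LDA on each resample, and compare the observed similarity between the two models against that resampling spread.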

Powershell: Optimizing script arranging high volume of files (<2MB) into folders in order
This script is run on 200,000+ files of about 2 MB each. The files need to be arranged into folders with a max count of $max; then they will be zipped. Right now, the code below works, but it can be slow. What suggestions are there for optimizing this script? I'm not sure if regex is the way to go for sorting. Any optimization suggestions are welcome. Thank you!
    $src = "C:\srcPath"
    $dst = "C:\dstPath"
    Set-Alias 7z "C:\Path\7za.exe"
    $max = 1000
    $Container = "Files"
    $ZipFileName = "Files.zip"
    $fileFilter = "*.pdf"

    $fileCount = (dir $src $fileFilter | measure).Count
    if ([int]$fileCount -gt 0) {
        md $dst\$Container
        $folderCount = [Math]::Ceiling([int]$fileCount / [int]$max)
        for ($n = 1; $n -le $folderCount; $n++) {
            if (!(Test-Path $dst\$Container\$n)) {
                md $dst\$Container\$n
                Get-ChildItem -Path $src\* -File -Include $fileFilter |
                    Sort-Object { [regex]::Replace($_.Name, '\d+', { $args[0].Value.PadLeft(20) }) } |
                    Select-Object -First $max |
                    Copy-Item -Destination $dst\$Container\$n | Out-Null
            }
        }
        7z a $dst\$ZipFileName $dst\$Container | Out-Null
    }

Set Trailing Zeros to X FAST
I have a 2D array of integers. There may be 0s in the middle of each row, and each row ends with some number of trailing 0s.
How do I set all the trailing zeros to some integer X?
    import numpy as np

    def generateTestData(N, K, INDEX_SIZE):
        # Start with a flat array to easily place zeros inside
        data1 = np.random.randint(0, INDEX_SIZE, N*K)
        # Add zeros at random locations
        idx = np.random.randint(0, N*K, int(N*K/3))
        data1[idx] = 0
        # Make data1 a (N,K) array
        data1 = np.reshape(data1, (N, K))
        # Add trailing zeros
        for i in range(N):
            data1[i, np.random.randint(0, K):] = 0
        return data1

    if __name__ == '__main__':
        N = 10000; K = 150; INDEX_SIZE = 500; X = 1
        # Test data
        data1 = generateTestData(N, K, INDEX_SIZE)
        # Save a copy for the test
        data2 = np.copy(data1)
        for i in range(N):
            for j in reversed(range(K)):
                if data1[i, j] == 0:
                    data1[i, j] = X
                else:
                    break
        # Faster code here on 'data2'
        # ...
        def diff(a, b):
            return np.mean(np.abs(a - b))
        # Verification:
        print('Diff(data1,data2) = ' + str(diff(data2, data1)))
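The double loop above can be replaced with a vectorized mask. A sketch, assuming "trailing zeros" means exactly the positions at or after the last nonzero entry of each row:

```python
import numpy as np

def set_trailing_zeros(a, X):
    # For each row: reverse it, take a cumulative count of nonzero
    # entries, and reverse back. Wherever the count is still 0, no
    # nonzero entry occurs at or after that position, i.e. we are in
    # the trailing-zero region.
    trailing = np.cumsum(a[:, ::-1] != 0, axis=1)[:, ::-1] == 0
    out = a.copy()
    out[trailing] = X
    return out
```

On the (10000, 150) test data this replaces the Python double loop with a handful of whole-array operations; an all-zero row is treated as entirely trailing.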

What methods of hooking are available to me on windows
I'd like a full list of hooking methods available on Windows, as well as their pros and cons, to see how they compare. To start, I'll list the ones I already know of:
- WriteProcessMemory()
- DLL Injection
Please answer and add to this list even if you only have one more method to contribute; any extra information and/or an example of its ideal usage situation is appreciated.

Normalize sparse row probability matrix
I've got a sparse matrix with a few elements. Now I would like to row normalize it. However, when I do this, it gets converted to a numpy array, which is not acceptable from a performance standpoint.
To make things more concrete, consider the following example:
    from scipy.sparse import csr_matrix

    x = csr_matrix([[0, 1, 1], [2, 3, 0]])  # sparse
    normalization = x.sum(axis=1)  # dense, this is OK
    x / normalization  # this is dense, not OK, can be huge
Is there an elegant way to do this without having to resort to for loops?
EDIT
Yes, this can be done using sklearn.preprocessing.normalize with 'l1' normalization; however, I have no wish to depend on sklearn.
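Without sklearn, one sparsity-preserving option is to left-multiply by a diagonal matrix of reciprocal row sums; a sketch using the example matrix from the question:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

x = csr_matrix([[0, 1, 1], [2, 3, 0]])
row_sums = np.asarray(x.sum(axis=1)).ravel()
# Guard against all-zero rows to avoid division by zero.
inv = np.divide(1.0, row_sums, out=np.zeros_like(row_sums, dtype=float),
                where=row_sums != 0)
x_norm = diags(inv) @ x  # the result stays sparse
```

Since diags(inv) only touches the stored entries of x, no dense intermediate of the full matrix shape is ever built.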
calculate binomial distribution probability matrix with python
Given N and p, I want to get a 2D binomial distribution probability matrix M:

    for i in range(1, N+1):
        for j in range(i + 1):
            M[i, j] = choose(i, j) * p**j * (1 - p)**(i - j)
    # all other values are 0

I want to know if there is any fast way to get this matrix instead of the for loop. N may be bigger than 100,000.
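The double loop can be vectorized with broadcasting; a sketch using scipy.stats.binom (note the full matrix has (N+1)^2 entries, so for N near 100,000 you would need to generate it in row blocks rather than all at once):

```python
import numpy as np
from scipy.stats import binom

def binom_matrix(N, p):
    # M[i, j] = C(i, j) * p**j * (1 - p)**(i - j) for j <= i, else 0
    i = np.arange(N + 1)[:, None]  # trial counts, as a column
    j = np.arange(N + 1)[None, :]  # success counts, as a row
    # binom.pmf broadcasts i against j to an (N+1, N+1) grid; the where
    # just makes the zero upper triangle (j > i) explicit.
    return np.where(j <= i, binom.pmf(j, i, p), 0.0)
```

Each row i of the result is the full binomial(i, p) distribution, so every row sums to 1.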
Structuring & Testing a Neural Network for Prediction
I am a neural network enthusiast. Still learning, please forgive my ignorance.
Currently, I am developing a project where I am trying to forecast NHL hockey game outcomes.
So far, I have 66 input variables describing various team statistics, one hidden layer, and 4 output variables: predicted team goals (0-9), outright winner (expressed as binary, 1 or 2), expected goal differential (-9 to +9), and total game goals (0-15). I have approximately 1200 rows of game data that I am using for training.
This is all within MS Excel, which is the platform I've built my NN inside.
My questions are:
- How many neurons in the hidden layer should there be?
- How can I test and optimize the number of neurons for best predictive ability within Excel?
- How does one decide the best activation function to use? (Currently using a sigmoid function)
- What would the best way to structure the outputs be? (1 for a win, 0 for a loss? Perhaps 1 for a win, -1 for a loss?)
I am having trouble figuring out how to optimize the NN for best results. Could someone provide assistance on how to test for optimization in Excel (correlation between predicted and actual outputs? Regression between them? Standard deviation? Something entirely different?)
Any help or feedback would be greatly appreciated!

How does batching work in a seq2seq model in pytorch?
I am trying to implement a seq2seq model in PyTorch and I am having some problems with batching. For example, I have a batch of data whose dimensions are [batch_size, sequence_lengths, encoding_dimension], where the sequence lengths are different for each example in the batch.
Now, I managed to do the encoding part by padding each element in the batch to the length of the longest sequence.
This way, if I give my net a batch with the shape described above as input, I get the following outputs:
- output, of shape [batch_size, sequence_lengths, hidden_layer_dimension]
- hidden state, of shape [batch_size, hidden_layer_dimension]
- cell state, of shape [batch_size, hidden_layer_dimension]
Now, from the output, I take for each sequence the last relevant element, that is, the element along the sequence_lengths dimension corresponding to the last non-padded element of the sequence. Thus the final output I get is of shape [batch_size, hidden_layer_dimension].

But now I have the problem of decoding from this vector. How do I handle decoding sequences of different lengths in the same batch? I tried to google it and found this, but it doesn't seem to address the problem. I thought of doing it element by element for the whole batch, but then I have the problem of passing the initial hidden states, given that the ones from the encoder will be of shape [batch_size, hidden_layer_dimension], while the ones from the decoder will be of shape [1, hidden_layer_dimension].

Am I missing something? Thanks for the help!

I'm having trouble with my TensorFlow code; its accuracy and cost aren't improving
    class neural_net_for_noobs:
        '''
        A neural network class that is created with the number of units
        in each layer [inputLayer <hidden layers> outputLayer]
        '''
        weights = []
        biases = []
        seed = 5
        dev = 0.1
        train_x = []
        val_x = []
        train_y = []
        val_y = []
        train = []
        label = []
        rng = np.random.RandomState(seed)

        def init_encoder(self):
            '''Function to initialise the decoder based on the class labels available in the whole data set'''
            self.onehot_encoder = OneHotEncoder(sparse=False)
            self.onehot_encoded = self.onehot_encoder.fit(self.label)

        def __init__(self, num_units):
            '''
            the constructor that takes in the number of units in each layer as
            described above. For example [10 5 2] would mean a neural network
            with 10 input features, a hidden layer with 5 units and 2 units in
            the output layer
            '''
            self.num_units = num_units
            self.num_input = num_units[0]
            self.num_output = num_units[-1]
            self.num_hidden = num_units[1:-1]
            self.train_data = tf.placeholder(tf.float32, [None, self.num_input])
            self.train_label = tf.placeholder(tf.float32, [None, self.num_output])
            self.initialize_weights_biases(num_units)
            self.layers()
            self.cost_function()
            self.optimizer_function()
            self.init_function()

        def dense_to_one_hot(self, labels_dense, num_classes=10):
            """Convert class labels from scalars to one-hot vectors"""
            temp = self.onehot_encoded.transform(labels_dense.reshape(-1, 1))
            return temp

        def batch_creator(self, batch_size, dataset_length, dataset_name):
            """Create batch with random samples and return appropriate format"""
            batch_mask = self.rng.choice(dataset_length, batch_size)
            batch_x = eval('self.' + dataset_name + '_x')[[batch_mask]].reshape(-1, self.num_input)
            if dataset_name == 'train':
                batch_y = self.train_y[batch_mask]
                batch_y = self.dense_to_one_hot(batch_y, num_classes=max(self.train_y))
            return batch_x, batch_y

        def initialize_weights_biases(self, num_units):
            '''Initializes weights and biases to random values'''
            for i in range(len(self.num_units) - 1):
                self.weights.append(tf.Variable(tf.random_normal([self.num_units[i], self.num_units[i + 1]], seed=self.seed, stddev=self.dev)))
            for i in range(len(self.num_units) - 1):
                self.biases.append(tf.Variable(tf.random_normal([self.num_units[i + 1]], seed=self.seed, stddev=self.dev)))

        def layers(self):
            '''Create the layers and hold them in the list'''
            self.layer = []
            print(len(self.weights), len(self.biases))
            for i in range(len(self.num_units) - 1):
                if i == 0:
                    self.layer.append(tf.add(tf.matmul(self.train_data, self.weights[i]), self.biases[i]))
                    self.layer[-1] = tf.nn.relu(self.layer[-1])
                elif i == len(self.num_units) - 2:
                    self.layer.append(tf.add(tf.matmul(self.layer[i - 1], self.weights[i]), self.biases[i]))
                    self.layer[-1] = tf.nn.softmax(self.layer[-1])
                    break
                else:
                    self.layer.append(tf.add(tf.matmul(self.layer[i - 1], self.weights[i]), self.biases[i]))
                    self.layer[-1] = tf.nn.relu(self.layer[-1])

        def cost_function(self):
            '''defining the cost function'''
            self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=self.train_label, logits=self.layer[-1]))

        def optimizer_function(self, learning_rate=0.01):
            '''defining the optimizer'''
            self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
            self.optimizer = self.optimizer.minimize(self.cost)

        def init_function(self):
            '''defining the init function to be called inside session'''
            self.init = tf.global_variables_initializer()

        def load_train(self, train_x, train_y):
            '''loads train data onto the variables'''
            self.train_x = train_x
            self.train_y = train_y

        def load_val(self, val_x, val_y):
            '''loads validation data onto the variables'''
            self.val_x = val_x
            self.val_y = val_y

        def load_data(self, train, label):
            '''loads the complete data, i.e. train + test. This part is kinda inefficient and will be taken out'''
            self.train = train
            self.label = label.reshape(-1, 1)
            self.init_encoder()

        def start_session(self, epochs=10, batch_size=512):
            '''training the model happens here.'''
            with tf.Session() as sess:
                sess.run(self.init)
                pred_temp = tf.equal(tf.argmax(self.layer[-1], 1), tf.argmax(self.train_label, 1))
                total_batch = int(self.train_x.shape[0] / batch_size)
                for epoch in range(epochs):
                    avg_cost = 0
                    for i in range(total_batch):
                        batch_x, batch_y = self.batch_creator(batch_size, self.train_x.shape[0], 'train')
                        _, c = sess.run([self.optimizer, self.cost], feed_dict={self.train_data: batch_x, self.train_label: batch_y})
                        avg_cost += ((float)(c) / (float)(total_batch))
                    print("Epoch ", (epoch + 1), " cost = ", "{:.5f}".format(avg_cost))
                    # predictions on validation set
                    accuracy = tf.reduce_mean(tf.cast(pred_temp, "float"))
                    print("Validation accuracy : ", accuracy.eval({self.train_data: self.val_x.reshape(-1, self.num_input), self.train_label: self.dense_to_one_hot(self.val_y)}))
                    print("Weights : ")
                    print(sess.run(self.weights[-1]))
                    print("Bias : ")
                    print(sess.run(self.biases[-1]))
                print("\nTraining Complete !")
                accuracy = tf.reduce_mean(tf.cast(pred_temp, "float"))
                print("Validation accuracy : ", accuracy.eval({self.train_data: self.val_x.reshape(-1, self.num_input), self.train_label: self.dense_to_one_hot(self.val_y)}))
                print("Train accuracy : ", accuracy.eval({self.train_data: self.train_x.reshape(-1, self.num_input), self.train_label: self.dense_to_one_hot(self.train_y)}))

        def predict_for(self, features):
            '''predicting for the given set of features happens here'''
            pred = None
            with tf.Session() as sess:
                sess.run(self.init)
                predict = tf.argmax(self.layer[-1], 1)
                pred = predict.eval({self.train_data: features.reshape(-1, self.num_input)})
            return pred
That's the class I created so that I can easily modify the hyperparameters related to the number of layers and units. However, my model does not train: the results stay pretty much the same after every epoch on the few datasets I tested it on (namely MNIST and a stripped-down version of Titanic).
It would be incredibly helpful if one of you could help me find the mistake I made.