How to reshape text data to be suitable for an LSTM model in Keras
My data has shape (87716, 200), and I want to reshape it in a way that I can feed it into an LSTM. I have code for an LSTM autoencoder, and below is the architecture of my model:
inputs = Input(shape=(SEQUENCE_LEN,VOCAB_SIZE), name="input")
# inputs = Embedding( VOCAB_SIZE, 256, input_length=SEQUENCE_LEN)(input)
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = LSTM(VOCAB_SIZE, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
history = autoencoder.fit(Xtrain, Xtrain,
                          batch_size=BATCH_SIZE,
                          epochs=NUM_EPOCHS)
And this is how I have prepared the data to feed to the model:
BATCH_SIZE=512
VOCAB_SIZE = 4000
SEQUENCE_LEN = 200
sent_wids = np.zeros((len(parsed_sentences), SEQUENCE_LEN), 'int32')
sample_seq_weights = np.zeros((len(parsed_sentences), SEQUENCE_LEN), 'float')
for index_sentence in range(len(parsed_sentences)):
    temp_sentence = parsed_sentences[index_sentence]
    temp_words = nltk.word_tokenize(temp_sentence)
    for index_word in range(SEQUENCE_LEN):
        if index_word < sent_lens[index_sentence]:
            sent_wids[index_sentence, index_word] = lookup_word2id(temp_words[index_word])
        else:
            sent_wids[index_sentence, index_word] = lookup_word2id('PAD')

print(sent_wids.shape)  # (87716, 200)
Xtrain = sent_wids
Now, how can I reshape the data so that I can feed it to this model? I am getting this error:
ValueError: Error when checking input: expected input to have 3 dimensions, but got array with shape (87716, 200)
and this is the reshaping I tried:
Xtrain = Xtrain.reshape(1, SEQUENCE_LEN, VOCAB_SIZE)
What do we need to do if the prepared data is not evenly divisible by the shapes and batch_size we chose?
Please let me know if any part is unclear and I will explain.
Thanks for your help :)
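For reference, a sketch of one common fix, assuming the model keeps its (SEQUENCE_LEN, VOCAB_SIZE) input: a full one-hot array of shape (87716, 200, 4000) would be enormous, so a plain Python generator can one-hot-encode each batch on the fly instead (the helper name one_hot_batches is mine, not from the question):

```python
import numpy as np

SEQUENCE_LEN = 200
VOCAB_SIZE = 4000

EYE = np.eye(VOCAB_SIZE, dtype='float32')  # row i is the one-hot vector for word id i

def one_hot_batches(sent_wids, batch_size):
    """Yield (batch, batch) pairs of one-hot sequences, forever."""
    n = len(sent_wids)
    while True:
        for start in range(0, n, batch_size):
            ids = sent_wids[start:start + batch_size]
            batch = EYE[ids]        # shape (batch, SEQUENCE_LEN, VOCAB_SIZE)
            yield batch, batch      # autoencoder: input == target

# Demo with a small fake id matrix standing in for the real sent_wids.
fake_wids = np.random.randint(0, VOCAB_SIZE, size=(10, SEQUENCE_LEN))
gen = one_hot_batches(fake_wids, batch_size=4)
xb, yb = next(gen)
print(xb.shape)  # (4, 200, 4000)
```

Passed to fit_generator (or fit, in newer Keras) with steps_per_epoch=int(np.ceil(n / BATCH_SIZE)), this also answers the divisibility question: the final, smaller batch is simply yielded as-is, so the sample count need not be divisible by the batch size.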
See also questions close to this topic

comparing two dataframe columns of booleans
I have two data frames, each denoting an actual or predicted rain condition. The actual-rain dataframe is constant, as it is a known result. Both are given below.
actul =
index  rain
Day1   True
Day2   False
Day3   True
Day4   True
The predicted-rain dataframe is given below. This dataframe keeps changing based on the prediction model used.
prdt =
index  rain
Day1   False
Day2   True
Day3   True
Day4   False
I compute the prediction accuracy of the above model as given below:
# Following computes the number of days on which rain was predicted correctly
a = sum(np.where(((actul['rain'] == True) & (prdt['rain'] == True)), True, False))
# Following computes the number of days on which no-rain was predicted correctly
b = sum(np.where(((actul['rain'] == False) & (prdt['rain'] == False)), True, False))
# Following computes the number of days on which rain was incorrectly predicted
c = sum(np.where(((actul['rain'] == True) & (prdt['rain'] == False)), True, False))
# Following computes the number of days on which no-rain was incorrectly predicted
d = sum(np.where(((actul['rain'] == False) & (prdt['rain'] == True)), True, False))

predt_per = (a + b) * 100 / (a + b + c + d)
My code above takes too much time to compute. Is there a better way to achieve this result?
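For what it's worth, a vectorized sketch of the same accuracy computation (assuming both frames share the same index): comparing the two columns directly replaces the four separate np.where passes with a single element-wise equality.

```python
import pandas as pd

# Example frames matching the ones in the question.
actul = pd.DataFrame({'rain': [True, False, True, True]},
                     index=['Day1', 'Day2', 'Day3', 'Day4'])
prdt = pd.DataFrame({'rain': [False, True, True, False]},
                    index=['Day1', 'Day2', 'Day3', 'Day4'])

# Element-wise equality gives a boolean Series; its mean is the fraction
# of days predicted correctly (both rain-rain and norain-norain matches).
predt_per = (actul['rain'] == prdt['rain']).mean() * 100
print(predt_per)  # 25.0
```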

Why do I see such broken package names, how do I fix it?
There are some packages with '' in front, and some with only a dash, with the name moved into the version column. I don't know how this has happened. Here is how it looks:
$ pip list
Package   Version
lenium
.lenium   3.0.2
elenium   3.0.2
lenium    3.0.2
selenium  3.141.0
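Broken names like these are often leftovers from interrupted installs or uninstalls inside site-packages. A minimal sketch to list suspicious directory names in the active environment (the prefix set checked here is an assumption, not a pip-documented contract):

```python
import pathlib
import sysconfig

# Look inside the active environment's site-packages for leftover
# directories whose names start with characters pip uses while
# (un)installing, or that look truncated.
site = pathlib.Path(sysconfig.get_paths()["purelib"])
leftovers = [p.name for p in site.iterdir()
             if p.name.startswith(("~", "-", "."))]
print(leftovers)
```

Deleting such leftover directories by hand (after checking what they contain) and reinstalling the real package is one way to clean up a listing like the one above.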

Scrapy only returning 400 http code errors
I am running scrapy on many different websites. No matter what site I choose, scrapy returns one 200 code, and the rest 400, making my crawler useless. Any idea why?
{'downloader/request_bytes': 14495,
 'downloader/request_count': 25,
 'downloader/request_method_count/1': 24,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 11208,
 'downloader/response_count': 25,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/400': 24,
The downloader/response_status_count/400 entry always accounts for nearly all of the responses. I've tried this on over 100 different sites, always with the same issue.
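One common cause worth ruling out (an assumption, not a diagnosis): servers that answer 400 to bots often accept requests carrying a browser-like User-Agent. A minimal settings.py sketch to test that theory; the UA string and header values below are illustrative, not required:

```python
# settings.py -- sketch: override Scrapy's default identification headers.
# Many servers reject requests with Scrapy's default User-Agent outright.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0 Safari/537.36"
)

DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en",
}
```

If the 400s persist with browser-like headers, the next thing to inspect is the actual request Scrapy sends (e.g. via logging) compared with a request that works in a browser.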

IndexError: too many indices for array. Numpy + Pandas DataFrame
Receiving an 'IndexError: too many indices for array' error.
import numpy as np
import pandas as pd
from numpy.random import randn

rowi = ['A', 'B', 'C', 'D', 'E']
coli = ['W', 'X', 'Y', 'Z']

df = pd.DataFrame(randn[5, 4], rowi, coli)  # data, index, col
print(df)
I expect the DataFrame to be printed in an Excel-like layout, but instead I get the index error.
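For reference, the likely culprit: randn is a function, so it must be called with parentheses; the square brackets in randn[5, 4] try to index it instead of calling it. A sketch with the call fixed:

```python
import pandas as pd
from numpy.random import randn

rowi = ['A', 'B', 'C', 'D', 'E']
coli = ['W', 'X', 'Y', 'Z']

# randn(5, 4) *calls* the function and returns a 5x4 array of
# standard-normal samples; randn[5, 4] tries to subscript the
# function object itself, which raises an error.
df = pd.DataFrame(randn(5, 4), index=rowi, columns=coli)
print(df.shape)  # (5, 4)
```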

Apply a function to each 2d subarray sliced from a Nd array
Currently, I have a 4d array, say,
arr = np.arange(48).reshape((2,2,3,4))
I want to apply a function that takes a 2d array as input to each 2d array sliced from arr. I have searched and read this question, which is exactly what I want. The function I'm using is im2col_sliding_broadcasting(), which I get from here. It takes a 2d array and a list of 2 elements as input and returns a 2d array. In my case it takes a 3x4 2d array and the list [2, 2], and returns a 4x6 2d array.
I considered using apply_along_axis(), but as said it only accepts a 1d function as parameter, so I can't apply the im2col function this way. I want an output with shape 2x2x4x6. Surely I can achieve this with a for loop, but I heard that it's too time-expensive:

import numpy as np

def im2col_sliding_broadcasting(A, BSZ, stepsize=1):
    # source: https://stackoverflow.com/a/30110497/10666066
    # Parameters
    M, N = A.shape
    col_extent = N - BSZ[1] + 1
    row_extent = M - BSZ[0] + 1
    # Get starting block indices
    start_idx = np.arange(BSZ[0])[:, None] * N + np.arange(BSZ[1])
    # Get offsetted indices across the height and width of input array
    offset_idx = np.arange(row_extent)[:, None] * N + np.arange(col_extent)
    # Get all actual indices & index into input array for final output
    return np.take(A, start_idx.ravel()[:, None] + offset_idx.ravel()[::stepsize])

arr = np.arange(48).reshape((2, 2, 3, 4))
output = np.empty([2, 2, 4, 6])
for i in range(2):
    for j in range(2):
        temp = im2col_sliding_broadcasting(arr[i, j], [2, 2])
        output[i, j] = temp
My arr is in fact a 10000x3x64x64 array, so my question is: is there another way to do this more efficiently?
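One vectorized alternative, sketched under the assumption that NumPy >= 1.20 is available: np.lib.stride_tricks.sliding_window_view extracts every 2x2 window of the last two axes in one call, and a reshape plus an axis swap reproduces the layout that im2col_sliding_broadcasting returns.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(48).reshape((2, 2, 3, 4))
bsz = (2, 2)

# All 2x2 windows over the last two axes.
# Shape: (2, 2, row_extent, col_extent, bsz[0], bsz[1]) = (2, 2, 2, 3, 2, 2)
windows = sliding_window_view(arr, bsz, axis=(2, 3))

r, c = windows.shape[2], windows.shape[3]
# Flatten the window positions into one axis and the within-window offsets
# into another, then swap them so each column is one raveled window --
# the same (bsz[0]*bsz[1], row_extent*col_extent) layout im2col produces.
out = windows.reshape(arr.shape[0], arr.shape[1], r * c, bsz[0] * bsz[1])
out = out.swapaxes(-1, -2)
print(out.shape)  # (2, 2, 4, 6)
```

The same two lines apply unchanged to a 10000x3x64x64 array; no Python-level loop over the leading axes is needed.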
How to add noise to nonzero elements?
I have a numpy array and a noise function.
def gaussian_noise(X, sigma=0.1):
    noise = np.random.normal(0, sigma, X.shape)
    return X + noise
How can I add noise only to the non-zero elements? For example:
# input array
a = array([[1, 0, 3],
           [2, 5, 0],
           [0, 0, 7]])
b = gaussian_noise(a)
Output:
b = array([[0.83781175, 0.        , 2.99969046],
           [1.92693919, 4.85350012, 0.        ],
           [0.        , 0.        , 7.04896986]])
How can I modify my function?
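One way to modify it, as a sketch: draw the noise as before, then mask it with a boolean array so zero entries stay untouched (the function name gaussian_noise_nonzero is mine):

```python
import numpy as np

def gaussian_noise_nonzero(X, sigma=0.1):
    # Draw noise for every entry, then zero it out wherever X is zero,
    # so only the non-zero elements are perturbed. (X != 0) is a boolean
    # mask that multiplies the noise by 1 or 0 element-wise.
    noise = np.random.normal(0, sigma, X.shape)
    return X + noise * (X != 0)

a = np.array([[1, 0, 3],
              [2, 5, 0],
              [0, 0, 7]])
b = gaussian_noise_nonzero(a)
print(b[a == 0])  # the zeros stay exactly zero
```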

How to specify number of layers in keras?
I'm trying to define a fully connected neural network in Keras using the TensorFlow backend. I have sample code, but I don't know what it means.
model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(50, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(20, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(10, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
model.add(Dense(y.shape[1], activation='softmax'))
From the above code I want to know: what is the number of inputs to my network, the number of outputs, and the number of hidden layers? And what is the number that comes right after model.add(Dense(...)? Assume x.shape[1] = 60. Also, what is this network called exactly — should I call it a fully connected network or a convolutional network?

Keras: Multiple outputs, loss only a function of one?
I have a setup like this:
model = keras.Model(input,[output1,output2])
My loss function is only a function of output1. How do I tell Keras to ignore output2 for the purposes of computing loss? The best I have come up with is to generate a bogus loss function which always returns 0.0:
model.compile(optimizer=..., loss=[realLossFunction, zeroLossFunction])
I can live with this, but I have to see the statistics and progress of this loss function all over the place and would like to know if there is a more elegant way.

Iterating over arrays on disk similar to ImageDataGenerator
I have 70,000 2D numpy arrays on which I would like to train a CNN network using Keras. Holding them in memory would be an option, but would consume a lot of memory. Thus, I would like to save the matrices on disk and load them at runtime. One option would be to use ImageDataGenerator; the problem is that it can only read images.
I would like to store the arrays not as images, because when I save them as (grayscale) images the values of the arrays are changed (normalized etc.). In the end I want to feed the original matrices into the network, not values changed by saving them as images.
Is it possible to somehow store the arrays on disk and iterate over them in a similar way to what ImageDataGenerator does? Or else, can I save the arrays as images without changing the values of the arrays?
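A sketch of one approach, assuming each array is saved as its own .npy file (np.save round-trips values exactly, unlike image formats): a plain Python generator can load and yield batches that Keras's fit_generator accepts (the helper name npy_batch_generator is mine):

```python
import pathlib
import tempfile
import numpy as np

def npy_batch_generator(paths, batch_size):
    """Yield batches of arrays loaded from .npy files, forever."""
    while True:
        for start in range(0, len(paths), batch_size):
            batch = [np.load(p) for p in paths[start:start + batch_size]]
            yield np.stack(batch)

# Demo: write a few arrays to a temporary directory, then iterate.
tmp = pathlib.Path(tempfile.mkdtemp())
for i in range(4):
    np.save(tmp / f"arr_{i}.npy", np.full((8, 8), i, dtype=np.float32))

paths = sorted(tmp.glob("*.npy"))
gen = npy_batch_generator(paths, batch_size=2)
batch = next(gen)
print(batch.shape)  # (2, 8, 8)
```

Because the values come straight back from np.load, the network sees the original matrices, with no image-format normalization in between.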

Is there any paper about vanishing gradients of LSTM?
Some web pages mentioned that LSTM causes the vanishing or exploding gradients if the sequence is too long.
These are some of the pages:
- https://machinelearningmastery.com/handle-long-sequences-long-short-term-memory-recurrent-neural-networks/
- How to handle extremely long LSTM sequence length?
However, I couldn't find any paper or formulation for it.
Could you please tell me the references for this problem? 
Pytorch LSTM vs LSTMCell
What is the difference between LSTM and LSTMCell in PyTorch (currently version 1.1)? It seems that LSTMCell is a special case of LSTM (i.e. with only one layer, unidirectional, no dropout).
Then what's the purpose of having both implementations? Unless I'm missing something, it's trivial to use an LSTM object as an LSTMCell (or, alternatively, it's pretty easy to use multiple LSTMCells to create the LSTM object).

How to use attention layer on a sequence labeling task implemented using LSTMs in tensorflow?
I've used LSTMCell in TensorFlow to implement a sequence labeling task. My code is based on this example code written by Aymeric Damien. Here are the important parts of the code (the full code is here):
# tf Graph input
x = tf.placeholder("float", [None, seq_max_len, 1])
y = tf.placeholder("float", [None, n_classes])
# A placeholder for indicating each sequence length
seqlen = tf.placeholder(tf.int32, [None])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}

def dynamicRNN(x, seqlen, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, seq_max_len, 1)

    # Define a lstm cell with tensorflow
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden)

    # Get lstm cell output, providing 'sequence_length' will perform dynamic
    # calculation.
    outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x, dtype=tf.float32,
                                                sequence_length=seqlen)

    # When performing dynamic calculation, we must retrieve the last
    # dynamically computed output, i.e., if a sequence length is 10, we need
    # to retrieve the 10th output.
    # However TensorFlow doesn't support advanced indexing yet, so we build
    # a custom op that for each sample in batch size, gets its length and
    # gets the corresponding relevant output.

    # 'outputs' is a list of output at every timestep, we pack them in a Tensor
    # and change back dimension to [batch_size, n_step, n_input]
    outputs = tf.stack(outputs)
    outputs = tf.transpose(outputs, [1, 0, 2])

    # Hack to build the indexing and retrieve the right output.
    batch_size = tf.shape(outputs)[0]
    # Start indices for each sample
    index = tf.range(0, batch_size) * seq_max_len + (seqlen - 1)
    # Indexing
    outputs = tf.gather(tf.reshape(outputs, [-1, n_hidden]), index)

    # Linear activation, using outputs computed above
    return tf.matmul(outputs, weights['out']) + biases['out']

pred = dynamicRNN(x, seqlen, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)
    for step in range(1, training_steps + 1):
        batch_x, batch_y, batch_seqlen = trainset.next(batch_size)
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, seqlen: batch_seqlen})
        if step % display_step == 0 or step == 1:
            # Calculate batch accuracy & loss
            acc, loss = sess.run([accuracy, cost],
                                 feed_dict={x: batch_x, y: batch_y, seqlen: batch_seqlen})
            print("Step " + str(step * batch_size) + ", Minibatch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " +
                  "{:.5f}".format(acc))
    print("Optimization Finished!")

    # Calculate accuracy
    test_data = testset.data
    test_label = testset.labels
    test_seqlen = testset.seqlen
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: test_data, y: test_label, seqlen: test_seqlen}))
So, it's a multi-input, single-output architecture. I'd like to add an attention layer to this model. In fact, I'd like to treat the vectors at different time steps differently (using different weights). Any idea on how to implement this in TF, or where to start, is appreciated. I completely understand this code, but I'm new to using attention, especially on a multi-input, single-output RNN.
Thanks!

Image Autoencoder c++ including dataset
I'm going to build an image autoencoder now that I've finished a neural network course, and I need to feed my dataset in as a set of pixels, but I don't know how to do that. I searched a lot and found different C++ libraries, but some of them don't work with my Code::Blocks setup. Is there any tutorial that can help me do this in C++?

Can EncoderDecoder network be used for different input and output?
Working on an image translation problem. I have many pairs of input/output images, say a sketch as input and a translated sketch as output. The images are b&w, with sketch lines 1 pixel wide.
Can a simple encoder-decoder be used to LEARN the image translation?
The code snippet below is from https://blog.keras.io/building-autoencoders-in-keras.html, which shows how an autoencoder is programmed. Obviously, being an autoencoder, the input and output shown are the same.
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
But here, instead of "x_train, x_train" as the first two arguments, can I give "x_train, y_train", where x_train are the input images and y_train are the output images?
Is this theoretically correct? Will the following optimizer and cost function work?
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
Generally, Pix2Pix networks are used for such work. But the main tenet of GANs is that they learn the cost function for what counts as a good output.
In my problem, the cost function is very deterministic; not a pixel here and there would do, so the error can be clearly defined.
Is it theoretically correct even to try Pix2Pix here?

MSE is very low while training and validating an autoencoder, but the model doesn't know either the training or the validation data
I am using an autoencoder to reconstruct a matrix (MovieLens-like data). I tried different loss functions (custom ones and those built into Keras); none of them solves my issue. The problem is that the model gives very low error while training and validating, but it doesn't know anything, even when I test it with training data! (After it failed on the test data, I checked whether it knows the training data, and it doesn't.)
I tried everything I found online to solve this; I rewrote the same model in different ways.
Epoch 1/100
17560/17560 [==============================] - 7s 411us/step - loss: 0.0032 - mean_squared_error: 0.0032
Epoch 2/100
17560/17560 [==============================] - 6s 358us/step - loss: 1.0405e-04 - mean_squared_error: 1.0405e-04
Epoch 3/100
17560/17560 [==============================] - 6s 356us/step - loss: 1.0057e-04 - mean_squared_error: 1.0057e-04
Epoch 4/100
17560/17560 [==============================] - 6s 358us/step - loss: 9.9528e-05 - mean_squared_error: 9.9528e-05
Epoch 5/100
17560/17560 [==============================] - 6s 357us/step - loss: 9.8787e-05 - mean_squared_error: 9.8787e-05
Epoch 6/100
17560/17560 [==============================] - 6s 363us/step - loss: 9.7531e-05 - mean_squared_error: 9.7531e-05
Epoch 7/100
17560/17560 [==============================] - 6s 366us/step - loss: 9.6862e-05 - mean_squared_error: 9.6862e-05
Epoch 8/100
17560/17560 [==============================] - 6s 362us/step - loss: 9.7063e-05 - mean_squared_error: 9.7063e-05
Epoch 9/100
17560/17560 [==============================] - 6s 366us/step - loss: 9.5844e-05 - mean_squared_error: 9.5844e-05
Epoch 10/100
17560/17560 [==============================] - 6s 362us/step - loss: 9.4295e-05 - mean_squared_error: 9.4295e-05
Epoch 11/100
17560/17560 [==============================] - 6s 365us/step - loss: 9.4820e-05 - mean_squared_error: 9.4820e-05
Epoch 12/100
17560/17560 [==============================] - 6s 363us/step - loss: 9.3577e-05 - mean_squared_error: 9.3577e-05
Epoch 13/100
If a row in the matrix is [37.92577771, 1., 2.], an acceptable reconstruction would be [35.92577771, 1.5, 1.99].
But the result is [8.379196, 8.420334, 5.801697].