Text generation using the Estimator API
I have been trying to transition to the Estimator API, since it is what the TensorFlow team recommends. However, I wonder how some basic tasks can be done efficiently within the Estimator framework. Over the weekend I tried to create a GRU-based model for text generation, following the TensorFlow example for building custom Estimators. I was able to create a model that trains easily, and its results match the non-Estimator version.

However, sampling (generating text) gave me some trouble. I finally made it work, but it is very slow: every time it predicts a character, the Estimator framework loads the whole graph again. Is there a way to avoid loading the graph on every call, or any other solution?

Second issue: I also had to use state_is_tuple=False, since I have to pass the GRU state back and forth (between the model function and the generator method) and I can't pass tuples. Does anyone know how to deal with this?

Thanks.

P.S. Here is a link to my code example: https://github.com/amirharati/sample_estimator_charlm/blob/master/RnnLm.py
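For the first issue, a workaround often suggested (an assumption on my part, not from the linked code) is to keep a single estimator.predict call alive and push new inputs to it through a long-lived Python generator, so setup happens once instead of once per character. The control flow can be sketched in plain Python, with expensive_setup, predictions and feed as illustrative stand-ins for graph loading, the predict iterator and the input_fn:

```python
# Sketch of the "long-lived generator" pattern used to avoid rebuilding
# a graph on every predict() call. All names here are illustrative.

setup_calls = {"count": 0}

def expensive_setup():
    # Stands in for loading the checkpoint and building the graph.
    setup_calls["count"] += 1

def predictions(inputs):
    """One-time setup, then one prediction per pushed input."""
    expensive_setup()
    for x in inputs:
        yield x + 1  # stand-in for "predict the next character"

def feed(seed, n):
    """Long-lived input generator the consumer drives step by step."""
    current = seed
    for _ in range(n):
        yield current
        current += 1

# Setup happens exactly once, no matter how many characters are sampled.
out = list(predictions(feed(0, 5)))
```

The same shape works with tf.data.Dataset.from_generator feeding the input_fn: the predict iterator stays open, so the graph is never reloaded between characters.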
See also questions close to this topic

Keras InvalidArgumentError: Incompatible shapes: [1,8,32] vs. [1,10,32]
It stops training at the beginning of the first epoch.
The error is:
InvalidArgumentError: Incompatible shapes: [1,8,32] vs. [1,10,32] [[Node: training_8/RMSprop/gradients/loss_8/time_distributed_9_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@train.../Reshape_1"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training_8/RMSprop/gradients/loss_8/time_distributed_9_loss/mul_grad/Shape, training_8/RMSprop/gradients/loss_8/time_distributed_9_loss/mul_grad/Shape_1)]]
I'm not sure whether the error is caused by the format of the training data. This is a sample of train_x:
[array([ 0,  1,  2,  3,  4,  5,  6,  3,  7,  8,  9, 10, 11, 12, 13, 14], dtype=int32),
 array([15, 16, 17, 18, 19], dtype=int32),
 array([20, 16, 17, 18, 21, 22, 23, 24, 25, 26], dtype=int32),
 array([27, 1, 17, 28, 18, 29, 30, 31, 24, 32], dtype=int32),
 array([33, 1, 17, 3, 34, 35, 36, 37, 18, 38], dtype=int32),
 array([39, 16, 40, 28, 41, 42], dtype=int32),
 array([39, 1, 40, 28, 41, 43], dtype=int32),
 array([44, 6, 3, 45], dtype=int32),
 array([15, 16, 40, 46, 47, 48, 3, 49, 50, 51, 52, 53], dtype=int32),
 array([ 0, 54, 28, 55, 56, 57, 58, 59], dtype=int32)]
This is a sample of train_label:
[array([0, 1, 2, 1, 2, 3, 0, 1, 4, 2, 2, 4, 5, 6, 0, 7], dtype=int32),
 array([0, 5, 8, 9, 9], dtype=int32),
 array([10, 5, 8, 9, 7, 9, 7, 7], dtype=int32),
 array([10, 1, 8, 1, 9, 9, 3, 0, 7, 2, 7], dtype=int32),
 array([10, 1, 8, 1, 9, 9, 9, 7, 9, 7], dtype=int32),
 array([0, 5, 8, 1, 2, 2], dtype=int32),
 array([0, 1, 8, 1, 2, 2], dtype=int32),
 array([11, 0, 1, 9], dtype=int32),
 array([ 0, 5, 8, 7, 12, 13, 1, 2, 9, 9, 5, 6], dtype=int32),
 array([ 0, 14, 1, 2, 2, 12, 4, 15], dtype=int32)]
This is the code:
n_vocab = len(unique_words)  # 1517

model = Sequential()
model.add(Embedding(n_vocab, 100))
model.add(Convolution1D(128, 5, border_mode='same', activation='relu'))
model.add(Dropout(0.25))
model.add(GRU(100, return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
model.compile('rmsprop', 'categorical_crossentropy')

n_epochs = 30
train_x = [np.asarray(x, dtype=np.int32) for x in encoded_sentences]
train_label = [np.asarray(x, dtype=np.int32) for x in encoded_POS]

for i in range(n_epochs):
    print("Training epoch {}".format(i))
    bar = progressbar.ProgressBar(maxval=len(train_x))
    for n_batch, sent in bar(enumerate(train_x)):
        label = train_label[n_batch]
        # Make labels one hot
        label = np.eye(n_classes)[label][np.newaxis, :]
        # View each sentence as a batch
        sent = sent[np.newaxis, :]
        if sent.shape[1] > 1:  # ignore 1-word sentences
            model.train_on_batch(sent, label)
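One thing worth checking (my observation, not from the post): in the samples shown above, the third train_x array has 10 elements while the matching train_label array has 8, which is exactly the kind of per-pair length mismatch that produces "Incompatible shapes: [1,8,32] vs. [1,10,32]" in a TimeDistributed loss. A quick NumPy check could look like:

```python
import numpy as np

# Stand-ins for two (sentence, label) pairs from the question's data;
# the second pair deliberately reproduces the 10-vs-8 mismatch.
train_x = [np.array([0, 1, 2, 3], dtype=np.int32),
           np.array([20, 16, 17, 18, 21, 22, 23, 24, 25, 26], dtype=np.int32)]
train_label = [np.array([0, 1, 2, 1], dtype=np.int32),
               np.array([10, 5, 8, 9, 7, 9, 7, 7], dtype=np.int32)]

# Every sentence must have exactly one label per token, otherwise the
# one-hot label batch and the input batch disagree on the time dimension.
mismatches = [i for i, (x, y) in enumerate(zip(train_x, train_label))
              if len(x) != len(y)]
```

Running this over the full dataset before training would pinpoint every offending pair.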

Is there a way to name a TensorFlow variable based on the value of another tensor variable?
I want to be able to do the following:
n = str(tf.constant(2))
v = tf.get_variable(name=n, shape=(256, 256),
                    initializer=tf.contrib.layers.xavier_initializer())
but doing this converts n to the string representation of the TensorFlow tensor, i.e.
"<tf.Tensor 'Const_4:0' shape=() dtype=int32>"
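That is expected: str() on a tensor object gives its repr, not its value, and tf.get_variable needs an ordinary Python string known at graph-construction time. A minimal stand-alone sketch of the distinction (FakeTensor is a stand-in for tf.constant(2); with real TensorFlow you would first have to fetch the concrete value, e.g. via a session run):

```python
# Build the variable name from a plain Python value, not from the
# wrapped tensor object's string representation.

class FakeTensor:
    """Stand-in for tf.constant(2): str() yields a repr, not the value."""
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "<tf.Tensor 'Const_4:0' shape=() dtype=int32>"

t = FakeTensor(2)
bad_name = str(t)                      # the repr -- not usable as a name
good_name = "var_{}".format(t.value)   # "var_2" -- a concrete Python string
```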

How to let TensorFlow use all CPUs
My Linux has 2 CPUs (16 cores per CPU). Each core has 2 threads.
CPU usage stays below 50% if I use

tf.Session(tf.ConfigProto())

CPU usage stays below 40% if I use

tf.Session(tf.ConfigProto(device_count={"CPU": 64}))

CPU usage is 30-70% if I use

tf.ConfigProto(device_count={"CPU": 32},
               inter_op_parallelism_threads=1,
               intra_op_parallelism_threads=64)

In this configuration all cores are working, but the usage rate is still only 30-80%.
I have tested on Windows Server (1 CPU / 4 cores), where CPU usage is more than 90%.
What should I change so that TensorFlow uses all CPUs?

Understanding multivariate time series classification with Keras
I am trying to understand how to correctly feed data into my Keras model to classify multivariate time series data into three classes using an LSTM neural network.
I have already looked at different resources, mainly these three excellent blog posts by Jason Brownlee (post1, post2, post3), other SO questions, and various papers, but none of the information there exactly fits my problem, and I was not able to figure out whether my data preprocessing / feeding is correct. So I figured I might get some help if I specify my exact conditions here.
What I am trying to do is classify multivariate time series data, which in its original form is structured as follows:
I have 200 samples
One sample is one csv file.
A sample can have 1 to 50 features (i.e. the csv file has 1 to 50 columns).
Each feature has its value "tracked" over a fixed amount of time steps, let's say 100 (i.e. each csv file has exactly 100 rows).
Each csv file has one of three classes ("good", "too small", "too big").
So what my current status looks like is the following:
I have a numpy array "samples" with the following structure:
# array holding all samples
[
  # sample 1
  [
    # feature 1 of sample 1
    [0.1, 0.2, 0.3, 0.2, 0.3, 0.1, 0.2, 0.4, 0.5, 0.1, ...],  # "time series" of feature 1
    # feature 2 of sample 1
    [0.5, 0.6, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1, 0.2, ...],  # "time series" of feature 2
    ...  # up to 50 features
  ],
  # sample 2
  [
    # feature 1 of sample 2
    [0.1, 0.2, 0.3, 0.2, 0.3, 0.1, 0.2, 0.4, 0.5, 0.1, ...],  # "time series" of feature 1
    # feature 2 of sample 2
    [0.5, 0.6, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1, 0.2, ...],  # "time series" of feature 2
    ...  # up to 50 features
  ],
  ...  # up to sample no. 200
]
I also have a numpy array "labels" with the same length as the "samples" array (i.e. 200). The labels are encoded in the following way:
 "good" = 0
 "too small" = 1
 "too big" = 2
[0, 2, 2, 1, 0, 1, 2, 0, 0, 0, 1, 2, ... ] # up to label no. 200
This "labels" array is then one-hot encoded with Keras'

to_categorical

function:

to_categorical(labels, len(np.unique(labels)))
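For reference, to_categorical is plain one-hot encoding, and the same transform can be sketched with NumPy alone:

```python
import numpy as np

labels = np.array([0, 2, 2, 1, 0])   # class ids, as in the question
n_classes = len(np.unique(labels))   # 3 classes here

# Equivalent of keras.utils.to_categorical(labels, n_classes):
# row i is the one-hot vector for labels[i].
one_hot = np.eye(n_classes)[labels]  # shape (5, 3)
```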
My model definition currently looks like this:
max_nb_features = 50
nb_time_steps = 100

model = Sequential()
model.add(LSTM(5, input_shape=(max_nb_features, nb_time_steps)))
model.add(Dense(3, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 The 5 units in the LSTM layer are just randomly picked for now
 3 Output neurons in the dense layer for my three classes
I then split the data into training / testing data:
samples_train, samples_test, labels_train, labels_test = train_test_split(samples, labels, test_size=0.33)
This leaves us with 134 samples for training and 66 samples for testing.
The problem I'm currently running into is that the following code is not working:
model.fit(samples_train, labels_train, epochs=1, batch_size=1)
The error is the following:
Traceback (most recent call last):
  File "lstm_test.py", line 152, in <module>
    model.fit(samples_train, labels_train, epochs=1, batch_size=1)
  File "C:\Program Files\Python36\lib\site-packages\keras\models.py", line 1002, in fit
    validation_steps=validation_steps)
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 1630, in fit
    batch_size=batch_size)
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 1476, in _standardize_user_data
    exception_prefix='input')
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (134, 1)
To me, it seems not to work because of the variable number of features my samples can have. If I use "fake" (generated) data where all parameters are the same, except that each sample has exactly the same number of features (50), the code works.
Now what I'm trying to understand is:
 Are my general assumptions about how I structured my data for the LSTM input correct? Are the parameters (batch_size, input_shape) correct / sensible?
 Is the Keras LSTM model in general able to handle samples with a different number of features?
 If yes, how do I have to adapt my code for it to work with a different number of features?
 If no, would "zero-padding" (filling) the columns of the samples with fewer than 50 features work? Are there other, preferred methods of achieving my goal?
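Regarding the zero-padding option in the last bullet, here is a sketch (my suggestion, under the question's 50-feature / 100-time-step assumptions) of padding every sample up to 50 features so the dataset becomes one dense 3D array. Note that Keras LSTMs expect input_shape=(timesteps, features), so the padded axes are transposed at the end:

```python
import numpy as np

MAX_FEATURES = 50
TIME_STEPS = 100

def pad_features(sample, max_features=MAX_FEATURES):
    """sample: array of shape (n_features, time_steps), n_features <= max.
    Returns shape (max_features, time_steps); extra rows are zero-filled."""
    sample = np.asarray(sample, dtype=np.float32)
    n_features, time_steps = sample.shape
    padded = np.zeros((max_features, time_steps), dtype=np.float32)
    padded[:n_features] = sample
    return padded

# Two toy samples with 2 and 3 features respectively.
samples = [np.random.rand(2, TIME_STEPS), np.random.rand(3, TIME_STEPS)]
batch = np.stack([pad_features(s) for s in samples])   # (2, 50, 100)

# Keras LSTMs consume (batch, timesteps, features), so swap the last
# two axes before calling model.fit.
batch_for_lstm = batch.transpose(0, 2, 1)               # (2, 100, 50)
```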

How to deal with data contained in a list of elements with different sizes to train an LSTM?
:)
First of all, I want to apologize for my English, which might not be very good since it's not my native language. With that said, here is my problem:
I'm new to Python/Keras, and currently I'm working on a sequence-to-sequence problem using an LSTM. Here it is: I have a dataset of robot arm kinematics. All kinematics have the same number of columns but different numbers of rows (rows represent time steps). I have 24 kinematics as training data and I want to feed them to an LSTM, so I created a list containing the 24 elements. The problem is that when I create the LSTM, I have to specify the input_shape argument (according to an error that occurred after running part of the code). But I have no idea what value to put in input_shape, since my training data is a list of kinematics with different numbers of time steps. All I know is that I have a list of 24 elements with 76 columns each. Here is the part of the code that I run:
model = Sequential()
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
And here is the error that occurred:
Traceback (most recent call last):
  File "<input>", line 4, in <module>
  File "C:\Users\Pctec\Desktop\venv\lib\site-packages\keras\models.py", line 454, in add
    raise ValueError('The first layer in a '
ValueError: The first layer in a Sequential model must get an 'input_shape' or 'batch_input_shape' argument.
After that, I searched for a solution to the problem and found this: https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
It gave me a little idea about what I should do; however, the author's example assumes the number of time steps is known, unlike the case I'm working on.
I don't know if I have given enough detail about my problem because, like I said, I'm still a beginner, but maybe someone can help me with this.
So, based on what should I specify the value of input_shape for the LSTM?
Thank you :)
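One common way to handle the varying number of rows (my suggestion, not from the linked article) is to zero-pad every kinematic along the time axis up to the longest one, and then use input_shape=(max_time_steps, 76). A NumPy sketch:

```python
import numpy as np

N_COLUMNS = 76  # each kinematic has 76 columns (features), as in the question

def pad_time_steps(kinematics):
    """kinematics: list of arrays, each of shape (time_steps_i, 76).
    Returns one array (n_samples, max_time_steps, 76), zero-padded."""
    max_len = max(k.shape[0] for k in kinematics)
    out = np.zeros((len(kinematics), max_len, N_COLUMNS), dtype=np.float32)
    for i, k in enumerate(kinematics):
        out[i, :k.shape[0]] = k
    return out

# Toy stand-in data: 3 kinematics with 5, 8 and 3 time steps.
data = [np.random.rand(t, N_COLUMNS) for t in (5, 8, 3)]
batch = pad_time_steps(data)   # shape (3, 8, 76)
# input_shape for the first LSTM layer would then be (8, 76), or
# (None, 76) to keep the time dimension flexible.
```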

List of coordinates as a sequence into sequence to sequence model
I have a sequence of lists of coordinates. For example:
[[[0.  0.1]
  [0.1 0.2]
  [0.2 0.3]
  [0.3 0.4]
  [0.4 0.5]]

 [[0.  0.1]
  [0.1 0.2]
  [0.2 0.3]
  [0.3 0.4]
  [0.4 0.5]]

 [[0.  0.1]
  [0.1 0.2]
  [0.2 0.3]
  [0.3 0.4]
  [0.4 0.5]]]
Now I want to train a sequence-to-sequence model to get a new sequence of lists of coordinates. How is this possible? I tried it with an LSTM, but I always run into trouble with the shapes and dimensions.
Can someone please help me (an example in Keras would be great)?
Thank you
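For what it's worth, the array shown already has the layout Keras recurrent layers expect: (batch, timesteps, features) = (3, 5, 2), with each coordinate pair acting as a 2-dimensional feature vector. A small NumPy check of that interpretation:

```python
import numpy as np

# Rebuild the (3, 5, 2) array from the question: 3 sequences,
# 5 time steps each, 2 coordinates per step.
seq = np.array([[[0.0, 0.1], [0.1, 0.2], [0.2, 0.3],
                 [0.3, 0.4], [0.4, 0.5]]] * 3)

batch, timesteps, features = seq.shape
# A seq2seq Keras model on this data would use input_shape=(5, 2),
# or (None, 2) for variable-length sequences.
```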

Predicting in Keras with LSTM layer
I'm trying to build a text labeling (multi-label) neural network using Keras.
I have built a dictionary of about 2000 words and encoded the training samples as sequences of word indices of length 140 (with padding).
As a result, the data looks like a 2D array of size (num_samples, 140), where the number of samples is around 30k. Here is the definition of my neural network:
mdl = Sequential()
mdl.add(Embedding((vocab_len + 1), 300, input_length=140))
mdl.add(LSTM(100))
mdl.add(Dense(train_y.shape[1], activation="sigmoid"))
mdl.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=["accuracy"])

history = mdl.fit(train_x, train_y, epochs=4, verbose=1,
                  validation_data=(valid_x, valid_y), batch_size=100)
During training Keras shows an accuracy of around 0.93 on both training and validation data, which looks promising.
But when I try to invoke predict on test data
pred_y = mdl.predict(test_x, batch_size=100)
I get an array where all rows look identical, and all values are less than 0.5. Hence no label is set on any of the test samples.
Sample output from mdl.predict()
The same behaviour is observed if I run predict() on the very same training data I just used to train the model. But if I run mdl.evaluate(), I get the same accuracy of 0.93 as shown during model fitting. What am I doing wrong?
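One quick diagnostic (my suggestion, not from the post): with sigmoid multi-label outputs and ~2000 possible labels, the targets are extremely sparse, and per-entry binary accuracy can sit at 0.93 even when the model never sets a single label, because most entries in train_y are 0. A sketch of both effects on a synthetic prediction matrix:

```python
import numpy as np

# Hypothetical sparse multi-label ground truth: 30 samples, 20 labels,
# only about 5% of the entries are 1.
rng = np.random.RandomState(0)
true = (rng.rand(30, 20) < 0.05).astype(float)

# A degenerate model that outputs the same sub-0.5 row for every sample.
pred = np.full((30, 20), 0.1)

# Keras-style binary accuracy: threshold at 0.5, average over all entries.
binary_acc = np.mean((pred > 0.5) == true)

# High accuracy despite zero labels ever being predicted:
labels_set = int((pred > 0.5).sum())
```

So a high evaluate() score here does not contradict all-below-0.5 predictions; a per-label threshold or a ranking metric would be a more telling check.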

LSTM giving same prediction for numerical data
I created an LSTM model for intraday stock predictions. The training data has shape (290, 4). I did all the preprocessing: normalizing the data, taking first differences, and using a window size of 4.
This is a sample of my input data.
X = array([[0, 0, 0, 0],
           [array([ 0.19]), 0, 0, 0],
           [array([-0.35]), array([ 0.19]), 0, 0],
           ...,
           [array([ 0.11]), array([-0.02]), array([-0.13]), array([-0.09])],
           [array([-0.02]), array([ 0.11]), array([-0.02]), array([-0.13])],
           [array([ 0.07]), array([-0.02]), array([ 0.11]), array([-0.02])]], dtype=object)

y = array([[array([ 0.19])],
           [array([-0.35])],
           [array([-0.025])],
           ...,
           [array([-0.02])],
           [array([ 0.07])],
           [array([-0.04])]], dtype=object)
Note: I am feeding in, as well as predicting, the difference values, so the input values lie in the range (-0.5, 0.5).
Here is my Keras LSTM model :
dim_in = 4
dim_out = 1

model.add(LSTM(input_shape=(1, dim_in), return_sequences=True, units=6))
model.add(Dropout(0.2))
model.add(LSTM(batch_input_shape=(1, features.shape[1], features.shape[2]),
               return_sequences=False, units=6))
model.add(Dropout(0.3))
model.add(Dense(activation='linear', units=dim_out))
model.compile(loss='mse', optimizer='rmsprop')

for i in range(300):
    # print("Completed :", i+1, "/", 300, "Steps")
    model.fit(X, y, epochs=1, batch_size=1, verbose=2, shuffle=False)
    model.reset_states()
I feed in the last sequence value of shape (1, 4) and predict the output. This is my prediction code:
base_value = df.iloc[290]['Close']
prediction = []
orig_pred = []
input_data = np.copy(test[0, :])
input_data = input_data.reshape(len(input_data), 1)

for i in range(100):
    inp = input_data[i:, :]
    inp = inp.reshape(1, 1, inp.shape[0])
    y = model_p.predict(inp)
    orig_pred.append(y[0][0])
    input_data = np.insert(input_data, [i + 4], y[0][0], axis=0)
    base_value = base_value + y
    prediction_apple.append(base_value[0][0])

sqrt(mean_squared_error(test_output, orig_pred))
RMSE = 0.10592485833344527
Here is the difference in prediction visualization along with stock price prediction.
fig:1 > This is the LSTM prediction
fig:2 > This is the Stock prediction
I am not sure why it predicts the same output value after 10 iterations. Maybe it is the vanishing gradient problem, or I am feeding too little data (around 290 samples), or there is a problem in the model architecture; I am not sure.
Please help me figure out how to get a reasonable result.
Thank you !!!
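As a side note on the preprocessing: the dtype=object nesting visible in X above (scalars mixed with 1-element arrays) is itself a likely source of trouble. The same differencing plus size-4 windowing can be built as plain float arrays, sketched here with toy prices (the question's windows are most-recent-first; reverse each row if you want that order):

```python
import numpy as np

prices = np.array([10.0, 10.19, 9.84, 9.86, 9.9, 10.0, 10.2])
diffs = np.diff(prices)   # first differences, length len(prices) - 1

def make_windows(series, window=4):
    """X[t] = the previous `window` differences, y[t] = the next one."""
    X = np.array([series[t - window:t] for t in range(window, len(series))])
    y = series[window:].reshape(-1, 1)
    return X.astype(np.float32), y.astype(np.float32)

X, y = make_windows(diffs)   # X: (len(diffs) - 4, 4), y aligned with X
```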

Structure of the Tensorflow Network for RNN Layers with Different num_units
If I want to have multiple layers of LSTM stacked to each other the code would be something like:
num_units = [a, b]
cells = [tf.contrib.rnn.BasicLSTMCell(num_units=n) for n in num_units]
stacked_rnn_cell = tf.contrib.rnn.MultiRNNCell(cells)
cell_integrated = tf.contrib.rnn.MultiRNNCell(stacked_rnn_cell, state_is_tuple=True)
LSTM_output, states = tf.nn.dynamic_rnn(cell_integrated, Data, dtype=tf.float32)
This works. My question, though, is what the architecture of the network would be for different integers a and b.
For instance, if a > b, where do the second-layer units start their interconnection with the first-layer units? For a=5, b=3, is the default structure like this?
Two stacked layer of LSTM: First layer units=5, Second layer units=3
Or when b > a, the number of second-layer units is larger than the number of first-layer units. Do the extra ones connect after the first layer ends, or do they begin sooner? For a=3, b=5, is the default structure like this?
Two stacked layer of LSTM: First layer units=3, Second layer units=5
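To the underlying question: in a stacked RNN there is no positional alignment between units at all. At each time step, layer 2 receives the entire a-dimensional output vector of layer 1 through a dense (fully connected) transform, regardless of whether a > b or b > a; only the weight-matrix shapes change with a and b. A NumPy sketch of the shapes involved in one step (using the common convention that an LSTM's kernel has shape (input_size + units, 4 * units), the 4 covering the input/forget/cell/output gates):

```python
import numpy as np

input_dim, a, b = 10, 5, 3   # feature size, layer-1 units, layer-2 units

W1 = np.zeros((input_dim + a, 4 * a))   # layer 1 sees the raw input
W2 = np.zeros((a + b, 4 * b))           # layer 2 sees ALL of layer 1's output

# One step: layer 1 emits a vector of size a, consumed whole by layer 2.
h1 = np.zeros(a)                        # layer-1 hidden output (size a)
h2 = np.zeros(b)                        # layer-2 previous hidden state
gates2_in = np.concatenate([h1, h2])    # layer 2's step input, size a + b
out2 = gates2_in @ W2                   # pre-activation gates, shape (4 * b,)
```

So for a=5, b=3 and for a=3, b=5 alike, every first-layer unit feeds every second-layer unit at every time step.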

tf.Dataset.from_tensor_slices performance problem
For my input_fn, I build a dataset with tf.data.Dataset.from_tensor_slices((np_array1, np_array2, np_array3)).
I parse the elements by calling dataset.map, and return the resulting dataset.
I would understand if initialization of the dataset were slow, but when I call the tf.estimator train_and_evaluate function to train and evaluate the model, performance is very bad. It is probably re-populating the data in each epoch. What could be the reason, and how can I overcome this problem? What would be your recommendations?
Thank you.

Loading a checkpoint from a trained model using estimator
I want to do a very simple task. Assume I have trained a model and saved multiple checkpoints and metadata for it using tf.estimator; say there are 3 checkpoints: 1, 2 and 3. While evaluating the trained results on TensorBoard, I realize that checkpoint 2 provides the better weights for my objective.
Therefore I want to load checkpoint 2 and make my predictions from it. Simply put: is it possible to delete checkpoint 3 from the model dir and let the Estimator load checkpoint 2 automatically, or is there anything else I can do to load a specific checkpoint for my predictions?
Thank you.
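One approach that works without deleting anything (a sketch; the ckpt file names below are illustrative): the Estimator restores whatever the model_checkpoint_path line in the plain-text `checkpoint` file inside model_dir points to, so rewriting that single line makes it load checkpoint 2 (newer TF versions also accept a checkpoint_path argument to predict directly). A pure-Python sketch of the rewrite:

```python
import os
import re
import tempfile

def point_to_checkpoint(model_dir, ckpt_name):
    """Rewrite model_dir/checkpoint so the Estimator restores ckpt_name."""
    path = os.path.join(model_dir, "checkpoint")
    with open(path) as f:
        text = f.read()
    text = re.sub(r'^model_checkpoint_path: ".*"',
                  'model_checkpoint_path: "{}"'.format(ckpt_name),
                  text, count=1, flags=re.M)
    with open(path, "w") as f:
        f.write(text)

# Demo with a fake model_dir (file contents mimic the real format).
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "checkpoint"), "w") as f:
    f.write('model_checkpoint_path: "model.ckpt-3"\n'
            'all_model_checkpoint_paths: "model.ckpt-2"\n'
            'all_model_checkpoint_paths: "model.ckpt-3"\n')

point_to_checkpoint(model_dir, "model.ckpt-2")
```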

tf.summary.image does not seem to work for Estimator prediction
I want to visualize my input images with tf.estimator during prediction, but it seems tf.summary.image does not save any images. It does work for training, though.
This is my code in model_fn:
...
summary_hook = tf.train.SummarySaverHook(
    save_secs=2,
    output_dir='summary',
    scaffold=tf.train.Scaffold(summary_op=tf.summary.merge_all()))
    # summary_op=tf.summary.merge_all())
tf.summary.histogram("logit", logits)
tf.summary.image('feat', feat)
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode, predictions=preds,
                                      prediction_hooks=[summary_hook])
...
and this is my prediction code:
config = tf.estimator.RunConfig(save_summary_steps=0)
estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir='logs', config=config)
preds = estimator.predict(input_fn=eval_input_fn)
Is there something wrong with how I am using tf.train.SummarySaverHook?