Keras Reshape: total size of the new array must be unchanged
I'm trying to use the Keras functional API's Reshape layer to reshape the output of a GloVe embedding (4D shape: (?, 9, 20, 100)) down to 3D (?, 9, 2000). However, when I try Reshape((9, 2000))(text_layer), an error pops up saying that the total size of the new array must be unchanged, even though 9 * 20 * 100 = 9 * 2000. Any ideas why? Code is attached.
text = Input(shape=(9, news_text.shape[1]), name='text')
text_layer = Embedding(
    embedding_matrix.shape[0],
    embedding_matrix.shape[1],
    weights=[embedding_matrix],
    input_length=news_text.shape[1]
)(text)
text_layer = Reshape((9, text_layer.shape[2] * text_layer.shape[3]))(text_layer)
1 answer

Remove the input_length parameter from the Embedding layer.

It is strange and I don't know the reason, but when you specify the input_length parameter the error is thrown. Anyway, the Embedding layer receives its dimensions from the Input layer. It seems that the input_length parameter has a very specific use: knowing the dimension of the tensor after using a Flatten layer, etc. In this case, the Embedding layer obtains the shape of the output tensor from the input tensor, ignoring the input_length parameter.

(Setting an invalid value does not throw an error until you add the next layer. Note the mismatch between input_length and the resulting shape):

>>> inp = Input(shape=(9, 20))
>>> emb = Embedding(100, 100, input_length=84)(inp)
>>> emb
<tf.Tensor 'embedding_5/embedding_lookup:0' shape=(?, 9, 20, 100) dtype=float32>
>>> res = Reshape((9, 2000))(emb)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...

However, it seems that the input_length parameter conflicts with the Reshape layer. Finally:

text = Input(shape=(9, news_text.shape[1]), name='text')
text_layer = Embedding(
    embedding_matrix.shape[0],
    embedding_matrix.shape[1],
    weights=[embedding_matrix],
)(text)
text_layer = Reshape((9, text_layer.shape[2] * text_layer.shape[3]))(text_layer)
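The size constraint the error complains about can be sanity-checked outside Keras with plain NumPy (a sketch of the arithmetic only, not of Keras internals):

```python
import numpy as np

# The Reshape constraint: the total element count must be preserved.
# (batch, 9, 20, 100) -> (batch, 9, 2000) is valid because
# 9 * 20 * 100 == 9 * 2000 == 18000.
x = np.zeros((2, 9, 20, 100))   # stand-in for the embedding output
y = x.reshape(2, 9, 20 * 100)   # collapse the last two axes
print(y.shape)                  # (2, 9, 2000)

# A wildcard (-1) lets the library infer the collapsed axis; Keras'
# Reshape also accepts -1, so Reshape((9, -1)) avoids hard-coding 2000.
z = x.reshape(2, 9, -1)
```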
See also questions close to this topic

Setting up local Tensorflow training server with multiple user collaboration
I am setting up a local server to build and train ML models with my team. How I used to do it in the past was like this:
 Everyone individually installs the required dependencies on their own machine. (CUDA, cuDNN, Java, Python, etc.)
 Everyone contributes to the project, version control via Git.
 When everything is done, SSH into server to pull project.
 Install dependencies required for project to run on server.
I find that when I do it this way, most of the time is spent on resolving dependency issues, which is highly inefficient. (Especially installing Tensorflow with GPU support on Ubuntu!)
Recently I have been exploring other options, such as using Docker (since Tensorflow also recommends it) with Jupyter, so I can manage things through Docker images. However, I am unsure if it is the best way to do it, especially since I am extremely unfamiliar with Docker and Jupyter, and I don't know whether it can support multi-user collaboration. I am only familiar with building things in IDEs and using them with Git.
I would appreciate some suggestions on how I should go about this. Please do ask if you need more specifics. Thank you.
EDIT:
Constraints:
 Cloud solutions such as AWS and Google Cloud ML are not an option due to data privacy issues.
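One common low-friction pattern, sketched here under the assumptions that the server already has the NVIDIA driver plus the NVIDIA Container Toolkit installed and that the mount path is illustrative: run TensorFlow's official GPU image with bundled Jupyter, so CUDA/cuDNN live inside the image and team members only need Docker and a browser.

```shell
# Assumes: NVIDIA driver + NVIDIA Container Toolkit on the server,
# Docker 19.03+ (for the --gpus flag). Path /srv/projects/team-repo
# is an illustrative placeholder for the shared Git checkout.
docker pull tensorflow/tensorflow:latest-gpu-jupyter

docker run --gpus all -d \
    -p 8888:8888 \
    -v /srv/projects/team-repo:/tf/work \
    tensorflow/tensorflow:latest-gpu-jupyter
```

Everyone still collaborates through Git as before; the container is only the runtime, so the "everyone resolves CUDA by hand" step disappears.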

Having trouble with custom loss function for YOLO using keras or tensorflow
I am trying to define a custom loss function for YOLO to detect the presence of a single object class and locate its centre (kind of like landmark detection), as learned from Andrew Ng. The output is a 7*7*3 tensor: each of the 49 grid cells outputs a vector of depth 3. The first channel indicates the probability of an object in that grid cell, and the other two predict the coordinates of the centre of my object. I have lately been trying to get my head around tensor calculus to avoid a faulty loss function, but I am having trouble improving accuracy.
I am just subtracting all three channels of y_true from y_pred, but multiplying the result from the 2nd and 3rd channels by the 1st-channel matrix of y_true, as we don't want to count coordinates predicted in the 2nd and 3rd channels if the 1st channel does not predict the presence of an object in the first place.
def yol_loss(y_true, y_pred):
    shape = tf.shape(y_true[:, :, :, 0])
    a = tf.ones([shape[0], shape[1], shape[2]], tf.float32)
    loss = (
        K.mean(K.square(tf.multiply(y_true[:, :, :, 0], a) - tf.multiply(y_pred[:, :, :, 0], a)), axis=(1, 2))
        + 64 * K.mean(K.square(tf.multiply(y_true[:, :, :, 0], y_true[:, :, :, 1]) - tf.multiply(y_true[:, :, :, 0], y_pred[:, :, :, 1])), axis=(1, 2))
        + 64 * K.mean(K.square(tf.multiply(y_true[:, :, :, 0], y_true[:, :, :, 2]) - tf.multiply(y_true[:, :, :, 0], y_pred[:, :, :, 2])), axis=(1, 2))
    )
    return loss
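The masking idea described above can be checked in plain NumPy (a sketch of the math only; the values are made up): the objectness channel of y_true zeroes out the coordinate error wherever no object is present.

```python
import numpy as np

# Toy 2x2 grid, 3 channels per cell: [objectness, cx, cy]
y_true = np.zeros((2, 2, 3))
y_true[0, 1] = [1.0, 0.4, 0.6]   # one cell contains an object centre

y_pred = np.zeros((2, 2, 3))
y_pred[0, 1] = [0.9, 0.5, 0.5]
y_pred[1, 0] = [0.1, 0.8, 0.8]   # spurious coordinates in an empty cell

obj = y_true[..., 0]             # the mask, taken from channel 0
obj_loss = np.mean((y_true[..., 0] - y_pred[..., 0]) ** 2)

# Coordinate error counts only where obj == 1, exactly as in the
# question's y_true[:, :, :, 0] * (...) terms; the spurious (0.8, 0.8)
# prediction in the empty cell contributes nothing.
xy_loss = sum(np.mean((obj * y_true[..., c] - obj * y_pred[..., c]) ** 2)
              for c in (1, 2))
loss = obj_loss + 64 * xy_loss
```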

Tensorflow add new op slicing the output tensor
I checked the file here:
// get the corresponding Eigen tensors for data access
auto input_tensor = input.matrix<double>();
auto weights_tensor = weights.matrix<double>();
auto biases_tensor = biases.matrix<double>();
auto output_tensor = output->matrix<double>();

for (int ix_sample = 0; ix_sample < batch_samples; ix_sample++) {
  for (int ix_unit = 0; ix_unit < units; ix_unit++) {
    output_tensor(ix_sample, ix_unit) = 0;
    for (int ix_input = 0; ix_input < input_feature_width; ix_input++) {
      output_tensor(ix_sample, ix_unit) += input_tensor(ix_sample, ix_input) * weights_tensor(ix_input, ix_unit);
    }
    output_tensor(ix_sample, ix_unit) += biases_tensor(0, ix_unit);
  }
}
And the one on Tensorflow tutorial:
// Create an output tensor
Tensor* output_tensor = NULL;
OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(), &output_tensor));
auto output_flat = output_tensor->flat<int32>();

// Set all but the first element of the output tensor to 0.
const int N = input.size();
for (int i = 1; i < N; i++) {
  output_flat(i) = 0;
}
I am wondering, if the output tensor is a 3D tensor, how could I slice it and assign its values vector-wise?
I found the slice method for Eigen:

Eigen::Tensor<int, 2> a(4, 3);
a.setValues({{0, 100, 200}, {300, 400, 500},
             {600, 700, 800}, {900, 1000, 1100}});
Eigen::array<int, 2> offsets = {1, 0};
Eigen::array<int, 2> extents = {2, 2};
Eigen::Tensor<int, 2> slice = a.slice(offsets, extents);
cout << "a" << endl << a << endl;
=> a
  0  100  200
300  400  500
600  700  800
900 1000 1100
cout << "slice" << endl << slice << endl;
=> slice
300 400
600 700

But it is not clear how I could use it here.
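For orientation, the Eigen slice above is the same operation as NumPy basic slicing with a start offset and an extent per dimension (shown here only as an analogy; the thread's code itself is C++):

```python
import numpy as np

a = np.array([[0, 100, 200],
              [300, 400, 500],
              [600, 700, 800],
              [900, 1000, 1100]])

# offsets = {1, 0}, extents = {2, 2}  ->  rows 1..2, cols 0..1
s = a[1:1 + 2, 0:0 + 2]            # a view, not a copy
print(s.tolist())                  # [[300, 400], [600, 700]]

# Assignment through the same range writes into the original array,
# which is the "assign values vector-wise" part of the question:
a[1:3, 0:2] = -1
```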

Scikit-learn KerasClassifier evaluation error
I have created a Keras classifier for k-fold validation. Below is the function for building the classifier.
def build_classifier():
    classifier = Sequential()
    classifier.add(Dense(activation="relu", input_dim=11, units=6, kernel_initializer="uniform"))
    classifier.add(Dense(activation="relu", units=6, kernel_initializer="uniform"))
    classifier.add(Dense(activation="sigmoid", units=1, kernel_initializer="uniform"))
    classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return classifier
I have initialized the classifier as below.
classfier = KerasClassifier(build_fn = build_classifier, batch_size=10, epochs=100)
I am trying to get the accuracy score from this 10-fold validation.
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = 1)
However, it shows me a TypeError:
TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator does not.

Don't Understand how to Implement Embeddings for Categorical Features
From the various examples I've found online, I still don't quite understand how to create embedding layers from my categorical data for neural network models, especially when I have a mix of numerical and categorical data. For example, take the data set below:
numerical_df = pd.DataFrame(np.random.randint(0, 100, size=(100, 3)), columns=['num_1', 'num_2', 'num_3'])
cat_df = pd.DataFrame(np.random.randint(0, 5, size=(100, 3)), columns=['cat_1', 'cat_2', 'cat_3'])
df = numerical_df.join(cat_df)
I want to create embedding layers for my categorical data and use them in conjunction with my numerical data, but in all the examples I've seen it's almost as if the model just filters the entire dataset through the embedding layer, which is confusing.
As an example of my confusion, below is an example from Keras' documentation on sequential models. It's as though they just add the embedding step as the first layer and fit it to the entirety of x_train.
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM

max_features = 1024

model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)
So ultimately, when it comes to creating embedding matrices, is there one per categorical variable, or one for all categorical variables? And how do I reconcile this with my other data that doesn't need an embedding matrix?
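To make the "one matrix per variable" question concrete, here is a plain-NumPy sketch (all names and values are illustrative; a real Keras model would use one Embedding layer per categorical column plus a concatenation): each categorical column gets its own small embedding matrix, the looked-up vectors are concatenated with the numeric columns, and the combined vector feeds the dense layers.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim = 4
n_levels = 5                      # cat_1..cat_3 each take values 0..4

# One embedding matrix PER categorical variable, not one shared matrix.
emb = {c: rng.normal(size=(n_levels, emb_dim))
       for c in ("cat_1", "cat_2", "cat_3")}

def encode_row(num_vals, cat_vals):
    """num_vals: the 3 numeric features; cat_vals: column -> level index."""
    looked_up = [emb[c][cat_vals[c]] for c in ("cat_1", "cat_2", "cat_3")]
    return np.concatenate([np.asarray(num_vals, float)] + looked_up)

x = encode_row([12.0, 7.0, 99.0], {"cat_1": 0, "cat_2": 3, "cat_3": 1})
print(x.shape)     # (15,) = 3 numeric + 3 categoricals * 4 embedding dims
```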

how to structure test batches in dynamic_rnn
Having a conceptual mind block here. I am creating an LSTM using tf.nn.dynamic_rnn.

Example: next-word prediction, using input ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"], trained with num_unrollings = 4.

If I test by sending one unrolling (4 words long), ["The", "quick", "brown", "fox"], I would like the output to be ["jumps"]. But the output will also be 4 words long, since the input and output vector shapes must match. So, ideally, the output would be ["quick", "brown", "fox", "jumps"]. Yes, that is too much to ask, because there is no way the algorithm could return "quick" after seeing only the word "The". And, in this situation, we cannot go back in time and update the first output word to "quick" after seeing all 4 input words. We can only predict forward in time.

Question 1: Input X[0,1,2,3] returns y[0,1,2,3]. How are y[0,1,2] predicted? Is y[0] predicted after the model has seen only X[0]? Should I expect a first prediction vector like ["man", "delivery", "fox", "jumps"]?

Question 2: How do I batch the unrollings in the test? If I batch as:

[["The", "quick", "brown", "fox"], ["jumps", "over", "the", "lazy"]]

should I expect the output to be:

[["man", "delivery", "fox", "jumps"], ["high", "the", "fence", "dog"]]?

If I batch as:

[["The", "quick", "brown", "fox"], ["quick", "brown", "fox", "jumps"], ["brown", "fox", "jumps", "over"]]

should I expect the output to be:

[["man", "delivery", "fox", "jumps"], ["delivery", "fox", "jumps", "over"], ["bag", "jumps", "over", "the"]]?

How can I structure the inputs to get the (flattened) output of:

["quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]?

Should I concatenate the outputs y[3]?
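The second batching scheme in the question (overlapping windows stepping one word at a time) is the standard way to get a prediction aligned with every position. A stdlib sketch of building those unrollings and recovering the flattened next-word sequence:

```python
words = ["The", "quick", "brown", "fox", "jumps",
         "over", "the", "lazy", "dog"]
num_unrollings = 4

# Overlapping input windows, each shifted by one word; the target for a
# window is that same window shifted one position into the future.
inputs  = [words[i:i + num_unrollings]
           for i in range(len(words) - num_unrollings)]
targets = [words[i + 1:i + 1 + num_unrollings]
           for i in range(len(words) - num_unrollings)]

print(inputs[0])    # ['The', 'quick', 'brown', 'fox']
print(targets[0])   # ['quick', 'brown', 'fox', 'jumps']

# The first target window plus the LAST word of every later window
# yields the flattened output the question asks for.
flattened = targets[0] + [t[-1] for t in targets[1:]]
```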
How to get/evaluate weights from deep learning model
I would like to know whether there is a way to get/evaluate the attributes' weights after training a deep learning model, so that I can tell which attribute plays a more important role in the problem. I would be grateful for any help!

How to penalise my agent for changing its action?
I'm using A3C in tensorflow. I want my agent to keep using the same action and only change it when necessary, when there is a benefit to changing.

Which OpenCV functions are mainly used in image analysis, or particularly in deep learning?
I want to know which OpenCV functions are used in image analysis, and how, in ways that benefit deep learning models.

LSTM Training Input Versus Live Evaluation Input - Dynamic RNN?
I am having trouble wrapping my head around RNNs for this problem.
The problem: live binary classification of video using image sequences. Meaning I am receiving a video one image at a time and need to predict either Class A or Class B for the most recent image received.
Current Solution
Training - I use a CNN as a feature extractor on a full sequence of images. I then feed multiple images (lstmlen, cnnfeaturesize) into the LSTM.
Live Evaluation - I receive 1 frame at a time and run it through the CNN. I add the new features to a queue of length lstmlen, then I take all the features from the queue and feed them into the LSTM.
What I don't understand
Why is it that I have to keep track of and feed all of the features into the LSTM at evaluation time? The point of an LSTM is to remember past inputs so it seems redundant for me to input all the previous images at every time step. What I would like to be able to do is simply calculate the features for the most recent images and then feed those new features into the LSTM while the LSTM remembers the last lstmlen number of frames.
Am I using the RNN incorrectly in this case? Should I be able to simply use the previous LSTM state as input into the other LSTM cells and provide feature input for only the newest image?
I'm thinking something like TensorFlow's dynamic_rnn may be the solution to this problem.
Pretty confused about this. Thanks for the help!
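The intuition in the question is right: a recurrent cell only ever needs its previous state plus the newest input. A minimal NumPy sketch of a vanilla RNN step (illustrative weights, not TensorFlow's API) showing that stepping frame-by-frame while carrying the state gives the same final state as replaying the whole queue:

```python
import numpy as np

rng = np.random.default_rng(1)
feat, hidden = 8, 16
Wx = rng.normal(scale=0.1, size=(feat, hidden))
Wh = rng.normal(scale=0.1, size=(hidden, hidden))

def step(h, x):
    """One vanilla-RNN update: h' = tanh(x @ Wx + h @ Wh)."""
    return np.tanh(x @ Wx + h @ Wh)

frames = rng.normal(size=(10, feat))   # CNN features, one per frame

# (a) Replay the whole queue of features every time (current setup).
h = np.zeros(hidden)
for x in frames:
    h = step(h, x)

# (b) Stream: keep only the state, feed each frame once as it arrives.
h_live = np.zeros(hidden)
for x in frames:                       # frames arriving one at a time
    h_live = step(h_live, x)           # no queue of past features kept
```

In TensorFlow this corresponds to passing the returned state back in as the initial_state of dynamic_rnn (or using a stateful LSTM in Keras), rather than re-feeding all lstmlen frames at every step.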

Reinforcement learning - How to deal with a varying number of actions that do number approximation
I am new to Reinforcement learning, but I am trying to use RL for this task:
Given a function definition written e.g. in C, with 1 to tens of input arguments (only numerical ones: integer, float, etc.) and the body of the function (represented as an Abstract Syntax Tree / Abstract Decision Tree with data dependencies, i.e. how the internal variable values change), I would like to approximate the values of these input parameters such that, for example, a certain decision block is executed. For this I thought of a recurrent network with LSTM cells.
Now, to achieve this, I would traverse one path in the tree leading to the block and take note of any data changes and decision blocks in the path. These steps would influence my parameter input predictions: what values to insert into or change in the input parameters if I wish to have a certain decision block executed.
Action: Changing the value of one chosen input parameter of the function, OR changing the values of all input parameters individually (each with a different mathematical operation). After executing the action, moving on to the next node in the tree.
Reward: How close I am to executing the given decision block (thus satisfying the condition) with given input parameter values.
Goal: Have a condition in the code satisfied and a decision block thus executed (e.g. an if condition is met).
State: Current position in the AST/ADT with data dependencies.
Assuming that I already have a way to evaluate how far I am from executing the wanted decision block given the current input parameter values, I came across two problems:
How would I deal with a varying number of function input parameters in RL? If I want to change their values to get closer to the execution of the wanted decision block, the number of available actions changes with the number of parameters defined for the given function.
Once I have chosen one parameter, what is the best way to do number approximation using RL? The function body could contain numerous very complex mathematical operations, so should there be actions defined for logarithm, exponentiation, division, multiplication, etc., or is there a better way, perhaps just adding to/subtracting from the current value?
If you find any mistakes in my definition of the Actions, Reward, Goal or State, please do correct me, as I am still a big learner in this field.
Thank you for your answers.

LSTM Neural Network weighted connections
I'm currently implementing my own LSTM neural network using the NEAT algorithm. Everything about LSTMs seems pretty clear so far besides the weights: are the weights unique per LSTM cell, or do they depend on the cell's output connections?
For example, if one cell has outgoing connections, does every connection hold different weights for all input/recurrent data (just like in a simple non-RNN network, where every connection has a weight assigned) of that cell, or are they unique per cell (the same output over every outgoing connection)?

Reshaping in combination with merging in data.table
I have a dataset df in which prop is the number of observations in that year as a fraction of the country's total observations. For example: for the Netherlands (NLD), 60% of observations have the year 2005. For Bulgaria (BLG) this is 50%.

   row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2
I would like to connect these values to a different dataset (df2, which has questions related to those years), which looks as follows:

   row country q05 q06 q07 q08
1:   1     NLD   1   2   1   3
2:   2     NLD   2   1   2   3
3:   3     NLD   1   2   2   4
4:   4     BLG   5   5   2   4
5:   5     BLG   1   2   1   1
6:   6     BLG   2   2   5   1
7:   7     GER   3   5   4   4
8:   8     GER   2   5   3   4
9:   9     GER   1   2   3   5
What I want is to get the following:

   row country q05 q06 q07 q08 prop2005 prop2006 prop2007 prop2008
1:   1     NLD   1   2   1   3      0.6      0.0      0.2      0.2
2:   2     NLD   2   1   2   3      0.6      0.0      0.2      0.2
3:   3     NLD   1   2   2   4      0.6      0.0      0.2      0.2
4:   4     BLG   5   5   2   4      0.5      0.5      0.0      0.0
5:   5     BLG   1   2   1   1      0.5      0.5      0.0      0.0
6:   6     BLG   2   2   5   1      0.5      0.5      0.0      0.0
7:   7     GER   3   5   4   4      1.0      0.0      0.0      0.0
8:   8     GER   2   5   3   4      1.0      0.0      0.0      0.0
9:   9     GER   1   2   3   5      1.0      0.0      0.0      0.0
In other words, for every observation, I want the proportions for that country added to the observation (as they function like weights).
I am reasonably familiar with merging in data.table:
df1 <- merge(df1, df2, by = "country", all.x = TRUE, allow.cartesian = FALSE)
However, I don't really know how I can reshape the data.table to correctly merge it.
Any suggestions?
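The transformation being asked for is: collapse df to one (country, year) -> prop pair, spread year into wide columns, then merge onto df2 by country. A stdlib-Python sketch of that logic using the thread's values (an illustration of the reshape+merge, not data.table syntax; in data.table itself this roughly corresponds to dcast(unique(df[, .(country, year, prop)]), country ~ year, value.var = "prop", fill = 0) followed by the merge on country):

```python
# df rows: (country, year, prop), with duplicates per observation
df = [("NLD", 2005, 0.6), ("NLD", 2005, 0.6), ("BLG", 2006, 0.5),
      ("BLG", 2005, 0.5), ("GER", 2005, 1.0), ("NLD", 2007, 0.2),
      ("NLD", 2005, 0.6), ("NLD", 2008, 0.2)]
years = [2005, 2006, 2007, 2008]

# Step 1: unique (country, year) -> prop (duplicate rows collapse away)
prop = {(c, y): p for c, y, p in df}

# Step 2: "cast" to wide: one prop column per year, 0.0 where absent
wide = {c: [prop.get((c, y), 0.0) for y in years]
        for c in {c for c, _, _ in df}}

# Step 3: merge the wide prop columns onto df2 by country
df2 = [("NLD", 1, 2, 1, 3), ("BLG", 5, 5, 2, 4), ("GER", 3, 5, 4, 4)]
merged = [row + tuple(wide[row[0]]) for row in df2]
```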

R: Wide to long format using melt and variable suffix
I have the following wide dataset named wide, with 6 factor columns and one participant ID column:

participant varA_0 varA_1 varB_0 varB_1 varC_0 varC_1
I'd like to convert this to a long format in the following:
participant time varA varB varC
Where time corresponds to timepoint 0 or 1, given by the suffix of var*_.
I have tried:
melt(wide, id.vars = "participant")
But this gives me:
  participant variable value
1           3   varA_0     5
2           4   varA_0     5
3           5   varA_0     6
4           6   varA_0     3
5           7   varA_0     7
6           3   varA_1     1
7           4   varA_1     2
8           5   varA_1     1
9           6   varA_1     3
And so on. Any assistance on how I may get my desired formatting of data would be greatly appreciated :)
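The desired reshape splits each variable name at its underscore into (name, time) and regroups by time. A plain-Python sketch of that split-and-regroup logic with made-up values, for illustration (data.table's melt with measure.vars = patterns(...) and pandas' wide_to_long do this natively):

```python
# One wide row per participant: varA_0, varA_1, varB_0, ...
wide_rows = [
    {"participant": 3, "varA_0": 5, "varA_1": 1, "varB_0": 2,
     "varB_1": 4, "varC_0": 7, "varC_1": 9},
]

long_rows = []
for row in wide_rows:
    for time in ("0", "1"):
        out = {"participant": row["participant"], "time": int(time)}
        for col, val in row.items():
            if "_" in col:
                name, suffix = col.rsplit("_", 1)   # "varA_0" -> ("varA", "0")
                if suffix == time:
                    out[name] = val
        long_rows.append(out)
```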

Is there any way I can get an ordered list from the TukeyHSD output?
For example, my TukeyHSD output is like this:

a <- as.factor(c(rep("a11", 10), rep("b311", 10), rep("c4", 10), rep("d55", 10)))
b <- c(15.229088, 17.805860, 9.684034, 13.530569, 22.682383, 14.571289,
       6.339910, 16.197563, 14.491099, 15.390546, 27.202050, 17.394087,
       15.779863, 7.232164, 18.777153, 18.745640, 17.300433, 24.261130,
       19.704504, 24.409819, 26.673809, 40.137541, 38.380306, 24.273545,
       28.085976, 19.494637, 23.078352, 24.036720, 30.083859, 14.040216,
       16.326138, 23.076904, 38.501952, 28.749595, 21.263193, 29.268084,
       25.910249, 16.180871, 23.306857, 22.095577)
anova(aov(b ~ a))
TukeyHSD(aov(b ~ a))

              diff        lwr       upr     p adj
b311-a11  4.488450 -3.0623393 12.039240 0.3909031
c4-a11   12.236262  4.6854725 19.787051 0.0005723
d55-a11   9.875708  2.3249184 17.426497 0.0062161
c4-b311   7.747812  0.1970223 15.298601 0.0424965
d55-b311  5.387258 -2.1635318 12.938047 0.2372192
d55-c4   -2.360554 -9.9113436  5.190235 0.8340926
Is there any way I can get the final list based on diff with p adj < 0.05?

Final list (big to small): c4, d55, b311, a11
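For illustration of the logic (in plain Python rather than R, matching the rest of this page's examples): the ordering can be recovered from the pairwise diffs, since each comparison "X-Y" encodes mean(X) - mean(Y), and the p-adj column can filter the significant pairs.

```python
# The TukeyHSD output above, as comparison -> (diff, adjusted p-value).
comparisons = {
    "b311-a11": (4.488450, 0.3909031),
    "c4-a11":   (12.236262, 0.0005723),
    "d55-a11":  (9.875708, 0.0062161),
    "c4-b311":  (7.747812, 0.0424965),
    "d55-b311": (5.387258, 0.2372192),
    "d55-c4":   (-2.360554, 0.8340926),
}

# Comparisons that are significant at p adj < 0.05.
significant = [n for n, (_d, p) in comparisons.items() if p < 0.05]

# Score each group by summing signed diffs (+diff for the left group,
# -diff for the right group); sorting descending gives big-to-small.
scores = {}
for name, (diff, _p) in comparisons.items():
    left, right = name.split("-")
    scores[left] = scores.get(left, 0.0) + diff
    scores[right] = scores.get(right, 0.0) - diff

order = sorted(scores, key=scores.get, reverse=True)
print(order)   # ['c4', 'd55', 'b311', 'a11']
```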