Should I use an activation function and normalization for regression?
I have a regression problem. A model in a related paper uses min-max normalization to scale the input and output data to the range [-1, 1], and it applies a tanh activation function in the last (output) layer. However, I found it very hard to train: the loss and RMSE decrease slowly. If I remove the activation function from the output layer and do not use any data normalization, I get the best score. So, I have two questions:
Do I have to use an activation function in the last layer and some data normalization for a regression problem? (All features and target values are on the same scale, like house prices in different areas.)
Even with the activation function removed from the last layer, I found that the loss decreases faster if I don't use any data normalization. If I normalize the data to the [-1, 1] or [0, 1] range (using min-max normalization), the result is always worse. But why?
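For reference, min-max scaling of the targets to [-1, 1] (as the paper apparently does) and the inverse transform needed before scoring can be sketched in a few lines of NumPy. The values below are made up for illustration:

```python
import numpy as np

# Hypothetical 1-D target values (e.g. house prices); numbers are illustrative.
y = np.array([120.0, 250.0, 180.0, 310.0])

# Min-max scaling to [-1, 1].
y_min, y_max = y.min(), y.max()
y_scaled = 2 * (y - y_min) / (y_max - y_min) - 1

# After predicting in the scaled space, invert the transform
# before computing RMSE on the original scale.
y_restored = (y_scaled + 1) / 2 * (y_max - y_min) + y_min
```

One subtlety: if the RMSE is computed in the scaled space, it is not comparable to the RMSE of an unnormalized model, which may partly explain why the normalized setup looks worse.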
See also questions close to this topic
Symbol lookup error after successfully building a TensorFlow graph tool
I am trying to build summarize_graph, a TensorFlow tool used to inspect a model's graph. I built this tool using Bazel (referred link):
$ bazel build tensorflow/tools/graph_transforms:summarize_graph
When I try to use it with this command
$ ./summarize_graph --in_graph=20170512-110547.pb
it gives this error:
./summarize_graph: symbol lookup error: /home/sneha/.cache/bazel/_bazel_sneha/5493eb67fb8c3272a4d7ed9f724aa3c2/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/tools/graph_transforms/../../../_solib_k8/_U_S_Stensorflow_Stools_Sgraph_Utransforms_Csummarize_Ugraph___Utensorflow/libtensorflow_framework.so.2: undefined symbol: _ZN10tensorflow19IsGoogleCudaEnabledEv
So I am not sure whether I should install CUDA or something else. Please help! Thank you so much.
How to create keras sparse tensor placeholder with dtype int32?
I am trying to create a Keras sparse tensor placeholder with dtype int32 using the following code:
self.labels = Input(name='the_labels', shape=[self.absolute_max_string_len], sparse=True, dtype='int32')
However, I get this error instead:
TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("the_labels/indices:0", shape=(?, 2), dtype=int64), values=Tensor("the_labels/values:0", shape=(?,), dtype=int32), dense_shape=Tensor("the_labels/shape:0", shape=(2,), dtype=int64)). Consider casting elements to a supported type.
EDIT: Full paste of the trace https://pastebin.com/2AQLgtYx
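For context, the (indices, values, dense_shape) triple printed in the error message describes a sparse matrix. A plain-NumPy reconstruction of that layout (with made-up values, not the actual tensor) looks like this:

```python
import numpy as np

# Illustrative sparse representation: each row of `indices` is a
# (row, col) coordinate, and `values` holds the int32 entry stored there.
indices = np.array([[0, 1], [1, 3]], dtype=np.int64)
values = np.array([7, 9], dtype=np.int32)
dense_shape = (2, 4)

# Scatter the values into a dense array of zeros.
dense = np.zeros(dense_shape, dtype=np.int32)
dense[indices[:, 0], indices[:, 1]] = values
```

Note that the indices and dense_shape are int64 while only the values are int32, which matches the shapes shown in the traceback.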
Understanding dropout in a deep LSTM network
I am using Keras to train a 2-layer (stateless) LSTM network, and I am trying to understand which parts of the network are actually set to 0 when using dropout.
The architecture is as below
model = Sequential()
# first LSTM layer
model.add(LSTM(64, input_shape=(X.shape[1], X.shape[2]), return_sequences=True,
               dropout=0.4, recurrent_dropout=0.4))
model.add(Dropout(0.5))
# second LSTM layer
model.add(LSTM(64, dropout=0.4, recurrent_dropout=0.4))
model.add(Dropout(0.5))
model.add(Dense(1))
According to the LSTM documentation
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
I have a few issues that need clarification:
- Is it redundant to use both the dropout/recurrent_dropout parameters and the Dropout() layer together, as above?
- What part of LSTM is the "linear transformation of the inputs"? Is it one of the learned weight matrices?
- What part of LSTM is the "recurrent state"? Is it the cell state C or one of the forget / update gates?
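To illustrate what "fraction of the units to drop" means in practice, here is a minimal NumPy sketch of (inverted) dropout applied to an activation vector. This is a generic illustration of the technique, not Keras's internal implementation:

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero out roughly `rate` of the units and rescale the survivors
    by 1/(1-rate), so the expected activation is unchanged in training."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(10000)        # stand-in for a layer's activations
y = dropout(x, 0.4, rng)  # ~40% of entries become 0
```

The dropout/recurrent_dropout arguments apply such a mask to the LSTM's input and recurrent transformations respectively, while a separate Dropout() layer masks the LSTM's output, so they act on different tensors.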
Pytorch - going back and forth between eval() and train() modes
I'm studying deep reinforcement learning and built my own example after PyTorch's Reinforcement Learning (DQN) tutorial.
I implement the actor's strategy as follows: 1. model.eval() 2. get the best action from the model 3. self.net.train()
The question is: does going back and forth between eval() and train() modes cause any damage to the optimization process?
The model includes only Linear and BatchNorm1d layers. As far as I know, when using BatchNorm1d one must call model.eval() to use the model, because the results differ between eval() and train() modes.
When training a classification neural network, model.eval() is performed only after training is finished, but in deep reinforcement learning it is usual to run the strategy and then continue the optimization process.
I'm wondering whether going back and forth between modes is "harmless" to the optimization process.
def strategy(self, state):
    # Explore or exploit
    if self.epsilon > random():
        action = choice(self.actions)
    else:
        self.net.eval()
        # .max(1) returns (values, indices); take the action index
        action = self.net(state.unsqueeze(0)).max(1)[1].detach()
        self.net.train()
TDNN - How can I use this implementation?
I am trying to classify the following data structure with a TDNN, as this was suggested in my last question.
[[ [deltaX,deltaY,deltaTime], [deltaX,deltaY,deltaTime],... ],class]
I found some implementations of TDNNs on GitHub, though there is no documentation for them. For my classification problem I would use this implementation on GitHub, as it states it is "faster".
Now I'm struggling with adding the TDNN layer to my current code.
I am loading my data with the following code.
classZero = genfromtxt('/CSV/data/0.csv', delimiter=',')
classOne = genfromtxt('/CSV/data/1.csv', delimiter=',')
# TODO: load all CSV files, not only two examples
labelZero =  # (value lost in formatting)
labelOne =  # (value lost in formatting)
training_data = []
for i in range(len(classZero)):
    training_data.append([classZero[i], labelZero])
for i in range(len(classOne)):
    training_data.append([classOne[i], labelOne])
So my questions are:
Q1: How do I feed my CSV files into the TDNN implementation mentioned above?
Q2: Where should I put my label?
For now, my model is defined by the code:
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    tdnn_layer,  # not sure how to add it here or if this is right
    torch.nn.Linear(H, D_out),
).to(device)
Q3: How do I add the TDNN layer to my model?
Q4: Is there any other neural-network architecture that could possibly solve this problem and is implemented in high-level APIs such as Keras?
Thank you in advance for any help! I'm trying to get used to PyTorch and deep learning with Python in general, as I used to work with Deeplearning4j in Java.
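For what it's worth, a common way to pair samples with per-class labels (the label values in the loading snippet above were lost in formatting) is a one-hot layout. This is purely a hypothetical sketch, with random arrays standing in for the CSV data:

```python
import numpy as np

# Stand-ins for the arrays genfromtxt would return (shapes are made up).
class_zero = np.random.rand(5, 3)  # 5 samples of [deltaX, deltaY, deltaTime]
class_one = np.random.rand(4, 3)

# One-hot labels: class 0 -> [1, 0], class 1 -> [0, 1].
label_zero = [1, 0]
label_one = [0, 1]

# Pair every sample with its class label, as the original loop does.
training_data = []
for row in class_zero:
    training_data.append([row, label_zero])
for row in class_one:
    training_data.append([row, label_one])
```

With labels in this form, the network's output layer would have one unit per class and the label index can be recovered with argmax.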
How to handle "None of the above" in multiclass image classification?
I found some similar questions (such as How best to deal with "None of the above" in Image Classification?) but nothing very recent.
In my problem I have 10 trained classes of interest, but occasionally I get a garbage image of some kind, and of course the classifier just guesses from the options it has. Sometimes it spreads the softmax probability out, but often it is fairly confident in one of its choices. This is not unreasonable to me since it lives in a world of "given an image of one of 10 things, which is it?", but it is problematic for a system I intend to integrate into deliverable software.
A "none of the above" class would be okay, but finding training examples for such a thing is a little dicey.
What approaches are effective for this problem?
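One common baseline for this situation is to threshold the model's softmax confidence and map low-confidence predictions to a reject option. A minimal NumPy sketch (the threshold value is arbitrary, and the logits are made up):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def predict_with_reject(logits, threshold=0.7):
    """Return the argmax class, or -1 ("none of the above")
    when the top softmax probability is below the threshold."""
    p = softmax(logits)
    top = int(p.argmax())
    return top if p[top] >= threshold else -1

confident = np.array([0.1, 8.0, 0.3, 0.2])  # clear winner: class 1
ambiguous = np.array([1.0, 1.1, 0.9, 1.0])  # spread-out probabilities
```

As the question notes, though, softmax scores can be confidently wrong on out-of-distribution images, so thresholding alone is a weak rejection mechanism.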
Google Colaboratory keeps disconnecting a couple of seconds after connection
I'm going to use Google Colab to train my DeepFace model, and I have run into several problems:
- Sometimes it works for almost 3 hours.
- Mostly it disconnects about 10 seconds after I connect.
The error that appears is "runtime disconnected" in the bottom-right corner.
I read that Google gives you 12 hours and that you have to babysit the notebook, but mine just reconnects after several seconds and deletes all files. Does anybody know what to do about this?
Keras: Change MSE loss function in tensorflow 2.0
Does somebody know how to write custom loss functions in TensorFlow 2.0? I have implemented a Keras model and, looking at the results I have obtained using mean squared error, I want to try a custom loss function.
I have tried to do it this way:
def get_model(n_sensors, rnn_layer_type, n_rnn_units, window_length):
    input_layer = Input((window_length, n_sensors), name='input_sensor')
    input1 = Input((window_length, n_sensors), name='Input1')
    input2 = Input((1,), name='Input2')
    input3 = Input((n_sensors, 4), name='Input3')
    input3_flatten = Flatten()(input3)
    input_layer_list = [input1, input2, input3]
    return_sequences = True
    x = get_recurrent_layer(rnn_layer_type, 100, return_sequences, input1)
    return_sequences = False
    x = BatchNormalization()(x)
    x = Dropout(0.4)(x)
    x = get_recurrent_layer(rnn_layer_type, 50, return_sequences, x)
    x = BatchNormalization()(x)
    x = Dropout(0.4)(x)
    x = Dense(1, activation='relu')(x)
    x1 = Dense(1, activation='relu')(input2)
    out = Lambda(lambda a: -a + a)([x, x1])
    x2 = Dense(1)(input3_flatten)
    merged = [out, x2]
    merged = concatenate(merged)
    out = Dense(1, activation='linear')(merged)
    model = Model(inputs=input_layer_list, outputs=out)
    adam = Adam(lr=10e-4)
    rmsprop = RMSprop(learning_rate=10e-4, rho=0.9)
    model.compile(loss=customLoss, optimizer=adam,
                  metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse')])
    model.summary()
    plot_model(model, 'model1.png')
    return model

@tf.function()
def customLoss(yTrue, yPred):
    return tf.reduce_mean(yPred - yTrue)
but it doesn't work.
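One property of the customLoss above worth illustrating with plain NumPy: reduce_mean(yPred - yTrue) is a signed quantity, so positive and negative errors cancel and the loss is unbounded below, which breaks minimization. Squaring the difference recovers MSE-like behaviour:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([2.0, 1.0, 3.0])  # errors +1 and -1 cancel out

signed_loss = np.mean(y_pred - y_true)      # 0 despite real errors
mse_loss = np.mean((y_pred - y_true) ** 2)  # reflects both errors
```

So even once the function is wired in correctly, tf.reduce_mean(tf.square(yPred - yTrue)) would be the MSE-style form of this loss.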