Where is the model trained in the deep Q-learning algorithm, and which part is experience replay?
I have a problem understanding the training section of deep Q-learning in DeepMind's paper: https://www.nature.com/nature/journal/v518/n7540/pdf/nature14236.pdf
In its algorithm, which part is the training step? How can we train and test this algorithm? Also, which part is the experience replay?
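For orientation, the paper's pseudocode interleaves acting and training inside one loop, which is easy to miss. Below is a hedged, much-simplified sketch of that structure only: a tabular Q stands in for the Q-network, and the 4-state chain environment is invented for illustration. Transitions are stored in the replay memory D as the agent acts; the training step is the minibatch sampled from D at every time step.

```python
import random
from collections import deque

random.seed(0)

# Invented 4-state chain: action 1 moves right, action 0 moves left,
# reward 1 for reaching the rightmost (terminal) state.
N_STATES, N_ACTIONS = 4, 2

def env_step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # stands in for the Q-network
replay = deque(maxlen=1000)                        # the replay memory D
alpha, gamma, eps, batch_size = 0.1, 0.9, 0.2, 8

for episode in range(200):
    s, done = 0, False
    while not done:
        # act with an epsilon-greedy policy and STORE the transition in D
        if random.random() < eps:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r, done = env_step(s, a)
        replay.append((s, a, r, s2, done))
        # TRAINING step: sample a random minibatch from D and move each
        # Q(s, a) toward r + gamma * max_a' Q(s', a').  The paper does this
        # with a gradient step on a network; here it is a tabular update.
        if len(replay) >= batch_size:
            for (si, ai, ri, s2i, di) in random.sample(list(replay), batch_size):
                target = ri if di else ri + gamma * max(Q[s2i])
                Q[si][ai] += alpha * (target - Q[si][ai])
        s = s2
```

Testing the trained agent is then a separate loop that acts greedily with respect to the learned Q and performs no updates.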
See also questions close to this topic
How to review deep-learning code to find hidden bugs?
We now use frameworks like PyTorch and Keras to implement deep learning models. These frameworks provide convenient components that speed up the process, so it is normal to see the loss going down during training.
However, that does not mean the model performs as you wish, because there may be mistakes in your code. So I want to know whether there is a way to review deep-learning code and find hidden bugs.
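One widely used review technique, independent of the framework, is to check that the model can overfit a single fixed batch: a model with enough capacity should drive the loss on that batch close to zero, and if it cannot, the data pipeline, loss, or update step is suspect. A minimal NumPy sketch of the idea, using a toy linear model and synthetic data:

```python
import numpy as np

# One tiny fixed batch of synthetic data with a known linear relationship.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

# Plain gradient descent on mean squared error over that single batch.
w = np.zeros(3)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE loss
    w -= 0.1 * grad

final_loss = float(np.mean((X @ w - y) ** 2))
```

If `final_loss` stays large, something between the data and the update is broken; the same check applies unchanged to a Keras or PyTorch model fed one repeated batch.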
What is the best framework for running Keras models on Spark?
I have come across many frameworks, such as Deeplearning4j, Elephas, and SystemML, that can be used to run Keras models on Spark. What is the best way to run these models?
Can anyone comment?
How to do max pooling with Eigen::Tensor c++
Can we do max pooling along 3 dimensions with Eigen::Tensor? I tried the reduction method without success.
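When debugging a hand-written pooling reduction, it can help to have a plain-NumPy reference for 3-D max pooling to compare the Eigen::Tensor result against. A sketch, assuming a cubic window, stride equal to the window size, and dimensions that divide evenly:

```python
import numpy as np

def max_pool_3d(x, k):
    """Reference 3-D max pooling with cubic window k and stride k.

    Assumes each of the three dimensions of x is divisible by k.
    """
    d, h, w = x.shape
    # Split each axis into (blocks, within-block) and reduce the
    # within-block axes with a single max.
    blocks = x.reshape(d // k, k, h // k, k, w // k, k)
    return blocks.max(axis=(1, 3, 5))

vol = np.arange(64, dtype=float).reshape(4, 4, 4)
pooled = max_pool_3d(vol, 2)   # shape (2, 2, 2)
```

Each output element is the maximum of one 2x2x2 block of the input, which is exactly what an Eigen `maximum()` reduction over the window dimensions should reproduce.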
Why do we need a validation dataset? How do we use a validation dataset to select a model in practice?
I am a bit confused about the actual purpose of the validation dataset. As I understand it, we use the validation dataset to "pre-evaluate" the model to avoid overfitting. To me, it is more that we use the validation dataset to select the model; for example, I know we could use AIC to evaluate models.
So, do we use the validation dataset before model training? In practice we might just directly say we will use a regression model or a neural network to fit the data; does that already mean we are assuming the chosen model is good?
Usually we have the typical k-fold validation. Does that really relate to validating the model? As far as I can see, we use k-fold validation because we have little data and still want to test the model. So does k-fold validation really "validate the model"?
What is the difference between the test set and the validation set? Similar posts exist; according to them, it seems we train a collection of models on the same training dataset and then use the validation data to evaluate those models (or tune the hyperparameters). For example, if we use a neural network, does that mean we use the validation data to tune hyperparameters like the learning rate or the hidden layer sizes?
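The workflow those posts describe can be sketched concretely: fit each candidate model on the training set, pick the candidate with the lowest validation error, and only then report the test error of that single chosen model. A toy NumPy sketch with synthetic data, using the polynomial degree as the "hyperparameter" being selected:

```python
import numpy as np

# Synthetic regression problem: noisy samples of sin(3x).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=300)

# Three disjoint splits: train / validation / test.
x_tr, y_tr = x[:200], y[:200]
x_val, y_val = x[200:250], y[200:250]
x_te, y_te = x[250:], y[250:]

def val_mse(degree):
    coeffs = np.polyfit(x_tr, y_tr, degree)          # fit on TRAIN only
    return np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

best = min(range(1, 10), key=val_mse)                # select on VALIDATION
coeffs = np.polyfit(x_tr, y_tr, best)
test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)  # report on TEST
```

The validation set is touched many times (once per candidate), which is why its error is an optimistic estimate; the test set is touched exactly once, after the choice is frozen.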
How to train on my own data and save the model in the frozen_inference_graph.pb format in TensorFlow?
I used a pre-trained object detection model for my project, but I need to recognize customized objects.
I used "frozen_inference_graph.pb" as the Mask R-CNN model weights. The weights are pre-trained on the COCO dataset.
But I need to train a model on my own annotated data.
I made my annotated data using pycococreator, like this GitHub project.
My question is: how do I train on my data and save the result in a format like frozen_inference_graph.pb, so I can use it instead of the pre-trained model?
Just a clue would be enough for me. Thanks!
What do the progress bars in Keras training show?
I am using Keras, and part of my network and its parameters is as follows:
    parser.add_argument("--batch_size", default=396, type=int, help="batch size")
    parser.add_argument("--n_epochs", default=10, type=int, help="number of epochs")
    parser.add_argument("--epoch_steps", default=10, type=int, help="number of epoch steps")
    parser.add_argument("--val_steps", default=4, type=int, help="number of validation steps")
    parser.add_argument("--n_labels", default=2, type=int, help="number of labels")
    parser.add_argument("--input_shape", default=(224, 224, 3), help="input images shape")
    parser.add_argument("--kernel", default=3, type=int, help="kernel size")
    parser.add_argument("--pool_size", default=(2, 2), help="pooling and unpooling size")
    parser.add_argument("--output_mode", default="softmax", type=str, help="output activation")
    parser.add_argument("--loss", default="categorical_crossentropy", type=str, help="loss function")
    parser.add_argument("--optimizer", default="adadelta", type=str, help="optimizer")
    args = parser.parse_args()
    return args

def main(args):
    # set the necessary lists
    train_list = pd.read_csv(args.train_list, header=None)
    val_list = pd.read_csv(args.val_list, header=None)
    train_gen = data_gen_small(trainimg_dir, trainmsk_dir, train_list, args.batch_size,
                               [args.input_shape, args.input_shape], args.n_labels)
    #print(train_gen, "train_gen is:")
    val_gen = data_gen_small(valimg_dir, valmsk_dir, val_list, args.batch_size,
                             [args.input_shape, args.input_shape], args.n_labels)

    model = segnet(args.input_shape, args.n_labels, args.kernel, args.pool_size, args.output_mode)
    print(model.summary())

    model.compile(loss=args.loss, optimizer=args.optimizer, metrics=["accuracy"])
    model.fit_generator(train_gen, steps_per_epoch=args.epoch_steps, epochs=args.n_epochs,
                        validation_data=val_gen, validation_steps=args.val_steps, verbose=1)
I get 10 results (the number of epochs) as follows, but I do not understand why there are 10 bars within each epoch. Do the accuracy and loss reported on each bar show the accuracy and loss for each batch? Are they for one batch only, or are previous batches also included?
Epoch 10/10
 1/10 [==>...........................] - ETA: 3s - loss: 0.4046 - acc: 0.8266
 2/10 [=====>........................] - ETA: 3s - loss: 0.3336 - acc: 0.8715
 3/10 [========>.....................] - ETA: 2s - loss: 0.3083 - acc: 0.8855
 4/10 [===========>..................] - ETA: 2s - loss: 0.2820 - acc: 0.9010
 5/10 [==============>...............] - ETA: 1s - loss: 0.2680 - acc: 0.9119
 6/10 [=================>............] - ETA: 1s - loss: 0.4112 - acc: 0.8442
 7/10 [====================>.........] - ETA: 1s - loss: 0.4040 - acc: 0.8446
 8/10 [=======================>......] - ETA: 0s - loss: 0.3811 - acc: 0.8597
 9/10 [==========================>...] - ETA: 0s - loss: 0.3623 - acc: 0.8708
10/10 [==============================] - 4s 398ms/step - loss: 0.3495 - acc: 0.8766 - val_loss: 0.5148 - val_acc: 0.7703
PS: I have 659 training samples and 329 validation samples.
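For reference, the value shown at step k of a Keras progress bar is, to a close approximation (assuming equal batch sizes), the running mean of the metric over the batches seen so far in the current epoch, not the value of batch k alone. A sketch of that averaging with invented per-batch losses:

```python
# Per-batch losses for one epoch (invented numbers) and the running
# average the progress bar would display after each batch.
batch_losses = [0.9, 0.5, 0.1]
shown = [sum(batch_losses[:k]) / k for k in range(1, len(batch_losses) + 1)]
# shown[0] is batch 1's own loss; shown[-1] is the mean over the epoch
```

That is why the displayed loss moves smoothly even when individual batches are noisy, and why the final bar's value is the epoch average rather than the last batch.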
What is the code for shooting bullets at dynamic objects in Python?
I want to train an AI using reinforcement learning in Python. The goal is that the AI should be able to shoot moving balls that enter the game environment randomly, at different speeds and from different positions. The AI (player) position is fixed, and it can only specify the angle of the bullet. The bullet speed is also fixed. Actually, I do not know what the states and actions are in this continuous, stochastic environment. Please let me know if there is any tutorial available for this type of game environment. Most game RL tutorials are about optimally moving an agent from point A to point B, which I think is not applicable to my problem.
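One common way to pin down the states and actions is to write the environment as a Gym-style class first. Here is a hypothetical sketch (all names, dynamics, and the reward shaping are invented for illustration) in which the state is the incoming ball's position and velocity and the action is the continuous firing angle:

```python
import math
import random

class BallShooterEnv:
    """Invented sketch of the environment described above.

    State:  (ball_x, ball_y, ball_vx, ball_vy) for the incoming ball.
    Action: one continuous value, the firing angle in radians.
    The shooter sits at the origin; the bullet speed is fixed.
    """
    BULLET_SPEED = 5.0

    def reset(self):
        # A ball spawns at a random x with a random downward velocity.
        self.ball = [random.uniform(-10, 10), 20.0,
                     random.uniform(-1, 1), -random.uniform(1, 3)]
        return tuple(self.ball)

    def step(self, angle):
        # Toy reward shaping: compare the chosen angle with the angle
        # pointing straight at the ball's current position.
        ideal = math.atan2(self.ball[1], self.ball[0])
        reward = -abs(angle - ideal)
        done = True   # one shot per episode, for brevity
        return tuple(self.ball), reward, done
```

With a continuous state (4 numbers) and a continuous action (1 angle), this is a fit for actor-critic style methods rather than plain tabular Q-learning; the A-to-B tutorials differ only in what the state and action vectors contain, not in the overall loop.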
Why is Q-learning off-policy learning?
Hello Stack Overflow Community!
Currently, I am following David Silver's reinforcement learning lectures and am really confused at one point in his "Model-Free Control" slides.
In the slides, Q-learning is considered off-policy learning, and I could not see the reason behind that. He also mentions that we have both a target policy and a behaviour policy. What is the role of the behaviour policy in Q-learning?
When I look at the algorithm, it seems simple: update your Q(s,a) estimate using the maximum Q(s',a') over the next actions. The slides say "we choose the next action using the behaviour policy", but here we only choose the maximum one.
I am so confused about the Q-learning algorithm. Can you help me, please?
Link to the slides (pages 36-38): http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf
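The off-policy part is visible directly in the update: the action actually executed comes from the behaviour policy (epsilon-greedy below, which sometimes explores), while the bootstrap target always uses the greedy max over Q(s', ·), i.e. a different policy, the target policy. A tabular sketch on an invented 3-state chain:

```python
import random

random.seed(1)

# Invented chain: action 1 moves right, action 0 resets to state 0;
# reward 1 for reaching the rightmost (terminal) state.
N_S, N_A = 3, 2
Q = [[0.0] * N_A for _ in range(N_S)]
alpha, gamma, eps = 0.5, 0.9, 0.3

def env_step(s, a):
    s2 = min(s + 1, N_S - 1) if a == 1 else 0
    return s2, (1.0 if s2 == N_S - 1 else 0.0), s2 == N_S - 1

for _ in range(300):
    s, done = 0, False
    while not done:
        # BEHAVIOUR policy: epsilon-greedy, used to pick the executed action.
        if random.random() < eps:
            a = random.randrange(N_A)
        else:
            a = max(range(N_A), key=lambda i: Q[s][i])
        s2, r, done = env_step(s, a)
        # TARGET policy: pure greedy max over Q(s', .), regardless of
        # which action the behaviour policy will actually take next.
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
```

Contrast with SARSA, which would bootstrap from Q(s', a') for the a' the behaviour policy actually takes; because Q-learning's target ignores the executed next action, it learns about the greedy policy while following a different, exploratory one, which is the definition of off-policy.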
What exactly is the difference between Q, V (the value function), and the reward in reinforcement learning?
In the context of Double Q or Dueling Q networks, I am not sure I fully understand the difference, especially with V. What exactly is V(s)? How can a state have an inherent value?
If we consider this in the context of, let's say, trading stocks, how would we define these three variables?
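A tiny deterministic example may help separate the three quantities: the reward r is the one-step payoff of a transition, Q(s, a) is the expected discounted return of taking action a in state s, and V(s) is the value of the state itself; under the optimal policy, V(s) = max over a of Q(s, a). Framed loosely as trading, "holding a stock that has risen" is a valuable state because of the profit reachable from it, even before any sell reward is received. A sketch with invented numbers:

```python
# Invented numbers; gamma is the discount factor.
gamma = 0.5

# Two situations for a single stock:
#   state 1: "price has risen"  -> selling pays reward 4, episode ends
#   state 0: "just bought"      -> selling now pays reward 1 and ends,
#                                  or waiting pays reward 0 and leads to state 1
Q = {(1, "sell"): 4.0,   # terminal action: Q is just the immediate reward
     (0, "sell"): 1.0}

V1 = Q[(1, "sell")]                        # V(s') = best Q over actions in s'
Q[(0, "wait")] = 0.0 + gamma * V1          # Q = r + gamma * V(next state)
V0 = max(Q[(0, "sell")], Q[(0, "wait")])   # V(s) = max_a Q(s, a)
```

Note how state 0 has a value of 2 even though waiting earns a reward of 0: the state's "inherent value" is entirely the discounted future reward reachable from it, which is exactly the quantity the V-stream of a Dueling network estimates.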