Where is the training step in deep Q-learning, and which part of the algorithm is experience replay?
I have a problem understanding the training section of deep Q-learning in the DeepMind paper: https://www.nature.com/nature/journal/v518/n7540/pdf/nature14236.pdf
In its algorithm, where does the training happen? How can we train this algorithm, and how can we test it? Also, which section is the experience replay section?
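For orientation, here is a hedged, tabular sketch of how the pieces fit together: storing every transition in a memory D and sampling random minibatches from it is the experience replay; the update toward the target r + γ·max Q(s', a') on those sampled minibatches is the training section. The paper does the update as a gradient step on network weights; a plain dict of Q-values stands in for the network here, so this is a structural sketch, not the Nature implementation.

```python
import random
from collections import deque

GAMMA = 0.99
replay_memory = deque(maxlen=1000)   # the experience-replay memory D
Q = {}                               # a dict of Q-values stands in for the network

def q(s, a):
    return Q.get((s, a), 0.0)

def store_transition(s, a, r, s_next, done):
    # experience replay, part 1: every step is stored, not learned from directly
    replay_memory.append((s, a, r, s_next, done))

def train_step(actions=(0, 1), batch_size=4, lr=0.1):
    # the training section: sample a random minibatch from D and move
    # Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')
    batch = random.sample(replay_memory, min(batch_size, len(replay_memory)))
    for s, a, r, s_next, done in batch:
        target = r if done else r + GAMMA * max(q(s_next, b) for b in actions)
        Q[(s, a)] = q(s, a) + lr * (target - q(s, a))
```

Testing, by contrast, means running the learned greedy policy (argmax over `q(s, a)`, usually with a small exploration epsilon) while making no calls to `store_transition` or `train_step`, which is roughly how the paper reports its game scores.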
See also questions close to this topic

Is there an easy way to time PyTorch code (Python) while it runs on the GPU (not the CPU)?
Are there any easy ways to time PyTorch code (Python) while it runs on the GPU (not the CPU)? I can simply use Python's built-in `time` module to time code on the CPU. However, is there a similarly easy way to time code on the GPU? Thanks.
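One common pattern, sketched under the assumption that PyTorch is installed: CUDA kernels launch asynchronously, so the wall clock must not be read until the GPU has actually finished, which means synchronizing on both sides of the timed region (`torch.cuda.Event(enable_timing=True)` is an alternative that times on the device itself). The try/except below is only so the sketch degrades to ordinary CPU timing when PyTorch or a GPU is absent.

```python
import time

try:
    import torch
    # flush pending CUDA work before reading the clock; no-op without a GPU
    _sync = torch.cuda.synchronize if torch.cuda.is_available() else (lambda: None)
except ImportError:
    _sync = lambda: None  # fallback: behaves like plain CPU timing

def time_on_gpu(fn):
    _sync()                        # make sure earlier kernels are done
    start = time.perf_counter()
    result = fn()
    _sync()                        # wait for fn's own kernels to finish
    return result, time.perf_counter() - start
```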
How to handle None in tf.clip_by_global_norm?
I have read in answers to this question here that `tf.clip_by_global_norm()` handles `None` values by simply ignoring them (comment by @danijar in the comments on his answer), but when I try to apply it I seem to be doing something wrong, as it throws:

    ValueError: None values not supported.

    tf.reset_default_graph()
    z = tf.get_variable(name='z', shape=[1])
    b = tf.get_variable('b', [1])
    c = b*b - 2*b + 1
    optimizer = tf.train.AdamOptimizer(0.1)
    gradients, variables = zip(*optimizer.compute_gradients(c))
    gradients = tf.clip_by_global_norm(gradients, 2.5)
    train_op = optimizer.apply_gradients(zip(gradients, variables))

Can somebody please tell me what I am doing wrong, or whether `tf.clip_by_global_norm()` does not handle `None` gradients and I have to take care of them manually?
The official documentation seems to agree with @danijar's comments (see here):
Any of the entries of t_list that are of type None are ignored.
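Two observations, offered as a reading of the snippet rather than a tested diagnosis: `tf.clip_by_global_norm` returns a pair `(list_clipped, global_norm)`, so assigning the whole pair to `gradients` and feeding it to `apply_gradients` can itself break; and `z` never contributes to `c`, so its gradient is `None`. A numpy sketch of the documented semantics (`None` entries ignored, everything else scaled by `clip_norm / max(global_norm, clip_norm)`):

```python
import numpy as np

def clip_by_global_norm(t_list, clip_norm):
    # Sketch of tf.clip_by_global_norm semantics: None entries contribute
    # nothing to the global norm and pass through unchanged.
    global_norm = np.sqrt(sum(np.sum(np.square(t))
                              for t in t_list if t is not None))
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [None if t is None else t * scale for t in t_list]
    return clipped, global_norm
```

Note that, like the real function, this returns two values; unpacking only the clipped list (`gradients, _ = ...`) mirrors the usual TensorFlow idiom.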

How can I build an RNN without using nn.RNN?
I need to build an RNN (without using nn.RNN) with the following specifications:

- It is a character RNN.
- It should have 1 hidden layer.
- It should have the following set of weights:
  - Wxh (from the input layer to the hidden layer)
  - Whh (from the recurrent connection in the hidden layer)
  - Who (from the hidden layer to the output layer)
- I need to use tanh for the hidden layer.
- I need to use softmax for the output layer.
I have implemented the code. I am using `CrossEntropyLoss()` as the loss function, which is giving me this error:

    RuntimeError                              Traceback (most recent call last)
    <ipython-input-33-94b42540bc4f> in <module>()
         25     print("target ", target_tensor[timestep])
         26
    ---> 27     loss += criterion(output, target_tensor[timestep].view(1, n_vocab))
         28
         29     loss.backward()

    /opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        323         for hook in self._forward_pre_hooks.values():
        324             hook(self, input)
    --> 325         result = self.forward(*input, **kwargs)
        326         for hook in self._forward_hooks.values():
        327             hook_result = hook(self, input, result)

    /opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
        145         _assert_no_grad(target)
        146         return F.nll_loss(input, target, self.weight, self.size_average,
    --> 147                           self.ignore_index, self.reduce)
        148
        149

    /opt/anaconda/lib/python3.6/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce)
       1047         weight = Variable(weight)
       1048     if dim == 2:
    -> 1049         return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
       1050     elif dim == 4:
       1051         return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)

    RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/THNN/generic/ClassNLLCriterion.c:22
Here is my code for the model:

    class CharRNN(torch.nn.Module):
        def __init__(self, input_size, hidden_size, output_size, n_layers=1):
            super(CharRNN, self).__init__()
            self.input_size = input_size
            self.hidden_size = hidden_size
            self.n_layers = 1
            self.x2h_i = torch.nn.Linear(input_size + hidden_size, hidden_size)
            self.x2h_f = torch.nn.Linear(input_size + hidden_size, hidden_size)
            self.x2h_o = torch.nn.Linear(input_size + hidden_size, hidden_size)
            self.x2h_q = torch.nn.Linear(input_size + hidden_size, hidden_size)
            self.h2o = torch.nn.Linear(hidden_size, output_size)
            self.sigmoid = torch.nn.Sigmoid()
            self.softmax = torch.nn.Softmax()
            self.tanh = torch.nn.Tanh()

        def forward(self, input, h_t, c_t):
            combined_input = torch.cat((input, h_t), 1)
            i_t = self.sigmoid(self.x2h_i(combined_input))
            f_t = self.sigmoid(self.x2h_f(combined_input))
            o_t = self.sigmoid(self.x2h_o(combined_input))
            q_t = self.tanh(self.x2h_q(combined_input))
            c_t_next = f_t * c_t + i_t * q_t
            h_t_next = o_t * self.tanh(c_t_next)
            output = self.softmax(h_t_next)
            return output, h_t, c_t

        def initHidden(self):
            return torch.autograd.Variable(torch.zeros(1, self.hidden_size))

        def weights_init(self, model):
            classname = model.__class__.__name__
            if classname.find('Linear') != -1:
                model.weight.data.normal_(0.0, 0.02)
                model.bias.data.fill_(0)
and this is the code for training the model:

    input_tensor = torch.autograd.Variable(torch.zeros(seq_length, n_vocab))
    target_tensor = torch.autograd.Variable(torch.zeros(seq_length, n_vocab))

    model = CharRNN(input_size=n_vocab, hidden_size=hidden_size, output_size=output_size)
    model.apply(model.weights_init)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    for i in range(n_epochs):
        print("Iteration", i)
        start_idx = np.random.randint(0, n_chars - seq_length - 1)
        train_data = raw_text[start_idx:start_idx + seq_length + 1]
        input_tensor = torch.autograd.Variable(seq2tensor(train_data[:-1], n_vocab), requires_grad=True)
        target_tensor = torch.autograd.Variable(seq2tensor(train_data[1:], n_vocab), requires_grad=False).long()
        loss = 0
        h_t = torch.autograd.Variable(torch.zeros(1, hidden_size))
        c_t = torch.autograd.Variable(torch.zeros(1, hidden_size))
        for timestep in range(seq_length):
            output, h_t, c_t = model(input_tensor[timestep].view(1, n_vocab), h_t, c_t)
            loss += criterion(output, target_tensor[timestep].view(1, n_vocab))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        x_t = input_tensor[0].view(1, n_vocab)
        h_t = torch.autograd.Variable(torch.zeros(1, hidden_size))
        c_t = torch.autograd.Variable(torch.zeros(1, hidden_size))
        gen_seq = []
        for timestep in range(100):
            output, h_t, c_t = model(x_t, h_t, c_t)
            ix = np.random.choice(range(n_vocab), p=output.data.numpy().ravel())
            x_t = torch.autograd.Variable(torch.zeros(1, n_vocab))
            x_t[0, ix] = 1
            gen_seq.append(idx2char[ix])
        txt = ''.join(gen_seq)
        print('')
        print(txt)
        print('')
Can you please help me?
Thanks in advance.
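Two observations on the posted code, offered tentatively: `nn.CrossEntropyLoss` expects its target as class *indices* (a long tensor of shape `(N,)`), not one-hot rows, which is what "multi-target not supported" is complaining about; and it applies log-softmax itself, so the model should emit raw scores rather than softmax outputs. As for the spec itself, here is a minimal numpy sketch of a cell with exactly the three named weight sets, tanh hidden layer, and softmax output (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_vocab, n_hidden = 5, 8

# the three weight sets from the spec, small random init
Wxh = rng.normal(0, 0.02, (n_hidden, n_vocab))   # input  -> hidden
Whh = rng.normal(0, 0.02, (n_hidden, n_hidden))  # hidden -> hidden (recurrent)
Who = rng.normal(0, 0.02, (n_vocab, n_hidden))   # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x, h):
    # tanh hidden layer, softmax output layer
    h_next = np.tanh(Wxh @ x + Whh @ h)
    return softmax(Who @ h_next), h_next
```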

TensorFlow: understanding "ssd_mobilenet_v1_pets.config"
The following config file is used by the SSD model trainer to train on custom objects. I would like to get a detailed understanding of each parameter being set, because during training my PC lags and the training gets killed; it is a Core i7 6th gen with 8 GB of RAM.
So I would like to get a better understanding, allowing me to tweak the parameters and complete training.
This is my first Stack Overflow post, so please mention any shortcomings in my question.
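Without the file reproduced in the question there is little to annotate line by line, but as a hedged pointer: in the TF Object Detection API's config format, the memory-hungry knobs sit in the `train_config` block, and lowering `batch_size` is the usual first tweak on an 8 GB machine (the values below are illustrative, not the pets defaults):

```
train_config: {
  batch_size: 8          # smaller batches need less memory; halve until training survives
  num_steps: 200000      # total training steps; fewer steps means shorter runs
  optimizer {
    # the learning-rate schedule lives in here
  }
}
```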

How to insert two or more label lists into tf.estimator.inputs.numpy_input_fn?
I am using `tf.estimator.inputs.numpy_input_fn` to feed data into my model and train it, in a similar way to the MNIST example. The only difference is that I need to insert two numpy lists of labels instead of one. I tried passing them in a dictionary like this:

    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": training_images},
        y={"labels1": training_labels1, "labels2": training_labels2},
        batch_size=BATCH_SIZE,
        num_epochs=None,
        shuffle=True)
    my_cnn_model.train(input_fn=train_input_fn, steps=NUM_TRAINING_STEPS)
Then when I try to retrieve them in the model like so:

    def build_cnn_model(features, labels, mode):

I get the following error:

    AttributeError: 'dict' object has no attribute 'shape'

I also tried changing the name of the variable "labels" to "targets", according to the TensorFlow inputs.numpy_input_fn documentation:

    def build_cnn_model(features, targets, mode):

and I get this error:

    ValueError: model_fn (<function build_cnn_model at 0x7f88df9c9d08>) has following not expected args: ['targets']
If you have any solution or suggestion to my problem, please let me know.
Thanks a lot in advance.
Antonios
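For what it's worth, a hedged sketch of what seems to be going on: when `y` is passed as a dict, the `labels` argument of `model_fn` arrives as that same dict, so it has no `.shape`; the fix is to index each head by its key rather than treating `labels` as one array. Plain numpy stands in for the tensors here, and the helper name is made up for illustration:

```python
import numpy as np

# what model_fn would receive as `labels` when y is passed as a dict
labels = {"labels1": np.array([0, 1, 1]), "labels2": np.array([1, 0, 1])}

def split_label_heads(labels):
    # labels.shape raises AttributeError on a dict; index per head instead
    return labels["labels1"], labels["labels2"]
```

Inside `build_cnn_model(features, labels, mode)` one would then build a loss per head from `labels["labels1"]` and `labels["labels2"]` and combine them.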

Quality of training data as a feature
I am building a website classifier that uses text data. There is a probability that some pages of a website do not belong to the website's main category (for example, a personal author's blog on some IT website).
So I add an additional metric which says, for example, that the data of `url1` is more likely to belong to the given website class than the data of `url2` if it has a value of 0.95 compared to 0.8 for `url2`, because I physically can't mark each webpage with a "true" class tag.
How can I use this metric for training? And how should I combine that data with the text data in a Pipeline? I appreciate any help.
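One common way to use such a score, sketched under the assumption that it really is a per-page probability of the label being correct: treat it as a sample weight, so low-confidence pages pull less on the model. scikit-learn estimators accept this directly through the `sample_weight` argument of `fit`; the numpy sketch below shows the same idea as a weighted log-loss (all numbers are illustrative):

```python
import numpy as np

# hypothetical confidences that each page's label is correct (url1, url2)
conf = np.array([0.95, 0.80])
y_true = np.array([1, 1])            # assigned (possibly noisy) labels
p_pred = np.array([0.9, 0.6])        # model's predicted class probabilities

# per-sample log-loss, then a confidence-weighted average:
# the 0.80-confidence page contributes less than the 0.95 one
losses = -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
weighted_loss = np.sum(conf * losses) / np.sum(conf)
```

In a Pipeline, the text features stay as they are; the confidence is routed to the final estimator as `fit` metadata rather than concatenated to the feature matrix.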
Q-Learning w/ Galaga: Defining States
I am working on an implementation of Q-learning to build an AI to play Galaga. I understand that Q-learning requires states and actions, and tables to determine movement between states.
All the examples and tutorials for Q-learning online seem to be for grid-based games with easily defined states. But Galaga involves moving left, moving right, and shooting upwards, with enemies moving randomly throughout gameplay. So I'm having trouble defining what my states in the Q-learning algorithm should be. I've considered having every potential position of the ship be a state, or perhaps having states depend on the number of enemies remaining alive. I've even considered having a state for every frame of gameplay, but that seems obviously too costly.
I'd appreciate it if anyone with a better understanding of Q-learning could help me define what my states should be. I also understand the necessity of rewards, but I'm not entirely sure what the reward would be on a frame-by-frame basis, since the game score only increases when enemies are killed. Perhaps some function of the game score and the frame count.
Thanks for any help!

Multi-agent Q-learning with Experience Replay
While reading a paper on arXiv, I learned that experience replay is not very viable with independent Q-learning (IQL). Since experience replay draws on past experiences, during which the other agents in IQL would have been following different policies, these experiences are obsolete. So far, so good.
However, the paper seems to imply that experience replay works well in multiagent scenarios where all agents use one model (parameter sharing). I cannot really understand why; perhaps someone here could explain it to me?
To my knowledge, the experience replay buffer just saves `(s, s', a, r)` tuples and draws upon these when training the model. However, would these experiences not also be obsolete? After all, from these experiences the tuple is still not able to tell what kind of policy the other agents used.
Any thoughts on the subject?
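I can only offer the standard intuition, hedged: under parameter sharing there is effectively one policy, so a stored transition was generated by an older version of the same model being trained, which is the ordinary off-policy staleness Q-learning already tolerates; under IQL, by contrast, the *other* agents' independently drifting policies are baked into the stored transitions as if they were environment dynamics. A toy sketch of the shared-buffer, shared-parameters setup (a dict stands in for the one shared network):

```python
import random
from collections import deque

# one replay buffer for all agents, training a single parameter-shared Q-function
buffer = deque(maxlen=1000)
Q = {}

def store(agent_id, s, a, r, s_next):
    # every agent writes into the same buffer; folding agent_id into the
    # state lets the shared model condition on who is acting
    buffer.append(((agent_id, s), a, r, (agent_id, s_next)))

def train(batch_size=4, gamma=0.9, lr=0.1, actions=(0, 1)):
    # one gradient-step stand-in over a minibatch drawn from all agents' data
    for s, a, r, s_next in random.sample(buffer, min(batch_size, len(buffer))):
        target = r + gamma * max(Q.get((s_next, b), 0.0) for b in actions)
        q_sa = Q.get((s, a), 0.0)
        Q[(s, a)] = q_sa + lr * (target - q_sa)
```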