Where are the model-training and experience-replay steps in the deep Q-learning algorithm?
I have a problem understanding the training section of deep Q-learning in DeepMind's paper: https://www.nature.com/nature/journal/v518/n7540/pdf/nature14236.pdf
In its algorithm, which part is the training step? How can we train and test this algorithm? Also, which part is the experience replay?
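In the paper's Algorithm 1, the "store transition in D" line is the experience-replay part, and the "sample random minibatch of transitions ... perform a gradient descent step" lines are the training part. A minimal Python sketch of those two pieces, with the network update abstracted into a comment (the names below are illustrative, not from the paper's code):

```python
import random
from collections import deque

# Experience replay: the paper's "store transition in D" step.
# Old transitions are kept and sampled later, which breaks the
# correlation between consecutive training samples.
replay_memory = deque(maxlen=10_000)

def store_transition(state, action, reward, next_state, done):
    replay_memory.append((state, action, reward, next_state, done))

def sample_minibatch(batch_size=32):
    # The training step: draw a random minibatch from the buffer.
    # A gradient step would then move Q(s, a) toward
    # r + gamma * max_a' Q_target(s', a') for each sampled transition.
    return random.sample(replay_memory, min(batch_size, len(replay_memory)))
```

Training happens on such sampled minibatches while the agent interacts with the environment; evaluation ("testing") runs the learned greedy policy with a small epsilon on fresh episodes and reports the score, with no buffer writes or gradient steps.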
See also questions close to this topic

Keras custom loss function for binary-encoded (not one-hot encoded) categorical data
I need help writing a custom loss/metric function for Keras. My categories are binary encoded (not one-hot). I would like to do a bitwise comparison between the real classes and the predicted classes.
For example: real label 0b1111111111, predicted label 0b1011101111.
The predicted label has 8 of its 10 bits correct, so the accuracy of this match should be 0.8, not 0.0. I have no idea how I am supposed to do this with Keras commands.
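The bit-level comparison itself is simple; here is a sketch of the logic in plain Python for integer labels (function name and n_bits default are illustrative):

```python
def bitwise_accuracy(true_label, pred_label, n_bits=10):
    # Fraction of bits on which the two labels agree:
    # XOR marks the differing bits, popcount counts them.
    differing = bin(true_label ^ pred_label).count("1")
    return (n_bits - differing) / n_bits

print(bitwise_accuracy(0b1111111111, 0b1011101111))  # 0.8
```

In a Keras metric the same idea, applied to labels stored as bit vectors, is K.mean(K.equal(y_true, K.round(y_pred))): round the per-bit sigmoid outputs to 0/1 and average the per-bit matches. That per-bit average is what the built-in binary_accuracy metric computes.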

Tensorflow object detection: export inference graph
I have an error in exporting an inference graph and I searched for hours but can't find a solution; this is my error:
Command: python export_inference_graph.py --input_type image_tensor --pipeline_config_path training faster_rcnn_inception_v2_pets.config --trained_checkpoint_prefix training "model.ckpt-2950" --output_directory export
Error Traceback:
Traceback (most recent call last):
  File "export_inference_graph.py", line 151, in <module>
    tf.app.run()
  File "C:\Users\OctaNet\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "export_inference_graph.py", line 135, in main
    text_format.Merge(f.read(), pipeline_config)
  File "C:\Users\OctaNet\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 125, in read
    self._preread_check()
  File "C:\Users\OctaNet\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 85, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512, status)
  File "C:\Users\OctaNet\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: NewRandomAccessFile failed to Create/Open: training : Accès refusé (access denied). ; Input/output error

Do neural networks get slower to adapt after a lot of training?
I am a beginner in the neural network field and I want to understand a certain statement. A friend said that a neural network gets slower to adapt after you fit a lot of data into it.
Right now, I have just finished the Coursera ML course by Andrew Ng, where I implemented backpropagation. I thought it just adapts the model toward the expected output using various calculations. As far as I could tell, no history was used to adapt the model: only the current state of the neurons was considered, and their weights were adjusted backwards, in combination with regularisation.
Is my assumption correct, or am I wrong? Are there libraries that use historical data in a way that could make the model adapt more slowly after a certain amount of training?
I want to use a simple neural network for reinforcement learning, and I want to know whether I need to reset my model if the target environment changes for some reason. Otherwise my model would become slower and slower to adapt over time.
Thanks for any links and explanations in advance!

How are train and test accuracies calculated when the model is fixed?
I have a dataset which is not split into train and test parts, and I want to measure the train and test accuracies. I know that cross-validation gives more accurate estimates, but I think there is no need to split my dataset into train, validation and test sets, since the model I will use is already fixed. I want to ask some questions:
- Should I split my data into 3 parts (train, validation, test), and should I use k-fold?
- Is it logical to use cross-validation to get the test accuracy, since I don't use it for tuning hyperparameters?
Thanks for now.
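When no hyperparameters are tuned, plain k-fold cross-validation is a standard way to estimate test accuracy: every sample is scored exactly once by a model that never trained on it, and the fold scores are averaged. A minimal sketch (the fit/predict callables are placeholders for whatever fixed model is used; this is an illustration, not a prescription):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    # Shuffle sample indices and split them into k roughly equal folds
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def kfold_accuracy(fit, predict, X, y, k=5):
    # Train on k-1 folds, score on the held-out fold, average the k scores
    folds = kfold_indices(len(X), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(np.mean(predict(model, X[test_idx]) == y[test_idx]))
    return float(np.mean(scores))
```

The train accuracy, by contrast, is simply the score of the model refit on all the data and evaluated on that same data; the gap between the two numbers indicates overfitting.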

How to get identical results with lightgbm.cv and manual cross-validation with lightgbm.train?
I am trying to get identical results with lightgbm.cv and manual cross-validation using sklearn's KFold together with lightgbm.train. The results I have obtained are almost similar but not the same. (Should one expect the same scores from these two approaches?)
Note: the reason I am interested in manual cross-validation is that I would like to include undersampling in my calculation and perform hyperparameter tuning with this cross-validation approach. This is doable with a pipeline if we use an sklearn estimator, as described in this post. However, I couldn't get lightgbm with early stopping to work inside a Pipeline. I thought manual cross-validation could be the solution to this problem if we want to use lightgbm for training the data. I have tried to check it with the following code:
# Import necessary libraries and modules
import numpy as np
from statistics import mean, stdev
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
import lightgbm as lgb

# Initial parameters
RANDOM_STATE = 42
N_SAMPLES = 1000000
K_FOLD = 5

# Generate the dataset
X, y = make_classification(n_classes=2, class_sep=1, n_features=10,
                           n_redundant=2, weights=[0.999, 0.001],
                           n_informative=5, flip_y=0.0,
                           n_samples=N_SAMPLES, random_state=RANDOM_STATE)
print('Number of samples in each class %s' % Counter(y))

# Set lgb params
params = {'application': 'binary', 'num_iterations': 500,
          'early_stopping_round': 5, 'num_boost_round': 100,
          'metric': 'auc'}

# Cross validation with lightgbm.cv
train_data = lgb.Dataset(data=X, label=y, free_raw_data=False)
cv_result = lgb.cv(params, train_data, nfold=K_FOLD, seed=RANDOM_STATE,
                   stratified=False, shuffle=False, verbose_eval=1,
                   metrics=['auc'])

# Manual cross validation
train_data = lgb.Dataset(data=X, label=y, free_raw_data=False)
auc_roc = []
kf = KFold(n_splits=K_FOLD, shuffle=False, random_state=RANDOM_STATE)
for train_index, val_index in kf.split(X):
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]
    # prepare data
    train_data = lgb.Dataset(data=X_train, label=y_train)
    valid_data = lgb.Dataset(data=X_val, label=y_val)
    lgbm = lgb.train(params, train_data,
                     valid_sets=[train_data, valid_data], verbose_eval=1)
    y_pred = lgbm.predict(X_val, num_iteration=lgbm.best_iteration)
    auc_roc.append(roc_auc_score(y_val, y_pred))

# Print the results
max_mean_auc = max(cv_result['auc-mean'])
indx_max_mean_auc = cv_result['auc-mean'].index(max_mean_auc)
print('lgb.cv Cross Validation, ROC_AUC = {} +/- {}'.format(
    np.round(max_mean_auc, 4),
    np.round(cv_result['auc-stdv'][indx_max_mean_auc], 4)))
print('Using Manual Cross Validation, ROC_AUC = {} +/- {}'.format(
    np.round(mean(auc_roc), 4), np.round(stdev(auc_roc), 4)))
I obtained auc = 0.8203 +/- 0.0246 with lightgbm.cv and auc = 0.8113 +/- 0.0129 using KFold and lightgbm.train, which are close considering the stdv; however, I expected to get the same score.

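One plausible source of the gap (an illustration with made-up numbers, not the run above): lgb.cv early-stops on the metric averaged across folds and reports the mean at that single best iteration, while the manual loop lets each fold early-stop at its own best iteration. A toy numpy sketch of why the two summaries need not coincide:

```python
import numpy as np

# Hypothetical per-iteration validation AUCs: 3 folds x 5 iterations
fold_auc = np.array([
    [0.70, 0.80, 0.85, 0.84, 0.83],
    [0.68, 0.78, 0.80, 0.82, 0.81],
    [0.71, 0.79, 0.83, 0.82, 0.80],
])

# lgb.cv-style summary: pick the iteration with the best *mean* AUC
cv_style = fold_auc.mean(axis=0).max()

# manual-loop-style summary: average each fold's own best AUC
manual_style = fold_auc.max(axis=1).mean()

print(round(cv_style, 4), round(manual_style, 4))  # 0.8267 0.8333
```

To shrink the remaining difference, the fold assignment can at least be made identical: lgb.cv accepts a folds argument, so the exact KFold split used in the manual loop can be passed to both.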
Training YOLO to detect objects with keypoints or landmarks
I have worked with YOLO to detect the iris region of the eye in real time. I have tried training on my own iris dataset using YOLOv3 darknet (AlexeyAB's library) to detect the iris in real time (with a smartphone webcam, for example), e.g.:
https://i.imgur.com/tZemyqp.png
I want to combine the YOLO algorithm with landmark (or keypoint) detection, to draw points tracing the shape of the iris after YOLO has been trained and has detected the iris region. Is it possible to do that?
Example of the desired result:

Strange update function behaviour in Q-learning
I am encountering an issue when updating Q-values: they tend towards infinity. The code below shows the update function, which is meant to follow the Bellman equation:
q[board][action] = q[board][action] + lr * immediate_reward + (discount * best_q_value_new_board - immediate_reward)
For some reason, the values grow infinitely large, and I can't think of why that is.
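For comparison, the standard tabular Q-learning update applies the learning rate to the whole temporal-difference error, which keeps the values bounded when rewards are bounded. A sketch, with variable names chosen to mirror the snippet above:

```python
def q_update(q_sa, lr, immediate_reward, discount, best_q_value_new_board):
    # TD target: reward plus discounted best value of the successor state
    td_target = immediate_reward + discount * best_q_value_new_board
    # Move the current estimate a fraction lr toward the target
    return q_sa + lr * (td_target - q_sa)

print(q_update(0.0, 0.5, 1.0, 0.5, 2.0))  # 1.0
```

Because the new value is a weighted average of the old estimate and the target, repeated updates contract toward the target rather than accumulating without bound.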
Any input is appreciated!
Thanks

SARSA with Linear Value Func. Approx. not converging to correct Q-factors
I've been trying to implement SARSA with LVFA (linear value function approximation). So far, I've implemented the following code, but it doesn't seem to work: it doesn't converge to the correct Q-factors for even the simplest of problems.
Any help with why my code doesn't work would be greatly appreciated! Thanks!
I've implemented the following TD update rule (from what I understand, it is derived from stochastic gradient descent on the TD Bellman error): https://i.ibb.co/Jyt5DFd/Capture.png
I've simply replaced the TD update rule in Q-learning with the above, which should work, since SARSA with LVFA is theoretically guaranteed to converge to a fixed point. So I'm assuming something is wrong with my implementation, but I haven't been able to spot any bugs yet.
def SARSA_LVFA(theta, phi, r, s, gamma=0.5, a=0.005):
    """ SARSA algorithm with LVFA """
    limit = 10**5
    # choose action u from eps-greedy policy
    if np.random.random() < 0.9:
        u = np.argmax(np.matmul(phi(s).T, theta))
    else:
        u = np.random.choice(U)
    for i in range(limit):
        phi_s = phi(s)  # get features for current state s
        s_ = np.random.choice(S, p=f(s, u))  # perform action u (noisy model)
        phi_s_ = phi(s_)  # get features for new state s_
        # choose action u_ from eps-greedy policy
        if np.random.random() < 0.9:
            u_ = np.argmax(np.matmul(phi_s_.T, theta))
        else:
            u_ = np.random.choice(U)
        # calculate temporal difference delta
        td_target = r(s, u) + gamma * np.matmul(theta[:, u_].T, phi_s_)
        delta = td_target - np.matmul(theta[:, u].T, phi_s)
        # update feature weights
        theta[:, u] = theta[:, u] + a * delta * phi_s.T
        s = s_
        u = u_
    return theta
Some notes on the code: U is the action space and S is the state space. theta is a weight matrix of shape len(phi) x len(U), where phi is the feature (column) vector for a state s. You get your Q-matrix by simply doing np.matmul(Phi.T, theta), where Phi is just a collection of all your feature vectors [phi(s1) phi(s2) ... phi(sN)]. Leave any other questions you might have in the comments!
Let's try the above function on a toy line-following problem, with state space S = [0, 1, 2] (left of the line, on the line and right of the line, respectively) and action space U = [0, 1, 2] (right, idle and left, respectively). Take the following reward function r, system model f and feature function phi:

def r(x, u):
    """ reward function """
    if x == S[1]:
        return 1.0
    else:
        return 0.0

def f(x, u):
    '''
    A list with probabilities for each successor is returned. All states
    are valid successors, but they can receive zero probability.
    '''
    if x == S[1]:  # on line
        if u == U[2]:  # left
            result = [0.2, 0.7, 0.1]
        elif u == U[0]:  # right
            result = [0.1, 0.7, 0.2]
        elif u == U[1]:  # none
            result = [0.0, 1.0, 0.0]
    elif x == S[0]:  # left of line
        if u == U[2]:
            result = [1.0, 0.0, 0.0]
        elif u == U[0]:
            result = [0.0, 1.0, 0.0]
        elif u == U[1]:
            result = [1.0, 0.0, 0.0]
    elif x == S[2]:  # right of line
        if u == U[2]:
            result = [0.0, 1.0, 0.0]
        elif u == U[0]:
            result = [0.0, 0.0, 1.0]
        elif u == U[1]:
            result = [0.0, 0.0, 1.0]
    return result

def phi1(s):
    if s == S[1]:
        return 1.0
    else:
        return 0.0

def phi2(s):
    if s != S[1]:
        return 1.0
    else:
        return 0.0

def phi(x):
    """ get features for state x """
    features = np.asarray([[phi1(x), phi2(x)]]).T
    return features
Doing

theta_optimal = SARSA_LVFA(theta, phi, r, some_start_state)

gives an incorrect Q-matrix, something like:

[[0.27982704 0.13408623 0.28761029]
 [1.71499981 1.98207434 1.72503455]
 [0.27982704 0.13408623 0.28761029]]

and a corresponding incorrect policy [2 1 2], or sometimes [0 1 0]. I've tried the same toy problem with simple SARSA and Q-learning (without LVFA) and get the correct policy [0 1 2] and Q-matrix:

[[0.98987301 0.46667215 0.4698729 ]
 [1.80929669 1.98819096 1.8406385 ]
 [0.47045638 0.47047932 0.99035824]]

Algorithm for subdivision of 3D surfaces
Background
I have a 3D scene, and I want to discretize its space so that every coordinate (x, y, z) belongs to a specific cell.
Coordinates close to each other belong to the same cell. When I input a coordinate that lies on the surface of one of my three-dimensional objects (mainly spheres), I need to retrieve the cell it belongs to.
For those familiar with reinforcement learning, this operation will be used for Q-learning, to map states (cells depending on coordinates) to Q-values.
This is an example of what I am trying to achieve.

Possible solutions
I know that Voronoi diagrams can help with this, but I have also read that implementing them from scratch is complicated. I found some C++ libraries that handle this, but they are mainly 2D Voronoi (CGAL). I don't specifically need Voronoi; I only need to discretize the space in a reasonable way, and it was while looking for libraries/implementations of that that I stumbled upon Voronoi diagrams.

Question

Is there anyone familiar with libraries or a public implementation to achieve this discretization in C++?
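If a uniform grid is an acceptable discretization (an assumption; the question only asks for a "reasonable" one), no library is needed: quantizing each coordinate yields the cell directly. A sketch in Python; a C++ version would use std::floor for the indices and, for the Q-table, an std::unordered_map keyed on the cell triple:

```python
import math

def cell_of(x, y, z, cell_size=1.0):
    # Quantize each coordinate: all points inside the same
    # cell_size-sided cube map to the same integer triple.
    return (math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size))

print(cell_of(0.2, 1.7, -0.3))  # (0, 1, -1)
```

Nearby coordinates then share a cell key, which is exactly what is needed to index a Q-table by state; a Voronoi partition would only be required if the cells had to adapt to the surface geometry rather than tile space uniformly.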