How to solve "ValueError: Shapes must be equal rank" when using a custom env with baselines DQN?
I use a Gym environment produced by others, which can be found at gym-gomoku. When I use baselines to try to train a model, an error occurs:
ValueError: Shapes must be equal rank, but are 1 and 2 for 'deepq/Select'
(op: 'Select') with input shapes: [?], [?], [?,361].
I think there is something wrong with the environment, but I can't figure it out, because everything works when I test other Gym environments like 'CartPole-v0'.
Thanks a lot!
here is my code:
import gym
from baselines import deepq

def callback(lcl, _glb):
    # stop training if reward exceeds 199
    is_solved = lcl['t'] > 0.9 and sum(lcl['episode_rewards'][-101:-1]) / 100 >= 0.9
    return is_solved

def main():
    env = gym.make("Gomoku19x19-v0")
    model = deepq.models.mlp([32, 16], layer_norm=True)
    act = deepq.learn(
        env,
        q_func=model,
        lr=0.01,
        max_timesteps=10000,
        print_freq=1,
        checkpoint_freq=1000
    )
    print("Saving model to Gomoku9x9.pkl")
    act.save("Gomoku9x9.pkl")
    print('Finish!')

if __name__ == '__main__':
    main()
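The shape mismatch ([?] vs [?,361]) suggests the network is receiving the 19x19 Gomoku board as a 2-D observation, while deepq's mlp expects a flat 1-D vector. One thing worth trying is flattening the observation before it reaches the network. A minimal, duck-typed sketch of such a wrapper (FlattenObservation is my own helper, not part of baselines or gym-gomoku):

```python
import numpy as np

class FlattenObservation:
    """Hypothetical wrapper: flattens 2-D board observations to 1-D vectors."""
    def __init__(self, env):
        self.env = env

    def reset(self):
        # e.g. a 19x19 board becomes a vector of length 361
        return np.asarray(self.env.reset()).ravel()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return np.asarray(obs).ravel(), reward, done, info

    def __getattr__(self, name):
        # delegate everything else (action_space, etc.) to the wrapped env
        return getattr(self.env, name)
```

With the real environment this would be used as `env = FlattenObservation(gym.make("Gomoku19x19-v0"))` before passing it to `deepq.learn`.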
See also questions close to this topic

Find the permutation of numpy array to align it with another array?
I have two numpy arrays such as
import numpy as np
x = np.array([3, 1, 4])
y = np.array([4, 3, 2, 1, 0])

each containing unique values. The values in x are guaranteed to be a subset of those in y. I would like to find the index of each element of x in the array y. For the arrays above, this would be [1, 3, 0].
So far I have been finding the indices one at a time in a loop:
idxs = []
for val in x:
    idxs.append(np.argwhere(y == val)[0, 0])
But this is slow when my arrays are large.
Is there a more efficient way to do this?
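One vectorized approach (a sketch using NumPy's argsort/searchsorted): sort y once, then binary-search every element of x in a single call, avoiding the Python loop entirely.

```python
import numpy as np

x = np.array([3, 1, 4])
y = np.array([4, 3, 2, 1, 0])

# argsort gives the order that sorts y; searchsorted locates each x value
# in the sorted view, and indexing back through `order` maps those
# positions to indices in the original y
order = np.argsort(y)
idxs = order[np.searchsorted(y, x, sorter=order)]
print(idxs)  # [1 3 0]
```

This is O((n + m) log n) instead of O(n * m) for the loop, which matters for large arrays.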

Logging in Python project using structlog and it logs third party libraries which needs to be removed
I am using the structlog library for logging in my Python project, but it also emits log messages from third-party libraries that I do not want. How do I remove those logs?
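structlog typically routes records through the standard library's logging module, so one common approach is to raise the log level of the offending third-party loggers. A sketch (the library names here are placeholders for whichever libraries are noisy in your project):

```python
import logging

# Only WARNING and above from these (placeholder) third-party loggers
# will pass through; your own loggers at INFO are unaffected.
for noisy in ("urllib3", "botocore"):
    logging.getLogger(noisy).setLevel(logging.WARNING)
```

This works because stdlib loggers form a name hierarchy: setting the level on a library's root logger silences everything beneath it.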

Loading VotingClassifier with pickle does not work
I'm storing a trained sklearn.ensemble.VotingClassifier using pickle.dump(...). When loading the file, an error occurs:

No module named 'sklearn.ensemble.voting'

However, when using the exact same code with logistic regression instead of VotingClassifier, everything works fine. Is there a solution for this?

How to add disturbance force in gym cartpole environment
Is there any way to add an external disturbance (like a disturbing force applied to the pole) in the Gym CartPole environment?
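Gym's classic-control CartPole applies a horizontal force of magnitude force_mag to the cart on every step. One lightweight approach is therefore to jitter that attribute per step; a duck-typed sketch of my own (a true disturbance on the pole itself, rather than the cart, would require editing the dynamics in gym's cartpole.py):

```python
import random

class DisturbanceWrapper:
    """Sketch: perturb the cart force each step to mimic an external
    disturbance (assumes the wrapped env exposes force_mag, as the
    classic-control CartPole does)."""
    def __init__(self, env, magnitude=2.0):
        self.env = env
        self.magnitude = magnitude
        self.base_force = getattr(env, "force_mag", 10.0)

    def step(self, action):
        # force_mag is read by CartPole's physics on every step
        self.env.force_mag = self.base_force + random.uniform(
            -self.magnitude, self.magnitude)
        return self.env.step(action)

    def __getattr__(self, name):
        return getattr(self.env, name)
```

Usage would be something like `env = DisturbanceWrapper(gym.make("CartPole-v0").unwrapped)`.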

What does spaces.Discrete mean in OpenAI Gym
I am trying to learn the Monte Carlo method applied to blackjack using OpenAI Gym, and I do not understand these lines:
def __init__(self, natural=False):
    self.action_space = spaces.Discrete(2)
    self.observation_space = spaces.Tuple((
        spaces.Discrete(32),
        spaces.Discrete(11),
        spaces.Discrete(2)))
    self.seed()
Source from: https://github.com/openai/gym/blob/master/gym/envs/toy_text/blackjack.py
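spaces.Discrete(n) describes the finite set {0, 1, ..., n - 1}. In the blackjack env, Discrete(2) means two possible actions (stick or hit), and the Tuple observation combines the player's current sum (up to 31), the dealer's showing card (1-10), and whether the player holds a usable ace. A minimal stand-in (my own sketch, not gym's actual implementation) that captures the semantics:

```python
import random

class Discrete:
    """Sketch of gym.spaces.Discrete(n): the integer set {0, ..., n - 1}."""
    def __init__(self, n):
        self.n = n

    def sample(self):
        # a uniformly random element, e.g. a random action
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n

action_space = Discrete(2)   # blackjack: 0 = stick, 1 = hit
```

Agents query these space objects to know what actions are legal (`env.action_space.sample()`) and what shape observations take.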

DQN on OpenAI Gym CartPole-v0 with TensorFlow Keras: model does not converge
I've tried to solve the standard CartPole example from OpenAI's Gym using DQN with experience replay and epsilon decay, but I can't seem to make it converge; in fact, the loss grows exponentially. I've tried comparing against this example, but I didn't see much difference between my code and theirs: https://github.com/NoumaanKaleem/cartpole_ddqn/blob/master/deepqlearning/dqn.py
Here is my code:
import tensorflow as tf
import numpy as np

#self.mem.add([state,at,reward,next_state,done])
class SARSD:
    count = 0
    def __init__(self, state, at, reward, next_state, done):
        self.state = state
        self.at = at
        self.reward = reward
        self.next_state = next_state
        self.done = done
        self.ID = SARSD.count
        SARSD.count = SARSD.count + 1

class ReplayMemory:
    def __init__(self, N):
        self.mem = []
        self.N = N

    #add element to replay memory
    #if the size of the replay memory is already capped, remove first item added
    def add(self, elem):
        self.mem.append(elem)
        if len(self.mem) > self.N:
            self.mem.pop(0)

    def sample(self, num_elems):
        mem_length = len(self.mem)
        if num_elems > mem_length:
            return 1
        indices = np.random.randint(0, mem_length - 1, num_elems)
        new_array = []
        to_remove = []
        for i in indices:
            new_array.append(self.mem[i])
            to_remove.append(self.mem[i])
        self.mem = [value for value in self.mem if value not in to_remove]
        return new_array

    def full(self):
        return len(self.mem) == self.N

    def clear(self):
        self.mem.clear()

class DQNAgent:
    def __init__(self):
        self.mem = ReplayMemory(2048)
        #epsilon greedy parameter
        self.eps = 0.1
        #learning rate
        self.gamma = 0.95
        #minibatch size
        self.minibatch_size = 32
        #epochs
        self.epochs = 1

    def learn(self, num_episodes, env, model):
        self.mem = ReplayMemory(1000)
        for episode in range(0, num_episodes):
            #if episode % int(num_episodes/10) == 0:
            #    self.eps = self.eps - 0.2
            #    if self.eps < 0.05:
            #        self.eps = 0.05
            state = env.reset()
            state = np.reshape(state, [1, 4])
            done = False
            while done == False:
                at = 0
                if np.random.rand() < self.eps:
                    #sample a random action
                    at = env.action_space.sample()
                else:
                    #get the best action from policy
                    test = model.predict(state)
                    at = np.argmax(model.predict(state))
                #execute action at
                next_state, reward, done, info = env.step(at)
                next_state = np.reshape(next_state, [1, 4])
                #store the transition into replay memory
                data = SARSD(state, at, reward, next_state, done)
                self.mem.add(data)
                #EDIT1 : surely this wasn't helping
                state = next_state
                if self.mem.full():
                    #sample batch of transitions from replay memory
                    minibatch = self.mem.sample(self.minibatch_size)
                    #initialize the targets and data
                    y = []
                    x = []
                    for elem in minibatch:
                        #check if terminal state
                        terminal = elem.done
                        if terminal == True:
                            yj = elem.reward
                        else:
                            #do the discounted reward formula
                            yj = elem.reward
                            prediction = model.predict(elem.next_state)
                            tmp = np.max(model.predict(elem.next_state))
                            yj = yj + self.gamma * np.max(model.predict(elem.next_state))
                        #get the prediction
                        prediction = model.predict(elem.state)
                        #change the prediction action value with discounted reward
                        prediction[0][elem.at] = yj
                        y.append(prediction[0])
                        x.append(elem.state[0])
                    #fit the model with target y
                    x = np.array(x)
                    y = np.array(y)
                    model.fit(x, y, epochs=self.epochs)
Here is the main python file where I declare the keras model to be trained...
import numpy as np
import gym
import DQNAgent
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.ERROR)

env = gym.make('CartPole-v0')
agent = DQNAgent.DQNAgent()

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(24, activation=tf.keras.activations.sigmoid, input_dim=4))
model.add(tf.keras.layers.Dense(24, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.linear))
model.compile(optimizer=tf.train.AdamOptimizer(0.01), loss=tf.keras.losses.mean_squared_error)

agent.learn(1000, env, model)

#evaluate if model has been trained correctly
episode = 0
for episode in range(0, 1000):
    next_state = env.reset()
    done = False
    while done == False:
        env.render()
        next_state = np.reshape(next_state, [1, 4])
        at = np.argmax(model.predict(next_state))
        next_state, reward, done, info = env.step(at)
I'm fairly new to TensorFlow/Keras, as you could probably guess.
EDIT 1: I forgot to actually assign state = next_state. This helped the losses stabilize, but it still doesn't solve the CartPole problem...
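One thing worth noting in the code above: ReplayMemory.sample deletes every sampled transition from the buffer, and np.random.randint(0, mem_length - 1, ...) can never select the newest element, so the buffer is constantly drained and biased. A non-destructive sketch using a deque (my own variant; it addresses the buffer behavior, not the whole convergence issue):

```python
import random
from collections import deque

class ReplayMemory:
    """Ring buffer that keeps transitions in memory after sampling them."""
    def __init__(self, capacity):
        # a deque with maxlen drops the oldest item automatically when full
        self.mem = deque(maxlen=capacity)

    def add(self, elem):
        self.mem.append(elem)

    def sample(self, k):
        # sample without replacement, but leave the buffer untouched
        return random.sample(list(self.mem), min(k, len(self.mem)))

    def full(self):
        return len(self.mem) == self.mem.maxlen
```

Standard DQN samples from the replay buffer many times per transition; deleting sampled items defeats the purpose of experience replay.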

How to use Keras functions to calculate baseline loss and accuracy for binary classification?
I am trying to calculate some baselines using my validation outputs. For a baseline cross-entropy, I believe you are supposed to compare each actual value to the average of all the values; for accuracy, I believe you compare the actual values with the most common class.
Accuracy looks to be working fine, but I'm not sure I'm calculating the baseline binary cross-entropy correctly. As stated above, I'm basically using the average of all the outputs as the baseline. For example:
actual_outputs = [1, 1, 1, 0]
baseline_outputs = [0.75, 0.75, 0.75, 0.75]
binary_crossentropy(actual_outputs, baseline_outputs)
Below, I am basically computing a tensor of the actual outputs, a tensor filled with the output average, and a tensor filled with the most common class...
tensor_outputs_avg = tf.constant(np.full((len(val_outputs)), np.average(val_outputs)), dtype=tf.float64)
tensor_outputs_avg_rounded = tf.constant(np.full((len(val_outputs)), round(np.average(val_outputs))), dtype=tf.float64)
tensor_outputs = tf.constant(val_outputs, dtype=tf.float64)

print('baseline loss:', tf.Session().run(binary_crossentropy(tensor_outputs_avg, tensor_outputs)))
print('baseline acc:', tf.Session().run(binary_accuracy(tensor_outputs_avg_rounded, tensor_outputs)))
What am I doing wrong?
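Two things worth checking. First, Keras losses take their arguments in the order (y_true, y_pred), so binary_crossentropy(tensor_outputs_avg, tensor_outputs) passes them swapped. Second, the baseline can be sanity-checked without a TF session at all; a plain-NumPy sketch of the hand computation for the toy example above:

```python
import numpy as np

actual = np.array([1.0, 1.0, 1.0, 0.0])
p = actual.mean()                      # 0.75: the constant baseline prediction
baseline = np.full_like(actual, p)

# binary cross-entropy of the constant-probability baseline
bce = -np.mean(actual * np.log(baseline) + (1 - actual) * np.log(1 - baseline))
print(round(bce, 4))  # 0.5623
```

If the TF value disagrees with this hand computation, the argument order (or the rounding of the baseline tensor) is the likely culprit.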

Baseline correction for spectroscopic data
I am working with Raman spectra, which often have a baseline superimposed with the actual information I am interested in. I therefore would like to estimate the baseline contribution. For this purpose, I implemented a solution from this question.
I do like the solution described there, and the code given works fine on my data. A typical result for calculated data looks like this, with the red and orange lines being the baseline estimates: [figure: typical result of baseline estimation with calculated data]
The problem is: I often have several thousand spectra, which I collect in a pandas DataFrame with one spectrum per row. My current solution iterates through the data one spectrum at a time with a for loop, which makes the procedure quite slow. As I am rather new to Python and have just gotten used to barely needing for loops thanks to numpy/pandas/scipy, I am looking for a way to omit this loop, too. However, the sparse matrix functions used seem limited to two dimensions, while I might need three, and I have not been able to think of another solution yet. Does anybody have an idea?
The current code looks like this:
import numpy as np
import pandas as pd
from scipy.signal import gaussian
import matplotlib.pyplot as plt
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_correction(raman_spectra, lam, p, niter=10):
    #according to "Asymmetric Least Squares Smoothing" by P. Eilers and H. Boelens
    number_of_spectra = raman_spectra.index.size
    baseline_data = pd.DataFrame(np.zeros((len(raman_spectra.index), len(raman_spectra.columns))), columns=raman_spectra.columns)
    for ii in np.arange(number_of_spectra):
        curr_dataset = raman_spectra.iloc[ii, :]

        #this is the code for the fitting procedure
        L = len(curr_dataset)
        w = np.ones(L)
        D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
        for jj in range(int(niter)):
            W = sparse.spdiags(w, 0, L, L)
            Z = W + lam * D.dot(D.transpose())
            z = spsolve(Z, w * curr_dataset.astype(np.float64))
            w = p * (curr_dataset > z) + (1 - p) * (curr_dataset < z)
        #end of fitting procedure

        baseline_data.iloc[ii, :] = z
    return baseline_data

#the following four lines calculate two sample spectra
wavenumbers = np.linspace(500, 2000, 100)
intensities1 = 500 * gaussian(100, 2) + 0.0002 * wavenumbers**2
intensities2 = 100 * gaussian(100, 5) + 0.0001 * wavenumbers**2
raman_spectra = pd.DataFrame((intensities1, intensities2), columns=wavenumbers)
#end of sample spectra calculation

baseline_data = baseline_correction(raman_spectra, 200, 0.01)

#the rest is just for plotting the data
plt.figure(1)
plt.plot(wavenumbers, raman_spectra.iloc[0])
plt.plot(wavenumbers, baseline_data.iloc[0])
plt.plot(wavenumbers, raman_spectra.iloc[1])
plt.plot(wavenumbers, baseline_data.iloc[1])
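One way to drop the Python loop (a sketch of my own, not from the linked answer): each spectrum's penalized least-squares system is independent, so all of them can be stacked into one block-diagonal sparse system and solved with a single spsolve call per iteration. The same lam and p apply to every spectrum.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_correction_batch(Y, lam, p, niter=10):
    """ALS baselines for all rows of Y (n_spectra x n_points) at once."""
    n, L = Y.shape
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    # one smoothness-penalty block per spectrum, stacked block-diagonally
    penalty = sparse.block_diag([lam * D.dot(D.T)] * n, format='csr')
    y = Y.ravel()
    w = np.ones(n * L)
    for _ in range(niter):
        W = sparse.spdiags(w, 0, n * L, n * L)
        z = spsolve((W + penalty).tocsc(), w * y)
        w = p * (y > z) + (1 - p) * (y < z)
    return z.reshape(n, L)
```

Note the combined system has n*L unknowns, so memory grows with the number of spectra; for many thousands of rows it may be worth solving in chunks.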

How is it possible to run a TimeVaryingCoxProportionalHazardsModel with an exponential distribution for the baseline hazard in `lifelines`?
I have a follow-up to this question.
Initial situation: I wanted to estimate a Cox proportional hazards model with an exponential distribution for the baseline hazard. Part 1.) of the solution was to have a closer look at the hazard function, which was shown to be

h(t|x) = lambda_0 * exp(beta * x)

and which can be estimated using an accelerated failure time (AFT) model, where the hazard is:

h(t|x) = lambda_0 * exp(beta * x)
Part 2.) of the solution was to define a unique class with the desired property.
Current situation: Now I want to expand my model to incorporate time-varying covariates. lifelines has documentation for this. The hazard for such models is

h(t|x(t)) = lambda_0(t) * exp(beta * x(t))

Trying to solve the problem, I start with Part 1.): the hazard (imo) is

h(t|x(t)) = lambda_0 * exp(beta * x(t))

so a special case of the hazard mentioned before (lambda_0(t) = lambda_0 for all t). My struggle comes with Part 2.). In cox_time_varying_fitter.py, the (time-varying) baseline hazard in the function _compute_cumulative_baseline_hazard is modelled non-parametrically by death_counts / hazards_at_t.sum(), and I expect that this function has to be redefined parametrically with a function similar to:

def _cumulative_hazard(self, params, T, Xs):
    # params is a dictionary that maps unknown parameters to a numpy vector.
    # Xs is a dictionary that maps unknown parameters to a numpy 2d array
    lambda_ = np.exp(np.dot(Xs['lambda_'], params['lambda_']))
    return T / lambda_

But then I have no idea where to get the lambda_ from, and how this solution would then be any different from the non-time-varying AFT model.

Example data:
import pandas as pd
from lifelines import CoxTimeVaryingFitter

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'start': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'stop': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'timevarying': [1, 0, 1, 2, 0, 2, 3, 0, 3],
    'event': [0, 0, 1, 0, 0, 0, 0, 0, 1]
})

ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
The goal is not to learn which covariates affect the chances of observing an event; instead, the goal is to predict the hazard for different time-varying values of the variable timevarying. Currently I manage only to get the partial hazard, but I need the baseline hazard too. Sure, the documentation states that prediction in such cases is not trivial, but consider timevarying to be the temperature in some area and the event whether a forest fire occurred; in such cases I think a prediction using weather forecast data is viable.

The question: How is it possible to run a time-varying Cox proportional hazards model with an exponential distribution for the baseline hazard in lifelines?