NaN loss on custom OpenAI Stable Baselines environment
I have tried to implement a custom environment for OpenAI Stable Baselines. However, in contrast to the OpenAI Gym environments, when I train PPO2 on my environment I receive NaN for the value loss, policy loss, approxkl, policy entropy, etc. I am currently using the MlpPolicy, and I have tested it on Breakout-v0 without any problems. I have made sure to normalize observations to between 0 and 1 and rewards to between -10 and 10. What could be causing the NaNs with the MlpPolicy here?
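To make the normalization claims above checkable, here is a minimal sketch of a per-step sanity check that could be run on whatever the environment returns from `reset()` and `step()`. The helper names `flatten` and `check_step_values` are hypothetical and not part of Stable Baselines or Gym. The bounds are the ones stated above (observations in [0, 1], rewards in [-10, 10]). Observations are assumed to be plain nested lists or tuples, so a NumPy array would need `obs.tolist()` first.

```python
import math


def flatten(values):
    """Yield every scalar from an arbitrarily nested list or tuple."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from flatten(item)
    else:
        yield values


def check_step_values(obs, reward,
                      obs_low=0.0, obs_high=1.0,
                      rew_low=-10.0, rew_high=10.0):
    """Return a list of problems found in one (observation, reward) pair.

    Hypothetical helper: the default bounds mirror the normalization
    ranges described in the question, not any library default.
    """
    problems = []
    for i, x in enumerate(flatten(obs)):
        x = float(x)
        if not math.isfinite(x):
            # Catches NaN as well as +inf / -inf
            problems.append(f"obs[{i}] is not finite: {x}")
        elif not (obs_low <= x <= obs_high):
            problems.append(f"obs[{i}] = {x} is outside [{obs_low}, {obs_high}]")

    r = float(reward)
    if not math.isfinite(r):
        problems.append(f"reward is not finite: {r}")
    elif not (rew_low <= r <= rew_high):
        problems.append(f"reward = {r} is outside [{rew_low}, {rew_high}]")
    return problems


if __name__ == "__main__":
    # A well-formed step produces no problems
    print(check_step_values([0.2, 0.9], 1.5))
    # A NaN observation and an out-of-range reward are both reported
    print(check_step_values([0.2, float("nan")], 25.0))
```

Calling this on every step of a random rollout would show whether a NaN, an infinity, or an out-of-range value ever leaves the environment. If it never does, the NaNs are presumably arising inside training rather than in the environment's outputs.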