2018-12-20 PPO debug experience

PPO Debug Experience

Recently, I needed to apply PPO to a complex environment. I looked at some code on GitHub, but I couldn't understand what it was doing...

After reading the PPO paper, I decided to write it myself.
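The heart of the PPO paper is the clipped surrogate objective L^CLIP = E_t[min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t)], where r_t is the ratio between the new and old policy probabilities of the action taken. The author's own code is not reproduced in this post, so what follows is only a minimal pure-Python sketch of that objective, assuming per-step log-probabilities and advantages are already available as lists. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative choices, not taken from the author's implementation.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a batch.

    Hypothetical helper for illustration. Computes
        L^CLIP = mean_t[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)]
    with r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t), and returns -L^CLIP so the
    value can be minimized by an optimizer.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio r_t
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

When the new and old policies are identical every ratio is 1, so the loss reduces to minus the mean advantage; clipping only changes the gradient once the ratio leaves [1 - eps, 1 + eps] in the direction the advantage favors.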

I already had some experience writing RL code, so within a few minutes I finished a first version on Gym's CartPole-v0. However, it didn't work...

I checked the core algorithm again and again... Sadly, the code still did not work.

So I began to suspect that the way the agent interacted with the environment was wrong, and I started debugging the interaction between the agent and the environment.
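The post does not say exactly which interaction bug it was. One common place where agent-environment bookkeeping goes wrong is the discounted return G_t = r_t + gamma * G_{t+1}, which must restart whenever an episode ends so that rewards from the next episode do not leak into the current one. The sketch below is a generic illustration under that assumption, not the author's code; `discounted_returns` and its `dones` argument (True when the episode ended after step t) are hypothetical names.

```python
def discounted_returns(rewards, dones, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over a rollout.

    Hypothetical helper for illustration. dones[t] is True when the episode
    ended after step t; the running return is reset there so it never
    crosses into an unrelated episode. A final segment that is still
    unfinished is treated as ending with value 0 (no bootstrapping).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # episode boundary: do not carry the next episode's return
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, with rewards `[1, 1, 1]`, `dones = [False, True, False]` and `gamma = 0.5`, the reset gives `[1.5, 1.0, 1.0]` instead of letting the third reward flow into the first episode.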
Luckily, I found that the reward (more precisely, the return G_t or the advantage) was being computed incorrectly. So I read some papers on advantage estimation, such as GAE and TRPO...
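For reference, the GAE paper defines the TD residual delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and the advantage A_t = sum_l (gamma * lambda)^l * delta_{t+l}, which can be computed with one backward pass. The following is a minimal pure-Python sketch of that recursion, assuming a critic has already produced `values` for each visited state plus `last_value` for the state after the final step; the function and argument names are hypothetical, and this is not necessarily how the author changed the reward calculation.

```python
def gae_advantages(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    Hypothetical helper for illustration.
        delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    last_value is V of the state reached after the final step, used to
    bootstrap if the rollout was cut off mid-episode.
    Returns (advantages, returns), where returns_t = A_t + V(s_t) can serve
    as the value-function target.
    """
    n = len(rewards)
    advantages = [0.0] * n
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(n)):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Two limiting cases are useful sanity checks: with `lam=1.0` and all values zero, the advantages equal the plain discounted returns, and with `lam=0.0` each advantage is just the one-step TD residual.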

I then changed the way the reward was calculated, and the code worked.
You can click here to see my code.
