AVOIDING CATASTROPHIC STATES WITH INTRINSIC FEAR
Anonymous authors
Paper under double-blind review
ABSTRACT
Many practical reinforcement learning problems contain catastrophic states that
the optimal policy visits infrequently or never. Even on toy problems, deep reinforcement
learners periodically revisit these states once they are forgotten under
a new policy. In this paper, we introduce intrinsic fear, a learned reward shaping
that accelerates deep reinforcement learning and guards oscillating policies against
periodic catastrophes. Our approach incorporates a second model trained via supervised
learning to predict the probability of imminent catastrophe. This score acts
as a penalty on the Q-learning objective. Our theoretical analysis demonstrates that
the perturbed objective yields the same average return under strong assumptions
and an ε-close average return under weaker assumptions. Our analysis also shows
robustness to classification errors. Equipped with intrinsic fear, our DQNs solve
the toy environments and improve on the Atari games Seaquest, Asteroids, and
Freeway.