AVOIDING CATASTROPHIC STATES WITH INTRINSIC FEAR
Anonymous authors
Paper under double-blind review
ABSTRACT
Many practical reinforcement learning problems contain catastrophic states that
the optimal policy visits infrequently or never. Even on toy problems, deep reinforcement
learners periodically revisit these states once they are forgotten under
a new policy. In this paper, we introduce intrinsic fear, a learned reward shaping
that accelerates deep reinforcement learning and guards oscillating policies against
periodic catastrophes. Our approach incorporates a second model trained via supervised
learning to predict the probability of imminent catastrophe. This score acts
as a penalty on the Q-learning objective. Our theoretical analysis demonstrates that
the perturbed objective yields the same average return under strong assumptions
and an �-close average return under weaker assumptions. Our analysis also shows
robustness to classification errors. Equipped with intrinsic fear, our DQNs solve
the toy environments and improve on the Atari games Seaquest, Asteroids, and
Freeway.
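As a concrete illustration of the mechanism summarized above, the following is a minimal sketch of a fear-penalized Q-learning target, assuming a standard DQN setup in PyTorch; the names fear_model and fear_lambda, and the exact placement of the penalty, are our assumptions rather than the paper's specification.

import torch

def intrinsic_fear_target(reward, next_state, done, q_network, fear_model,
                          gamma=0.99, fear_lambda=1.0):
    # Standard DQN bootstrap target, penalized by the fear model's
    # (supervised-learned) probability that next_state precedes a catastrophe.
    with torch.no_grad():
        fear = fear_model(next_state).squeeze(-1)        # P(imminent catastrophe)
        max_q = q_network(next_state).max(dim=-1).values # greedy bootstrap value
        target = reward + (1.0 - done) * gamma * max_q - fear_lambda * fear
    return target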