Anonymous authors
Paper under double-blind review
ABSTRACT
We present Value Propagation (VProp), a parameter-efficient differentiable planning
module built on Value Iteration that can be trained with reinforcement learning
to solve unseen tasks, generalizes to larger map sizes, and learns to navigate
in dynamic environments. We evaluate
on configurations of MazeBase grid-worlds, with randomly generated environments
of several different sizes. Furthermore, we show that the module and its variants
provide a simple way to learn to plan when adversarial agents are present and
the environment is stochastic, yielding a cost-efficient learning system for building
low-level size-invariant planners for a variety of interactive navigation problems.
VALUE PROPAGATION NETWORKS