CAN DEEP REINFORCEMENT LEARNING SOLVE ERDOS-SELFRIDGE-SPENCER GAMES?

Maithra Raghu
Google Brain and Cornell University
{maithrar}@gmail.com
Alex Irpan
Google Brain
Jacob Andreas
University of California, Berkeley
Robert Kleinberg
Cornell University
Quoc V. Le
Google Brain
Jon Kleinberg
Cornell University
ABSTRACT
Deep reinforcement learning has achieved many recent successes, but our understanding
of its strengths and limitations is hampered by the lack of rich environments
in which we can fully characterize optimal behavior, and correspondingly
diagnose individual actions against such a characterization. Here we consider a
family of combinatorial games, arising from work of Erdos, Selfridge, and Spencer,
and we propose their use as environments for evaluating and comparing different
approaches to reinforcement learning. These games have a number of appealing
features: they are challenging for current learning approaches, but they form (i)
a low-dimensional, simply parametrized environment where (ii) there is a linear
closed-form solution for optimal behavior from any state, and (iii) the difficulty
of the game can be tuned by changing environment parameters in an interpretable
way. We use these Erdos-Selfridge-Spencer games not only to compare different
algorithms, but also to compare approaches based on supervised and reinforcement
learning, to analyze the power of multi-agent approaches in improving performance,
and to evaluate generalization to environments outside the training set.
