Reinforcement learning is currently a popular research direction. Classifying the various reinforcement learning methods and papers helps us understand which method suits which application scenario. This article categorizes reinforcement learning approaches and lists the corresponding papers.
5. Memory Series
Algorithm: MFEC
Paper: Model-Free Episodic Control
Venue: arXiv
Link: https://arxiv.org/abs/1606.04460
Google Scholar citations (at time of writing): 138
Algorithm: NEC
Paper: Neural Episodic Control
Venue: ICML, 2017
Link: https://arxiv.org/abs/1703.01988
Google Scholar citations (at time of writing): 171
Algorithm: Neural Map
Paper: Neural Map: Structured Memory for Deep Reinforcement Learning
Venue: ICLR, 2018
Link: https://arxiv.org/abs/1702.08360
Google Scholar citations (at time of writing): 173
Algorithm: MERLIN
Paper: Unsupervised Predictive Memory in a Goal-Directed Agent
Venue: arXiv
Link: https://arxiv.org/abs/1803.10760
Google Scholar citations (at time of writing): 108
Algorithm: RMC
Paper: Relational Recurrent Neural Networks
Venue: NIPS, 2018
Link: https://arxiv.org/abs/1806.01822
Google Scholar citations (at time of writing): 121
6. Model-Based RL Series
a. Model is Learned
Algorithm: I2A
Paper: Imagination-Augmented Agents for Deep Reinforcement Learning
Venue: NIPS, 2017
Link: https://arxiv.org/abs/1707.06203
Google Scholar citations (at time of writing): 182
Algorithm: MBMF
Paper: Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Venue: ICRA, 2018
Link: https://arxiv.org/abs/1708.02596
Google Scholar citations (at time of writing): 503
Algorithm: MVE
Paper: Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning
Venue: arXiv
Link: https://arxiv.org/abs/1803.00101
Google Scholar citations (at time of writing): 109
Algorithm: STEVE
Paper: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Venue: NIPS, 2018
Link: https://arxiv.org/abs/1807.01675
Google Scholar citations (at time of writing): 127
Algorithm: ME-TRPO
Paper: Model-Ensemble Trust-Region Policy Optimization
Venue: ICLR, 2018
Link: https://openreview.net/forum?id=SJJinbWRZ&noteId=SJJinbWRZ
Google Scholar citations (at time of writing): 195
Algorithm: MB-MPO
Paper: Model-Based Reinforcement Learning via Meta-Policy Optimization
Venue: Conference on Robot Learning, 2018
Link: https://arxiv.org/abs/1809.05214
Google Scholar citations (at time of writing): 108
Algorithm: World Models
Paper: Recurrent World Models Facilitate Policy Evolution
Venue: NIPS, 2018
Link: https://arxiv.org/abs/1809.01999
Google Scholar citations (at time of writing): 316
b. Model is Given
Algorithm: AlphaZero
Paper: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
Venue: Science, 2018
Link: https://arxiv.org/abs/1712.01815
Google Scholar citations (at time of writing): 971
Algorithm: ExIt
Paper: Thinking Fast and Slow with Deep Learning and Tree Search
Venue: NIPS, 2017
Link: https://arxiv.org/abs/1705.08439
Google Scholar citations (at time of writing): 174