A Collection of Recent DeepMind Papers

Neil Zhu, Jianshu ID Not_GOD, founder & Chief Scientist of University AI, dedicated to advancing the adoption of artificial intelligence worldwide. He sets and executes UAI's medium- and long-term growth strategy and goals, and has led the team to become one of the most professional forces in the AI field.
As an industry leader, he and UAI founded TASA (one of China's earliest AI societies) in 2014, along with the DL Center (a global value network for deep learning knowledge) and AI Growth (an industry think-tank and training program), supplying a steady pipeline of AI talent for China. He has organized and taken part in numerous international AI summits and events of considerable influence, written over 600,000 characters of high-quality AI technical content, and produced the translation of the first introductory book on deep learning, "Neural Networks and Deep Learning"; his content has been widely republished and serialized by professional media outlets and public accounts. He has been invited by top domestic universities to design AI study plans and teach courses on the AI frontier, to positive reviews from students and faculty alike.

Continuous Deep Q-Learning with Model-based Acceleration

http://arxiv.org/pdf/1603.00748v1.pdf

Abstract: Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
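
The key construction behind NAF is that the advantage term is a quadratic in the action, so the Q-function's maximizer is available in closed form. The numbers below are a minimal hypothetical sketch of that decomposition, not the paper's trained network: a value V(x), a greedy action mu(x), and a Cholesky factor L(x) are just fixed stand-ins for network outputs.

```python
import numpy as np

# Hypothetical per-state network outputs (fixed here for illustration).
V = 1.5                          # state value V(x)
mu = np.array([0.3, -0.7])       # greedy action mu(x)
L = np.array([[1.2, 0.0],
              [0.4, 0.9]])       # lower-triangular factor with positive diagonal
P = L @ L.T                      # positive-definite matrix P(x)

def q_value(u):
    """Q(x, u) = V(x) - 1/2 (u - mu)^T P (u - mu): quadratic in the action."""
    d = u - mu
    return V - 0.5 * d @ P @ d

# The advantage term is non-positive, so Q is maximized exactly at u = mu and
# max_u Q(x, u) = V(x). This is what makes the greedy step of Q-learning
# tractable for continuous action spaces.
print(q_value(mu))                               # equals V = 1.5
print(q_value(mu) >= q_value(np.array([1.0, 1.0])))  # True
```

Because P(x) is built from a Cholesky factor it is positive definite by construction, so no numerical action maximization is ever needed during training.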

Learning functions across many orders of magnitudes

http://arxiv.org/pdf/1602.07714v1.pdf

Abstract: Learning non-linear functions can be hard when the magnitude of the target function is unknown beforehand, as most learning algorithms are not scale invariant. We propose an algorithm to adaptively normalize these targets. This is complementary to recent advances in input normalization. Importantly, the proposed method preserves the unnormalized outputs whenever the normalization is updated to avoid instability caused by non-stationarity. It can be combined with any learning algorithm and any non-linear function approximation, including the important special case of deep learning. We empirically validate the method in supervised learning and reinforcement learning and apply it to learning how to play Atari 2600 games. Previous work on applying deep learning to this domain relied on clipping the rewards to make learning in different games more homogeneous, but this uses the domain-specific knowledge that in these games counting rewards is often almost as informative as summing these. Using our adaptive normalization we can remove this heuristic without diminishing overall performance, and even improve performance on some games, such as Ms. Pac-Man and Centipede, on which previous methods did not perform well.
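
The abstract's central point, that unnormalized outputs are preserved whenever the normalization statistics change, can be sketched in a few lines. Assume a hypothetical linear last layer g(h) = w*h + b producing normalized predictions, with unnormalized prediction f(h) = sigma * g(h) + mu; when (mu, sigma) are updated, the last layer is rescaled so f(h) is unchanged. All numbers here are illustrative, not the paper's.

```python
# Last-layer parameters and current normalization statistics (hypothetical).
w, b = 0.8, 0.1
mu, sigma = 2.0, 5.0
h = 0.6                               # some fixed penultimate-layer feature

f_before = sigma * (w * h + b) + mu   # unnormalized prediction

# New statistics arrive, e.g. from a running estimate of the target scale.
mu_new, sigma_new = 3.5, 8.0
# Rescale the output layer so the unnormalized prediction is unchanged:
w = w * sigma / sigma_new
b = (sigma * b + mu - mu_new) / sigma_new
mu, sigma = mu_new, sigma_new

f_after = sigma * (w * h + b) + mu
print(abs(f_before - f_after) < 1e-12)   # True: outputs preserved
```

Without this rescaling step, every update of the statistics would abruptly change every prediction, which is exactly the non-stationarity instability the abstract mentions.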

Deep Exploration via Bootstrapped DQN

http://arxiv.org/pdf/1602.04621v1.pdf

Abstract: Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as ε-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.
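
The "temporally-extended" exploration the abstract contrasts with ε-greedy can be illustrated with a toy tabular setup: K bootstrap heads, each holding its own Q estimate; one head is sampled per episode and followed greedily for the whole episode, so exploration is consistent over time rather than per-step dithering. Everything below (the random Q tables, the toy transition) is a hypothetical stand-in for the trained heads.

```python
import random

K, n_states, n_actions = 5, 4, 3
random.seed(0)
# Hypothetical Q estimates: one table per bootstrap head.
heads = [[[random.random() for _ in range(n_actions)]
          for _ in range(n_states)] for _ in range(K)]

def run_episode(length=4):
    k = random.randrange(K)            # sample ONE head at episode start
    actions, state = [], 0
    for _ in range(length):
        q = heads[k][state]
        a = max(range(n_actions), key=lambda i: q[i])  # greedy w.r.t. head k
        actions.append(a)
        state = (state + 1) % n_states  # toy deterministic transition
    return k, actions

k, actions = run_episode()
print(k, actions)   # the same head drives every step of the episode
```

An ε-greedy agent would instead inject an independent random choice at each step, which cannot commit to a multi-step exploratory plan.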

One-shot Learning with Memory-Augmented Neural Networks

https://arxiv.org/pdf/1605.06065v1.pdf

Abstract: Work on one-shot learning. Traditional approaches require large amounts of data to learn; when new data arrive, the model must inefficiently relearn its parameters to incorporate the new information smoothly. Architectures with augmented memory capacity, such as Neural Turing Machines (NTMs), offer the ability to encode and retrieve new information quickly, and can therefore potentially sidestep the weaknesses of conventional models. This paper shows that memory-augmented neural networks can rapidly assimilate new data and use it to make accurate predictions after only a few samples have been seen. It also introduces a new method for accessing external memory that focuses on memory content, unlike previous approaches that additionally used mechanisms based on memory location to address memory.
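
Purely content-based memory access, of the kind this paper favors over location-based addressing, can be sketched as a softmax over cosine similarities between a query key and each memory row. The memory contents and query below are hypothetical numbers, not the paper's learned representations.

```python
import numpy as np

memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.7, 0.7, 0.0]])       # stored rows (hypothetical)
key = np.array([0.9, 0.1, 0.0])            # query key emitted by a controller

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

sims = np.array([cosine(key, row) for row in memory])
w = np.exp(sims) / np.exp(sims).sum()      # softmax attention weights
read = w @ memory                          # retrieved vector: weighted blend

print(w.argmax())   # 0: the row most similar to the key dominates the read
```

Because the weights depend only on content similarity, a newly written row becomes retrievable immediately, with no positional bookkeeping; this is the property that supports rapid one-shot binding of new information.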

Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions

http://arxiv.org/pdf/1512.01124v2.pdf

Abstract: Many real-world problems come with action spaces represented as feature vectors. Although high-dimensional control is a largely unsolved problem, there has recently been progress for modest dimensionalities. Here we report on a successful attempt at addressing problems of dimensionality as high as 2000, of a particular form. Motivated by important applications such as recommendation systems that do not fit the standard reinforcement learning frameworks, we introduce Slate Markov Decision Processes (slate-MDPs).

A slate-MDP is an MDP with a combinatorial action space (tuples of primitive actions from an underlying MDP). The agent does not fully control which action is ultimately selected, and the executed action may not even come from the slate at all; in a recommendation system, for example, every recommendation can be ignored by the user. We use deep Q-learning with feature representations of both states and actions to learn the value of whole slates.

Unlike existing methods, we optimize for both the combinatorial and sequential aspects of our tasks. The new agent's superiority over agents that ignore either the combinatorial or the sequential long-term value aspect is demonstrated on a range of environments with dynamics from a real-world recommendation system. Further, we use deep deterministic policy gradients to learn a policy that, for each position of the slate, guides attention towards the part of the action space in which the value is highest, and we only evaluate actions in this area. The attention is used within a sequentially greedy procedure leveraging submodularity. Finally, we show how introducing risk-seeking can dramatically improve the agent's performance and its ability to discover more far-reaching strategies.
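
The "sequentially greedy procedure leveraging submodularity" amounts to filling slate positions one at a time, each time adding the remaining item with the highest marginal value given what is already on the slate. The value function below is a hypothetical stand-in with diminishing returns (later positions contribute less), which is what makes the greedy construction reasonable.

```python
# Hypothetical per-item base values.
items = {"a": 3.0, "b": 2.0, "c": 2.5, "d": 1.0}

def slate_value(slate):
    # Diminishing returns: each later position contributes half as much,
    # a toy submodular-style value, not the paper's learned Q-function.
    return sum(items[x] * 0.5 ** i for i, x in enumerate(slate))

def greedy_slate(size):
    slate, remaining = [], set(items)
    for _ in range(size):
        # Pick the item with the best marginal gain at the next position.
        best = max(remaining, key=lambda x: slate_value(slate + [x]))
        slate.append(best)
        remaining.remove(best)
    return slate

print(greedy_slate(3))   # ['a', 'c', 'b']
```

In the paper's setting the attention mechanism additionally restricts which candidate actions are scored at each position; here every remaining item is scored for simplicity.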

Increasing the Action Gap: New Operators for Reinforcement Learning

http://arxiv.org/pdf/1512.04860v1.pdf

Abstract: This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
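
One member of this gap-increasing family, the advantage learning operator that the paper proves optimality-preserving, is T Q(s,a) = (T_B Q)(s,a) - α (max_b Q(s,b) - Q(s,a)): a standard Bellman backup minus a penalty on non-greedy actions. The two-state deterministic MDP below is a hypothetical example showing the action gap widening under repeated application.

```python
gamma, alpha = 0.9, 0.5
# Toy deterministic MDP: rewards and next states for (state, action) pairs.
R = {(0, 0): 1.0, (0, 1): 0.8, (1, 0): 0.0, (1, 1): 0.5}
nxt = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 0}
Q = {sa: 0.0 for sa in R}

def bellman(Q, s, a):
    s2 = nxt[(s, a)]
    return R[(s, a)] + gamma * max(Q[(s2, b)] for b in (0, 1))

def advantage_update(Q):
    # Bellman backup minus a gap-increasing penalty on non-greedy actions.
    return {(s, a): bellman(Q, s, a)
                    - alpha * (max(Q[(s, b)] for b in (0, 1)) - Q[(s, a)])
            for (s, a) in Q}

Q1 = advantage_update(Q)   # from Q = 0 the penalty term is zero everywhere
Q2 = advantage_update(Q1)
gap1 = abs(Q1[(0, 0)] - Q1[(0, 1)])   # 0.2
gap2 = abs(Q2[(0, 0)] - Q2[(0, 1)])   # 0.3
print(gap1, gap2, gap2 > gap1)        # the action gap at state 0 grows
```

The greedy action at each state is untouched (its penalty is zero), so the induced policy is preserved while suboptimal actions are pushed further below it, which is the robustness-to-error argument in the abstract.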

MuProp: Unbiased Backpropagation for Stochastic Neural Networks

http://arxiv.org/pdf/1511.05176v2.pdf

Abstract: Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm. Stochastic neural networks combine the power of large parametric functions with that of graphical models, which makes it possible to learn very complex distributions. However, as backpropagation is not directly applicable to stochastic networks that include discrete sampling operations within their computational graph, training such networks remains difficult. We present MuProp, an unbiased gradient estimator for stochastic networks, designed to make this task easier. MuProp improves on the likelihood-ratio estimator by reducing its variance using a control variate based on the first-order Taylor expansion of a mean-field network. Crucially, unlike prior attempts at using backpropagation for training stochastic networks, the resulting estimator is unbiased and well behaved. Our experiments on structured output prediction and discrete latent variable modeling demonstrate that MuProp yields consistently good performance across a range of difficult tasks.
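
The Taylor-based control variate can be shown on a one-variable toy problem: estimate d/dθ of E_{z~Bernoulli(p)}[f(z)] with p = sigmoid(θ). The first-order expansion of f around the mean-field value μ = p is subtracted inside the likelihood-ratio term, and its known expectation, f'(μ)·dμ/dθ, is added back analytically, so the estimator stays unbiased. The function f is a hypothetical shifted quadratic (where the plain likelihood-ratio estimator has high variance), not anything from the paper's experiments.

```python
import math, random

random.seed(1)
theta = 0.3
p = 1.0 / (1.0 + math.exp(-theta))   # mean-field value mu
dp_dtheta = p * (1.0 - p)

f = lambda z: (z - 0.4) ** 2 + 10.0  # hypothetical objective (large offset)
df = lambda z: 2.0 * (z - 0.4)       # its derivative

def estimates(n=20000):
    lr, muprop = [], []
    for _ in range(n):
        z = 1.0 if random.random() < p else 0.0
        dlogp = z - p                # d log p(z) / dtheta for a Bernoulli
        lr.append(f(z) * dlogp)      # plain likelihood-ratio estimator
        # Subtract the first-order Taylor expansion of f around mu = p,
        # then add its expectation back deterministically:
        muprop.append((f(z) - f(p) - df(p) * (z - p)) * dlogp
                      + df(p) * dp_dtheta)
    return lr, muprop

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

lr, mp = estimates()
print(var(mp) < var(lr))   # True: the control variate slashes the variance
```

Both estimators have the same expectation (here 0.2 · dμ/dθ), but the constant offset in f, which inflates the plain likelihood-ratio estimator's variance, cancels entirely inside the MuProp residual.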

Policy Distillation

http://arxiv.org/pdf/1511.06295.pdf

