Notes on Reinforcement Learning at ICML 2018

ICML-2018 Seminar Notes

The LAMDA Group runs a productive and meaningful seminar. I am honored to attend and benefit from it, and the papers discussed there are worth recording in these notes.

2018-08-04

Figure: Network Model of Abstraction and Reconstruction
Figure: Attack Results
  • This paper attacks neural networks with adversarial examples. It defeats 7 of the 9 defense networks from ICLR 2017 and, moreover, classifies these defenses by their defense mechanisms.
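The note does not spell out how an adversarial example is constructed, so here is a minimal, generic FGSM-style sketch for intuition only; it is not the paper's attack, and `model`, `image`, and `label` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, image, label, epsilon=0.03):
    """Craft one adversarial example with the fast gradient sign method (generic baseline)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels in [0, 1].
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```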

2018-08-03

  • 《Universal Planning Networks》
    • The authors want to learn a state abstraction through imitation. The abstraction is optimized via the difference between the actions produced by the learned policy and the actions in the expert demonstrations (a rough sketch of the idea follows after this entry).
Figure: Universal Planning Network (UPN)
  • However, it seems unnecessary to learn such an abstraction by imitation, because the environment itself is generative enough.
  • By the way, the supplement contains plenty of experimental details; I am grateful to the authors for being so thoughtful.
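As I understand the idea, observations are encoded into a latent space, a plan is found by gradient descent on the latent distance to the goal, and everything is trained so that the planned actions match the expert's. The sketch below is my own rough reconstruction with made-up shapes and names, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UPNSketch(nn.Module):
    """Rough sketch: encode observations, then plan actions by gradient descent in latent space."""
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        self.act_dim = act_dim
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU(),
                                     nn.Linear(latent_dim, latent_dim))
        self.dynamics = nn.Linear(latent_dim + act_dim, latent_dim)

    def plan(self, obs, goal_obs, horizon=5, inner_steps=10, inner_lr=0.1):
        z_goal = self.encoder(goal_obs)
        actions = torch.zeros(horizon, self.act_dim, requires_grad=True)
        for _ in range(inner_steps):
            z = self.encoder(obs)
            for t in range(horizon):
                z = self.dynamics(torch.cat([z, actions[t]], dim=-1))
            plan_loss = F.mse_loss(z, z_goal)            # distance to the goal in latent space
            grad, = torch.autograd.grad(plan_loss, actions, create_graph=True)
            actions = actions - inner_lr * grad          # inner gradient-descent planning step
        return actions

# Outer imitation loss: the planned actions should match the expert actions, and its
# gradient flows through the inner planning loop into the encoder and dynamics model.
# imitation_loss = F.mse_loss(model.plan(obs, goal_obs), expert_actions)
```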
Figure: Comparison between Semantic and Non-Semantic Images
  • There is little theory in this paper, but there are plenty of experiments demonstrating how human prior knowledge helps us explore efficiently. Once the semantic information is stripped from the images, we behave just like agents and learn very slowly.

2018-08-02

  • 《Clipped Action Policy Gradient》
    • It is common to clip continuous actions into a valid range. However, the authors point out that we often handle the gradients of these out-of-range actions incorrectly, and they propose a new way of computing the gradient that reduces its variance (a sketch of my reading of it follows below the figure).
Figure: Clipped Distribution
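My reading of the trick: when the sampled Gaussian action falls outside the valid range, the executed (clipped) action is explained by the whole tail of the distribution, so the log-density at the raw sample is replaced by the log of that tail probability. A minimal sketch, with `low`/`high` as the action bounds; the exact estimator in the paper may differ.

```python
import torch
from torch.distributions import Normal

def clipped_action_log_prob(mu, sigma, raw_action, low, high):
    """Log-probability term for a clipped Gaussian action (sketch of the idea)."""
    dist = Normal(mu, sigma)
    log_p = dist.log_prob(raw_action)
    # Out-of-range samples: use the mass of the clipped tail instead of the density.
    log_low = torch.log(dist.cdf(torch.as_tensor(low)) + 1e-8)
    log_high = torch.log(1.0 - dist.cdf(torch.as_tensor(high)) + 1e-8)
    log_p = torch.where(raw_action <= low, log_low, log_p)
    log_p = torch.where(raw_action >= high, log_high, log_p)
    return log_p

# Policy-gradient term: (clipped_action_log_prob(...) * advantage).mean(),
# differentiated with respect to mu and sigma.
```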

2018-08-01

Figure: Seed Sampling
Figure: Latent Policy Control

2018-07-31

Figure: Illustration of "Natural" vs. "Adversarial" Decision Boundaries
  • 《Time Limits in Reinforcement Learning》
    • This paper examines the effect of truncating episodes at a fixed time limit. The authors claim that most algorithm implementations ignore this issue, which can hurt learning efficiency (a sketch of the fix, as I understand it, follows below the figures).
Figure: Correct Solution to the Time-Limited State
Figure: Illustration of the Loss Region
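The fix I take away is to distinguish a true terminal state from a time-limit truncation when computing bootstrapped targets: only a true terminal stops the bootstrap. A minimal sketch with my own variable names (the flags are assumed to come from the environment wrapper):

```python
def td_target(reward, next_value, gamma, terminal, truncated_by_time_limit):
    """One-step TD target that keeps bootstrapping across time-limit cut-offs."""
    if terminal and not truncated_by_time_limit:
        return reward                       # true end of the episode: no bootstrap
    return reward + gamma * next_value      # time-limit truncation: keep bootstrapping
```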

2018-07-30

Figure: Algorithm of Networked Actor-Critic

2018-07-27

《Mix & Match - Agent Curricula for Reinforcement Learning》

  • The authors provide a method, similar in spirit to boosting, to train a series of different policies in hierarchical reinforcement learning. During training, they also use population based training to choose hyperparameters (a rough sketch of the mixing idea follows below the figure).
Figure: Scheme of Mix & Match
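The mixing idea, as I understand it: act with a mixture of a simple and a complex policy, distill the mixture into the complex one, and let population based training move the mixing weight (and other hyperparameters). A rough sketch with made-up names, not the authors' exact loss:

```python
import torch
import torch.nn.functional as F

def mixture_and_distill(logits_simple, logits_complex, alpha):
    """Mix two policies and distill the mixture into the complex policy (sketch).

    alpha in [0, 1] shifts control from the simple to the complex policy and would be
    tuned by population based training in the full method.
    """
    p_simple = F.softmax(logits_simple, dim=-1)
    p_complex = F.softmax(logits_complex, dim=-1)
    p_mix = (1.0 - alpha) * p_simple + alpha * p_complex        # mixture policy used to act
    # KL(p_mix || p_complex): pull the complex policy toward the behaviour of the mixture.
    distill_loss = F.kl_div(torch.log(p_complex + 1e-8), p_mix.detach(),
                            reduction='batchmean')
    return p_mix, distill_loss
```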

2018-07-24

《Self-Imitation Learning》

  • The author distinguishes good samples from bad samples in the experience buffer. By re-learning from the good samples, the author hopes to ease the exploration dilemma in reinforcement learning. From my point of view, this cannot really work, because it merely "memorizes" the paths that already led to good results, which has little to do with exploration (the loss is sketched below the figure).
Figure: Algorithm of Self-Imitation Learning
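For reference, the self-imitation losses as I remember them: only transitions whose observed return exceeds the current value estimate contribute, through the clipped advantage (R - V)+. A small sketch; tensor names and shapes are my own.

```python
import torch

def sil_losses(log_prob, value, episode_return):
    """Self-imitation policy and value losses for a batch of replayed transitions (sketch)."""
    advantage = (episode_return - value).clamp(min=0.0)       # (R - V)+, zero for "bad" samples
    policy_loss = -(log_prob * advantage.detach()).mean()     # re-learn actions that paid off
    value_loss = 0.5 * (advantage ** 2).mean()
    return policy_loss, value_loss
```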

《Ray-A Distributed Framework for Emerging AI Applications》

  • The main contribution of this paper is a distributed framework for training reinforcement learning agents; it comes from UC Berkeley (a tiny usage example follows below).
Figure: Ray
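A tiny usage example of Ray's task API, just to show how work is farmed out to parallel workers; the `rollout` function here is a stand-in, not part of Ray.

```python
import ray

ray.init()

@ray.remote
def rollout(seed):
    # Stand-in for one environment rollout; returns a fake episode return.
    return float(seed) * 0.1

# Launch rollouts in parallel as remote tasks and collect the results.
futures = [rollout.remote(s) for s in range(8)]
print(ray.get(futures))
```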

《Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization》

  • The author combines full-batch gradient descent and SGD with a momentum scheme, striking a balance between the two; the method is aimed at stochastic sums of non-convex functions such as neural network training (the variance-reduction part is sketched below the figure).
Figure: KatyushaX
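The "GD plus SGD" combination is, as I read it, variance reduction: a periodically computed full gradient anchors the cheap stochastic steps, and KatyushaX adds a momentum step on top of this (not shown; see the paper for the exact update). A minimal SVRG-style sketch with hypothetical callables `grad_i(x, i)` and `full_grad(x)`:

```python
import numpy as np

def svrg(grad_i, full_grad, x0, n, lr=0.1, epochs=10, inner_steps=100, seed=0):
    """SVRG-style variance-reduced updates: a full gradient (GD) anchors the SGD steps."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        anchor = full_grad(snapshot)                  # one full-batch gradient per epoch
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Stochastic gradient corrected by the snapshot gradient at the same index.
            v = grad_i(x, i) - grad_i(snapshot, i) + anchor
            x = x - lr * v
    return x
```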

2018-07-23

《QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning》

  • This work builds on value function decomposition, which is designed to reduce the complexity of the control problem in multi-agent Q-learning. The authors use a more expressive neural network, a monotonic mixing network, to represent the joint Q value (a sketch follows below the figure).
Figure: Architecture of QMIX
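A rough sketch of a monotonic mixing network in the spirit of QMIX: hypernetworks conditioned on the global state generate the mixing weights, and taking their absolute value keeps the joint Q value monotone in each agent's Q value. Layer sizes and names are my own, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MixingNetworkSketch(nn.Module):
    """Monotonic mixing of per-agent Q values into a joint Q value (sketch)."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = self.hyper_w1(state).abs().view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = self.hyper_w2(state).abs().view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2
        return q_tot.view(-1)                        # one joint Q value per batch element
```

The non-negative weights (the abs calls) are what make dQ_tot/dQ_i >= 0 for every agent, so the joint argmax decomposes into per-agent argmaxes.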

《Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations》

  • This work considers recovering a reward from expert demonstrations that may be sub-optimal, rather than optimal as is usually assumed. However, finding the equilibrium of the zero-sum game requires so much computation that the algorithm is impractical in the real world (the generic objective is written out below the figure).
Figure: Optimization Objective of IRL
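For orientation only, the textbook zero-sum view of IRL can be written as below: pick a reward under which the expert beats every other policy, while the inner policy tries to close the gap. This is the generic single-agent form, not necessarily the exact objective of this competitive multi-agent paper.

```latex
\max_{r}\;\Big(\,\mathbb{E}_{\pi_E}\Big[\sum_{t}\gamma^{t} r(s_t,a_t)\Big]
\;-\;\max_{\pi}\ \mathbb{E}_{\pi}\Big[\sum_{t}\gamma^{t} r(s_t,a_t)\Big]\Big)
```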