ICML-2018 Seminar Notes
The LAMDA Group runs a productive and meaningful seminar. I am honored to attend and benefit from it, and it is worth recording notes on some of the papers.
2018-08-04
-
《Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation》
- This paper proposes that, through abstraction, we can build a simple state-transition model on which ordinary tabular methods (such as Q-learning) can be applied to accelerate learning.
However, the abstraction is learned by reconstructing the original state from the code, which is unnecessary: all we need is a simple, workable generative model of the transitions, not a reconstruction. How to build such an abstraction model deserves more thought in the future.
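To make the "tabular method on top of an abstraction" idea concrete, here is a minimal sketch. The `encode` function is a hypothetical stand-in for the learned abstraction (it simply buckets an integer), and the 10-state chain environment is an assumption made for illustration, not the paper's setup.

```python
import random
from collections import defaultdict

def encode(raw_state):
    # Hypothetical abstraction: map a raw state to a discrete code.
    # A learned encoder would play this role; here we just bucket pairs of states.
    return int(raw_state) // 2

def step(raw_state, action):
    # Toy chain: action 1 moves right, action 0 moves left; reward on reaching state 9.
    nxt = max(0, min(9, raw_state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == 9 else 0.0
    return nxt, reward, nxt == 9

def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # Q-table indexed by abstract state
    for _episode in range(episodes):
        raw = 0
        for _t in range(100):  # cap the episode length
            s = encode(raw)
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                # Greedy action, with ties broken at random.
                a = max((0, 1), key=lambda i: (q[s][i], rng.random()))
            raw, r, done = step(raw, a)
            # Ordinary Q-learning update, applied to abstract states only.
            target = r if done else r + gamma * max(q[encode(raw)])
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
    return q

q_table = q_learning()
```

The table has only five rows even though the raw chain has ten states, which is the kind of saving the abstraction is meant to buy.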
《Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples》
- This paper attacks neural networks with adversarial examples. It circumvents, fully or partially, 7 of the 9 defenses it examines from ICLR 2018. Moreover, it classifies these defenses by their defense mechanisms.
2018-08-03
-
《Universal Planning Networks》
- The authors want to learn a state abstraction by imitation. The abstraction is optimized by minimizing the difference between the actions produced by the learned policy and those in expert demonstrations.
- However, it is unnecessary to learn such an abstraction by imitation, because the environment is already generative enough.
- By the way, the supplement contains many experimental details. I am grateful to the authors for being so thoughtful.
- There is little theory in this paper. However, there are many experiments demonstrating how human prior knowledge helps us explore efficiently. Once the semantics of the information are removed, we are in the same position as agents, which learn very slowly.
2018-08-02
-
《Clipped Action Policy Gradient》
- It is common to restrict continuous actions to a valid range. However, the authors point out that the gradients for out-of-range actions are often handled incorrectly. They propose a new way of computing these gradients that reduces the variance.
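As I understand the idea, for a one-dimensional Gaussian policy the out-of-range samples can be scored with the probability mass beyond the bound rather than with the density at the raw sample. The sketch below computes the derivative of the log-likelihood term with respect to the mean under that assumption; the function names are hypothetical and this is an illustration, not the paper's reference code.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def score_wrt_mean(u, mu, sigma, low, high):
    """Derivative w.r.t. mu of the log-likelihood term for a sampled action u."""
    if u <= low:
        # The environment executes `low`; use log P(u <= low) = log Phi(z).
        z = (low - mu) / sigma
        return -normal_pdf(z) / (sigma * normal_cdf(z))
    if u >= high:
        # The environment executes `high`; use log P(u >= high) = log(1 - Phi(z)).
        z = (high - mu) / sigma
        return normal_pdf(z) / (sigma * (1.0 - normal_cdf(z)))
    # Inside the valid range: the usual Gaussian score.
    return (u - mu) / (sigma * sigma)
```

Every sample beyond a bound receives the same score, whereas the usual Gaussian score keeps growing with the distance of the raw sample from the mean. Intuitively, that is where the variance reduction comes from.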
2018-08-01
-
《Coordinated Exploration in Concurrent Reinforcement Learning》
- When multiple agents or threads explore the same environment, ordinary methods such as UCB often explore only a limited range. The authors propose a sampling method for the multi-agent setting that explores efficiently. For a demo, see here.
-
《Latent Space Policies for Hierarchical Reinforcement Learning》
- This paper focuses on hierarchical reinforcement learning through latent spaces. Through a double-shot mechanism, the high-level policy can correct the mistakes of the low-level policy.
2018-07-31
-
《Towards Deep Learning Models Resistant to Adversarial Attacks》
- The authors propose a minimax optimization algorithm that covers both how to generate adversarial examples and how to defend against them. Notably, it is robust enough to withstand the attacks from the ICML 2018 paper mentioned in these notes.
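The inner maximization of such a minimax objective can be sketched with projected gradient ascent inside an L-infinity ball. The example below uses a tiny linear classifier with logistic loss so that it stays self-contained; the model, step size, and radius are assumptions chosen for illustration, and the outer minimization (training on the perturbed inputs) is left out.

```python
import math

def logistic_loss(w, b, x, y):
    # y is -1 or +1; loss = log(1 + exp(-y * (w.x + b)))
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log1p(math.exp(-margin))

def loss_grad_x(w, b, x, y):
    # Gradient of the logistic loss with respect to the input x.
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def pgd_attack(w, b, x, y, eps=0.1, step=0.02, iters=20):
    # Ascend the loss with signed gradient steps, then project each
    # coordinate back into [x_i - eps, x_i + eps].
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = [xa + step * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        x_adv = [min(xi + eps, max(xi - eps, xa)) for xa, xi in zip(x_adv, x)]
    return x_adv
```

Adversarial training would then update `w` and `b` on `pgd_attack(w, b, x, y)` instead of on `x`, which is the defense half of the minimax problem.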
-
《Time Limits in Reinforcement Learning》
- This paper studies the effect of truncating episodes at a time limit. The authors claim that most algorithm implementations do not account for this issue, which can lead to inefficient learning.
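One place where this shows up is the bootstrapped target. A minimal sketch, assuming the environment reports separately whether an episode truly terminated or was merely cut off by the time limit (the two flag names are my own choice):

```python
def naive_target(reward, next_value, gamma, terminated, truncated):
    # Common shortcut: treat a time-out exactly like a real terminal state.
    done = terminated or truncated
    return reward if done else reward + gamma * next_value

def time_limit_aware_target(reward, next_value, gamma, terminated, truncated):
    # Only a genuine terminal state stops bootstrapping. A time-out does not,
    # because the task would have continued from the next state, so
    # `truncated` deliberately has no effect on the target here.
    return reward if terminated else reward + gamma * next_value
```

On a transition that was only truncated, the naive version drops the value of the next state and so underestimates the return, which is the kind of silent bias the paper warns about.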
-
《Essentially No Barriers in Neural Network Energy Landscape》
- Through carefully designed experiments, the authors show that the loss of a neural network has a comparatively flat region in which many parameter settings perform similarly. Thus, we need to explore these parameters carefully.
2018-07-30
-
《Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents》
- The authors extend actor-critic methods to multi-agent environments. The decentralized agents coordinate by communicating with each other, sharing the parameters of their neural networks. Personally, I find it somewhat complicated.
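The parameter-sharing part can be pictured as a consensus step in which each agent replaces its parameters with a weighted average over its neighbours. The sketch below shows only that averaging, on a hypothetical three-agent line graph with symmetric weights whose rows sum to one; the local actor-critic updates that would happen between consensus steps are omitted.

```python
def consensus_step(params, weights):
    # params[i]: parameter vector held by agent i.
    # weights[i][j]: weight agent i puts on agent j (0 if they are not neighbours).
    n = len(params)
    dim = len(params[0])
    return [
        [sum(weights[i][j] * params[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]

# Line graph 0 - 1 - 2: agents 0 and 2 cannot talk to each other directly.
W = [
    [2 / 3, 1 / 3, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 1 / 3, 2 / 3],
]

params = [[0.0], [3.0], [6.0]]
for _ in range(60):
    params = consensus_step(params, W)
```

Because the weight matrix is symmetric with rows summing to one, repeated averaging drives all agents toward the mean of their initial parameters, even though no agent ever sees the whole network.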
2018-07-27
《Mix&Match - Agent Curricula for Reinforcement Learning》
- The authors provide a method, similar in spirit to boosting, for training a series of different policies in hierarchical reinforcement learning. During training, they also use population based training (see here for more details) to choose hyperparameters.
2018-07-24
《Self-Imitation Learning》
- The authors distinguish good samples from bad ones in the experience buffer. By relearning from the good samples, they aim to address the exploration dilemma in reinforcement learning. From my point of view, it cannot work, because it merely memorizes the paths to good results, which has nothing to do with exploration.
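The way good and bad samples are separated can be sketched with a clipped advantage: a stored transition contributes only when its observed return exceeds the current value estimate. This is a minimal per-sample sketch of that idea with hypothetical argument names; in practice the clipped advantage would be treated as a constant when differentiating the policy term.

```python
def sil_losses(log_prob, value, ret):
    # log_prob: log pi(a|s) of the stored action under the current policy
    # value:    current value estimate V(s)
    # ret:      discounted return actually observed from (s, a)
    advantage = max(ret - value, 0.0)   # zero for "bad" samples
    policy_loss = -log_prob * advantage
    value_loss = 0.5 * advantage ** 2
    return policy_loss, value_loss
```

A sample whose return is below the value estimate yields zero loss, so only the better-than-expected past trajectories are imitated.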
《Ray: A Distributed Framework for Emerging AI Applications》
- This paper focuses on a distributed framework for training reinforcement learning agents, developed by researchers at UC Berkeley.
《Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization》
- The authors combine the optimization methods of GD and SGD to strike a balance between them. This optimizer can be used for training neural networks.
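One standard way to mix a full gradient with stochastic ones is an SVRG-style variance-reduced step, sketched below on a scalar least-squares problem. This is plain SVRG used as a stand-in to show the mixing, not Katyusha X itself, which as far as I understand adds momentum on top of such a scheme; the problem and hyperparameters are assumptions.

```python
import random

def svrg(xs, ys, epochs=30, inner_steps=20, lr=0.05, seed=0):
    # Minimize (1/n) * sum_i 0.5 * (w * x_i - y_i)^2 over a scalar w.
    rng = random.Random(seed)
    n = len(xs)

    def grad_i(w, i):
        return (w * xs[i] - ys[i]) * xs[i]

    w = 0.0
    for _epoch in range(epochs):
        w_snap = w
        # The "GD" part: one full gradient at the snapshot point.
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _step in range(inner_steps):
            # The "SGD" part: a single random sample, corrected by the snapshot.
            i = rng.randrange(n)
            w -= lr * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)
    return w

w_hat = svrg([1.0, 2.0, 3.0], [2.0, 4.0, 7.0])
```

The corrected step stays an unbiased estimate of the full gradient, while its noise shrinks as `w` approaches the snapshot. This lets it use a constant step size where plain SGD would need a decaying one.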
2018-07-23
《QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning》
- This work builds on value function decomposition, which is designed to reduce the complexity of the control problem in multi-agent Q-learning. The authors use a more complicated neural network to represent the joint Q-value.
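The property in the title, monotonic factorisation, can be illustrated with a deliberately simplified single-layer mixer. In this sketch the raw weights and bias are hypothetical stand-ins for what a state-conditioned network would output, and taking absolute values keeps the joint value non-decreasing in every agent's own Q-value.

```python
from itertools import product

def mix(agent_qs, raw_weights, bias):
    # abs() keeps each mixing weight non-negative, so Q_tot can never
    # decrease when an individual agent's Q-value increases.
    return sum(abs(w) * q for w, q in zip(raw_weights, agent_qs)) + bias

agent_q_tables = [[1.0, 3.0], [0.5, -2.0]]  # two agents, two actions each
raw_weights = [-0.7, 1.2]                   # hypothetical, may be negative
bias = 0.1

# Decentralized choice: every agent maximizes its own Q-values.
greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in agent_q_tables)

# Centralized choice: search all joint actions for the best mixed value.
joint_best = max(
    product(range(2), repeat=2),
    key=lambda acts: mix(
        [agent_q_tables[i][a] for i, a in enumerate(acts)], raw_weights, bias
    ),
)
```

Here `greedy` and `joint_best` coincide, which is the point of the monotonicity constraint: each agent can act on its own Q-values while still maximizing the joint one.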
《Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations》
- This work considers the problem of recovering rewards from expert demonstrations that may be sub-optimal, rather than optimal as previously assumed. However, finding the equilibrium of the zero-sum game is so computationally expensive that the algorithm is impractical in the real world.