Yutian Chen
Matthew W. Hoffman
Sergio Gómez Colmenarejo
Misha Denil
Timothy P. Lillicrap
Nando de Freitas
DeepMind
We present a learning-to-learn approach for training recurrent neural networks to perform black-box global optimization.
In the meta-learning phase we use a large set of smooth target functions to learn a recurrent neural network (RNN) optimizer, which is either a long short-term memory (LSTM) network or a differentiable neural computer (DNC).
After meta-learning, the RNN optimizer can be applied to learning policies in reinforcement learning, as well as to other black-box learning tasks, including continuous correlated bandits and experimental design.
We compare this approach to Bayesian optimization, with emphasis on computation speed, horizon length, and exploration-exploitation trade-offs.
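To make the setup concrete, the following is a minimal sketch of the meta-learning phase, assuming PyTorch: an LSTM optimizer is unrolled on a sampled target function, proposing a query point at each step and observing the function value, and its parameters are trained by backpropagating a meta-loss through the unrolled loop. The names `RNNOptimizer`, `sample_quadratic`, and `meta_train_step` are hypothetical, random quadratics stand in for the smooth target functions, and the summed-observations meta-loss is only one plausible choice; none of this is the paper's implementation.

```python
import torch
import torch.nn as nn

class RNNOptimizer(nn.Module):
    """LSTM optimizer: maps the last query/observation pair to the next query."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.cell = nn.LSTMCell(dim + 1, hidden)  # input: concat(x_t, y_t)
        self.head = nn.Linear(hidden, dim)        # output: next query x_{t+1}

    def forward(self, x, y, state):
        state = self.cell(torch.cat([x, y], dim=-1), state)
        return self.head(state[0]), state

def sample_quadratic(dim):
    """Illustrative stand-in for the smooth target functions: a random quadratic."""
    center = torch.randn(1, dim)
    return lambda x: ((x - center) ** 2).sum(dim=-1, keepdim=True)

def meta_train_step(opt_net, meta_opt, dim, horizon=20):
    """Unroll the RNN on one sampled function and backprop the summed losses."""
    f = sample_quadratic(dim)
    x, state = torch.zeros(1, dim), None
    y = f(x)
    loss = y
    for _ in range(horizon):
        x, state = opt_net(x, y, state)  # RNN proposes the next query point
        y = f(x)                         # black-box (but differentiable) evaluation
        loss = loss + y                  # summed-observations meta-loss (one option)
    meta_opt.zero_grad()
    loss.backward()                      # gradient descent through the unrolled loop
    meta_opt.step()
    return loss.item()

dim = 2
opt_net = RNNOptimizer(dim)
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)
for _ in range(1000):
    meta_train_step(opt_net, meta_opt, dim)
```

Because the sampled training functions are differentiable, the meta-loss can be minimized by ordinary gradient descent even though, at deployment time, the learned RNN treats the target as a black box and never sees its gradients.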