Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize.
Humans overcome this difficulty by imitating other humans. Imitation learning in RL works well when the demonstrations are given in the first person: the agent is provided with a sequence of states and a specification of the actions it should have taken. While powerful, this kind of imitation learning is limited by the relative difficulty of collecting first-person demonstrations.
Humans address this problem by learning from third-person demonstrations: they observe other humans perform tasks, infer the task being performed, and accomplish the same task themselves.
In this paper, we present a method for unsupervised third-person imitation learning.
By using ideas from domain confusion, we are able to train an agent to correctly achieve a simple goal in a simple environment when provided with a demonstration of a teacher achieving the same goal from a different viewpoint. Crucially, the agent receives only these demonstrations and is not given any correspondence between teacher states and student states.
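The domain-confusion idea can be illustrated with a toy gradient-reversal setup: a feature extractor is trained adversarially against a classifier that tries to predict which viewpoint (teacher or student) an observation came from, pushing the two feature distributions together. This is a minimal sketch under illustrative assumptions (linear models, made-up data and shapes), not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: the same underlying task signal observed from two "viewpoints"
# that differ by a constant offset (a hypothetical stand-in for teacher vs.
# student observations).
n, d = 200, 4
task = rng.normal(size=(2 * n, d))
offset = np.concatenate([np.zeros((n, d)), 2.0 * np.ones((n, d))])
obs = task + offset                               # viewpoint-dependent observations
dom = np.concatenate([np.zeros(n), np.ones(n)])   # domain (viewpoint) labels

W = rng.normal(scale=0.1, size=(d, d))   # linear feature extractor
w = rng.normal(scale=0.1, size=d)        # linear domain classifier

lr, lam = 0.05, 1.0                      # lam scales the reversed gradient
for _ in range(300):
    feats = obs @ W
    p = sigmoid(feats @ w)               # classifier's P(domain = 1 | features)
    err = p - dom                        # d(cross-entropy)/d(logit)
    grad_w = feats.T @ err / len(dom)    # classifier descends on its loss
    grad_feats = np.outer(err, w) / len(dom)
    grad_W = obs.T @ grad_feats
    w -= lr * grad_w
    W += lr * lam * grad_W               # gradient reversal: extractor ASCENDS,
                                         # making features uninformative of domain

# If the adversarial game balances, the domain classifier trends toward
# chance accuracy on the learned features (viewpoint-invariant features).
feats = obs @ W
acc = float(np.mean((sigmoid(feats @ w) > 0.5) == dom))
print(acc)
```

The gradient-reversal trick is one common way to realize a domain-confusion objective; the key point for third-person imitation is that features which fool the viewpoint classifier carry task information without carrying viewpoint information.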