Yann LeCun's Remarks on AI

Some time ago, LeCun shared a view on AI: "Most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake. We need to solve the unsupervised learning problem before we can even think of getting to true AI. And that's just an obstacle we know about. What about all the ones we don't know about?"

This remark sits uneasily with many reinforcement learning fans. After reading the two passages below, though, I find it fairly convincing.

It is hard to tell if he meant "cherry on the cake" as the Oxford Dictionary defines it: "A desirable feature perceived as the finishing touch to something that is already very good", or he was just making a point that unsupervised learning is THE cake, everything else is an add-on. Now, I can see why Reinforcement Learning would be a great add-on to solve intelligence. We can teach computers through supervised learning, we can somewhat let them learn by themselves using (what we have of) unsupervised learning. Solving those two alone would allow us to create super intelligent agents, but we will still have to tell them what to learn, what to solve, and so on. Solving Reinforcement Learning allows us to "unleash" these smart agents to find out their own desires, follow their own dreams, pursue their own happiness. But hey, we are far from it. There is a lot of work ahead.

If we only use the reinforcement signal to guide training, then I agree with Yann LeCun that it is the cherry on the cake. Even worse: when using a global reinforcement signal that is not a known differentiable function of the representations (which is typically the case), there is a serious scaling problem in terms of the number of hidden units (or action dimensions) that can be trained with respect to that signal. The number of examples, random samples or trials of actions may have to grow at least linearly with the number of units in order to provide credit assignment of quality comparable to that obtained with back-propagation. If the action space is large, this is problematic. However, as Demis Hassabis said when Yann talked about the cake and cherry analogy, we should *also* do unsupervised learning, along with reinforcement learning. Then it becomes more credible that it can work on a large scale.

Reading some of DeepMind's latest papers, there is indeed a trend toward combining reinforcement learning with unsupervised learning. I used to believe that reinforcement learning could learn representations from the reward signal alone; now I am no longer so sure.
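
To make the scaling argument in the passage above concrete, here is a small numerical sketch of my own (none of this code comes from the quoted text). It compares the true back-propagation gradient of a toy objective with a score-function (REINFORCE-style) estimate that only sees a scalar reward per trial; all names and sizes are illustrative assumptions.

```python
import numpy as np

# A toy comparison, with made-up sizes: how well does a scalar reinforcement
# signal assign credit to individual parameters, versus back-propagation?
rng = np.random.default_rng(0)
n_units = 512                       # hidden units / action dimensions
w = rng.normal(size=n_units)        # parameters of a toy linear "policy"
x = rng.normal(size=n_units)        # a fixed input

def reward(v):
    # Treated as a black box: we may evaluate it, but not differentiate it.
    return -np.sum((v * x - 1.0) ** 2)

# Back-propagation: one evaluation yields a per-unit gradient (we can write
# it down here only because we secretly know the functional form).
true_grad = -2.0 * (w * x - 1.0) * x

def reinforce_grad(n_trials, sigma=0.1):
    # Score-function (REINFORCE-style) estimate: perturb all parameters,
    # weight each perturbation by the scalar reward. Each trial returns one
    # number of feedback for n_units parameters.
    baseline = reward(w)            # variance-reduction baseline
    g = np.zeros(n_units)
    for _ in range(n_trials):
        eps = rng.normal(size=n_units)
        g += (reward(w + sigma * eps) - baseline) * eps / sigma
    return g / n_trials

for n_trials in (10, 100, 1000):
    est = reinforce_grad(n_trials)
    cos = est @ true_grad / (np.linalg.norm(est) * np.linalg.norm(true_grad))
    print(f"{n_trials:5d} trials: cosine to true gradient = {cos:.3f}")
```

With a handful of trials the estimate barely correlates with the true gradient; only as the number of trials approaches the number of units does the credit assignment become comparable, which is exactly the linear-growth problem the passage describes.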
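And here is a minimal sketch, in a toy PyTorch setup of my own invention, of the kind of combination those papers point toward: one shared representation trained by a reinforcement objective plus an unsupervised auxiliary objective (reconstruction here, chosen only for brevity). The sparse scalar return drives the policy head, while the dense reconstruction error gives every hidden unit a differentiable training signal.

```python
import torch
import torch.nn as nn

# Made-up sizes and a fake batch, for illustration only.
obs_dim, hidden_dim, n_actions = 16, 64, 4

encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())  # shared
policy_head = nn.Linear(hidden_dim, n_actions)   # reinforcement branch
decoder = nn.Linear(hidden_dim, obs_dim)         # unsupervised branch

params = (list(encoder.parameters()) + list(policy_head.parameters())
          + list(decoder.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

obs = torch.randn(32, obs_dim)     # stand-in for observed states
returns = torch.randn(32)          # stand-in for per-step returns

h = encoder(obs)

# REINFORCE-style policy loss: the scalar return is the only supervision.
dist = torch.distributions.Categorical(logits=policy_head(h))
actions = dist.sample()
policy_loss = -(dist.log_prob(actions) * returns).mean()

# Unsupervised auxiliary loss: reconstruction back-propagates a dense,
# per-unit error signal into the same shared encoder.
recon_loss = nn.functional.mse_loss(decoder(h), obs)

loss = policy_loss + 0.1 * recon_loss   # 0.1 is an arbitrary mixing weight
opt.zero_grad()
loss.backward()
opt.step()
```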
