SYMMETRY-BREAKING CONVERGENCE ANALYSIS OF CERTAIN TWO-LAYERED NEURAL NETWORKS WITH RELU NONLINEARITY

Yuandong Tian
Facebook AI Research

http://openreview.net/pdf?id=Hk85q85ee

In this paper, we use a dynamical systems approach to analyze the nonlinear weight dynamics of two-layered bias-free networks of the form

g(x; w) = ∑_{j=1}^K σ(w_j⊤ x),

where σ(·) is the ReLU nonlinearity. We assume that the input x follows a Gaussian distribution. The network is trained using gradient descent to mimic the output of a teacher network of the same size with fixed parameters w*, under the ℓ2 loss.
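
To make the setting concrete, here is a minimal simulation sketch of this teacher-student setup. It is not code from the paper; the network size, batch size, learning rate, and helper names such as `net` are illustrative assumptions:

```python
import numpy as np

# Teacher-student setup: the student g(x; W) = sum_j ReLU(w_j^T x) is trained
# with SGD on the l2 loss to match a fixed teacher of the same size, with
# Gaussian inputs. All hyperparameters here are illustrative assumptions.

rng = np.random.default_rng(0)
d, K = 20, 2                                   # input dimension, ReLU nodes
W_star = np.linalg.qr(rng.standard_normal((d, K)))[0].T  # orthonormal teacher rows

def net(W, X):
    """g(x; W) = sum_j ReLU(w_j^T x), evaluated for each row x of X."""
    return np.maximum(X @ W.T, 0.0).sum(axis=1)

W = rng.standard_normal((K, d)) / np.sqrt(d)   # init with std ~ 1/sqrt(d)
eta = 0.05
for step in range(20000):
    X = rng.standard_normal((64, d))           # fresh Gaussian mini-batch
    err = net(W, X) - net(W_star, X)           # student-minus-teacher residual
    gate = (X @ W.T > 0).astype(float)         # ReLU gates 1{w_j^T x > 0}
    # Gradient of 0.5 * mean(err^2) w.r.t. w_j: mean of err * gate_j * x.
    grad = (gate * err[:, None]).T @ X / len(X)
    W -= eta * grad

# Recovery is only defined up to a permutation of the K nodes (check both for K = 2).
e = min(np.linalg.norm(W - W_star), np.linalg.norm(W - W_star[::-1]))
print("relative error (up to permutation):", e / np.linalg.norm(W_star))  # typically small
```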

  1. We first show that when K = 1, the nonlinear dynamics can be written in closed form, and converges to w* with at least (1 − ε)/2 probability, if a random weight initialization of proper standard deviation (∼ 1/√d) is used, verifying empirical practice [Glorot & Bengio (2010); He et al. (2015); LeCun et al. (2012)]. (These closed-form dynamics are the K = 1 special case of the population dynamics sketched after this list.)

  2. For networks with many ReLU nodes (K ≥ 2), we apply our closed-form dynamics and prove that when the teacher parameters {w*_j}_{j=1}^K form orthonormal bases, (1) a symmetric weight initialization yields convergence to a saddle point and (2) a certain symmetry-breaking weight initialization yields global convergence to w* without getting stuck in local minima (see the sketch after this list).
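
The closed-form dynamics hinge on the Gaussian expectation F(e, w) = E[x x⊤ 1{e⊤x ≥ 0, w⊤x ≥ 0}] w which, for a unit vector e at angle θ to w, evaluates to (1/2π)[(π − θ)w + ‖w‖ sin θ · e]; this is a standard half-space Gaussian integral of the kind the paper builds on, stated here as an assumption rather than quoted from it. Under that assumption, the sketch below integrates the resulting population dynamics for a K = 2 orthonormal teacher and contrasts a symmetric initialization (which stalls at a saddle) with a symmetry-breaking one (which reaches w*). Function names and step sizes are illustrative, not the paper's code:

```python
import numpy as np

# Population (expected-gradient) dynamics of the teacher-student l2 loss.

def F(e, w):
    """E[x x^T 1{e.x>=0, w.x>=0}] w for Gaussian x and unit vector e.
    Assumed closed form: (1/2pi) * ((pi - theta)*w + |w|*sin(theta)*e)."""
    theta = np.arccos(np.clip(e @ w / np.linalg.norm(w), -1.0, 1.0))
    return ((np.pi - theta) * w + np.linalg.norm(w) * np.sin(theta) * e) / (2 * np.pi)

def population_grad(W, W_star):
    """Expected gradient for node j: sum_k F(w_j_hat, w_k) - sum_k F(w_j_hat, w*_k)."""
    G = np.zeros_like(W)
    for j in range(len(W)):
        e = W[j] / np.linalg.norm(W[j])
        G[j] = sum(F(e, wk) for wk in W) - sum(F(e, wk) for wk in W_star)
    return G

d, K = 10, 2
W_star = np.eye(K, d)                 # orthonormal teacher rows w*_1, w*_2
rng = np.random.default_rng(0)

def run(W, steps=3000, eta=0.2):
    for _ in range(steps):
        W = W - eta * population_grad(W, W_star)
    return W

w0 = rng.standard_normal(d) / np.sqrt(d)
W_sym = run(np.stack([w0, w0]))       # symmetric init: both nodes identical
W_brk = run(rng.standard_normal((K, d)) / np.sqrt(d))  # symmetry-breaking init

def err(W):                           # distance to teacher up to permutation (K = 2)
    return min(np.linalg.norm(W - W_star), np.linalg.norm(W - W_star[::-1]))

print("symmetric init, error:", err(W_sym))  # stays bounded away from 0 (saddle)
print("broken init, error:  ", err(W_brk))   # typically near 0 (recovers w*)
```

The symmetric run stays on the invariant manifold w_1 = w_2 by construction: identical nodes receive identical gradients, so they can never split to cover the two teacher directions, which is exactly the saddle behavior claimed in item 2.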

To our knowledge, this is the first proof that shows global convergence in a nonlinear neural network without unrealistic assumptions on the independence of ReLU activations. In addition, we give a concise gradient update formulation for a multilayer ReLU network when it follows a teacher of the same size under the ℓ2 loss. Simulations verify our theoretical analysis.
