On the Expressive Power of Deep Learning: A Tensor Analysis
Nadav Cohen, Or Sharir, Amnon Shashua
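The paper represents classification functions over the sample space in terms of products of basis functions of a high-dimensional real function space. Using tensor decompositions, it proves that almost all functions generated by deep networks (e.g., the HT model) cannot be efficiently approximated by shallow networks (e.g., the CP model).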
(Submitted on 16 Sep 2015)
It has long been conjectured that hypothesis spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical architectures than with shallow ones. Despite the vast empirical evidence, formal arguments to date are limited and do not capture the kind of networks used in practice. Using tensor factorization, we derive a universal hypothesis space implemented by an arithmetic circuit over functions applied to local data structures (e.g. image patches). The resulting networks first pass the input through a representation layer, and then proceed with a sequence of layers comprising sum followed by product-pooling, where sum corresponds to the widely used convolution operator. The hierarchical structure of networks is born from factorizations of tensors based on the linear weights of the arithmetic circuits. We show that a shallow network corresponds to a rank-1 decomposition, whereas a deep network corresponds to a Hierarchical Tucker (HT) decomposition. Log-space computation for numerical stability transforms the networks into SimNets.
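The layer sequence described above can be sketched in NumPy. This is a minimal illustration under assumptions, not the authors' implementation. The Gaussian representation functions, the helper names `representation`, `shallow_forward` and `deep_forward`, and the sharing of conv weights across locations are all hypothetical choices. "Sum" is modeled here as a 1x1 convolution (a weighted sum over channels at each location). "Product pooling" multiplies activations across locations: globally in the shallow network, and over neighboring pairs in the deep one, which therefore assumes N = 2^L locations.

```python
import numpy as np

def representation(patches, mus):
    """Hypothetical representation layer: M Gaussian functions applied to each patch.

    patches: (N, s) array of local patches; mus: (M, s) array of centres (assumption).
    Returns an (M, N) array with rep[d, i] = exp(-||x_i - mu_d||^2).
    """
    d2 = ((patches[:, None, :] - mus[None, :, :]) ** 2).sum(axis=-1)  # (N, M)
    return np.exp(-d2).T

def shallow_forward(rep, a_conv, a_out):
    """Shallow network: one 1x1 conv (sum) followed by global product pooling.

    rep: (M, N); a_conv: (Z, M), shared across locations; a_out: (Z,).
    """
    conv = a_conv @ rep          # (Z, N): weighted sum over the M representation channels
    pooled = conv.prod(axis=1)   # (Z,): product over all N locations at once
    return a_out @ pooled        # scalar score

def deep_forward(rep, conv_weights, a_out):
    """Deep network: log2(N) rounds of 1x1 conv followed by size-2 product pooling.

    conv_weights[l]: (r_l, r_{l-1}) with r_{-1} = M; a_out: (r_last,).
    """
    n = rep.shape[1]
    assert 2 ** len(conv_weights) == n, "need exactly log2(N) layers"
    h = rep
    for W in conv_weights:
        h = W @ h                    # 1x1 conv: sum over input channels at each location
        h = h[:, 0::2] * h[:, 1::2]  # product pooling over neighboring pairs halves the width
    return a_out @ h[:, 0]           # width is now 1; linear output layer

rng = np.random.default_rng(0)
N, s, M = 8, 3, 4
patches = rng.normal(size=(N, s))
mus = rng.normal(size=(M, s))
rep = representation(patches, mus)

y_shallow = shallow_forward(rep, rng.normal(size=(5, M)), rng.normal(size=5))
y_deep = deep_forward(
    rep,
    [rng.normal(size=(3, M)), rng.normal(size=(3, 3)), rng.normal(size=(3, 3))],
    rng.normal(size=3),
)
print(y_shallow, y_deep)
```

With N = 2 the two networks coincide, because a single pooling step over both locations is both global and pairwise. For large N the long products can underflow or overflow, which is the numerical issue that the log-space (SimNet) formulation mentioned in the abstract addresses.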
In its basic form, our main theoretical result shows that the set of polynomially sized rank-1 decomposable tensors has measure zero in the parameter space of polynomially sized HT decomposable tensors. In deep learning terminology, this amounts to saying that, apart from a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size if one wishes to implement (or approximate) them with a shallow network. Our construction and theory shed new light on various practices and ideas employed by the deep learning community, and in that sense make a paradigmatic contribution as well.
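A small numerical experiment can make a gap of this kind concrete. This is a sketch under stated assumptions, not the paper's proof. It assumes N = 4 locations and M = 2 representation functions. The shallow coefficient tensor is built as a sum of Z rank-1 terms (a CP decomposition), and the deep one by a two-level HT-style construction with pairwise pooling. The comparison uses the rank of the odd/even matricization (rows indexed by modes 1 and 3, columns by modes 2 and 4), a measure we assume here rather than take from the abstract. A sum of Z rank-1 terms has matricization rank at most Z, while random HT weights generically reach the maximum M^{N/2} = 4. The helper names `cp_tensor`, `ht_tensor_n4` and `matricize_odd_even` are hypothetical.

```python
import numpy as np

def cp_tensor(a_out, factors):
    """Shallow (CP) coefficient tensor: sum_z a_out[z] * v_{1,z} x ... x v_{N,z}.

    a_out: (Z,); factors: (N, Z, M), one vector per location and term.
    """
    n, z_terms, m = factors.shape
    A = np.zeros((m,) * n)
    for z in range(z_terms):
        term = factors[0, z]
        for i in range(1, n):
            term = np.multiply.outer(term, factors[i, z])  # grow the rank-1 outer product
        A += a_out[z] * term
    return A

def ht_tensor_n4(a0, a1, a2):
    """Deep (HT-style) coefficient tensor for N = 4 locations (hypothetical helper).

    a0: (4, r0, M) level-0 vectors; a1: (2, r1, r0) weights for the two
    pooling windows; a2: (r1,) top-level weights.
    """
    # phi[j, g] = sum_a a1[j, g, a] * outer(a0[2j, a], a0[2j+1, a])
    phi = np.einsum('jga,jam,jan->jgmn', a1, a0[0::2], a0[1::2])
    # A = sum_g a2[g] * outer(phi[0, g], phi[1, g]); modes ordered 1, 2, 3, 4
    return np.einsum('g,gmn,gpq->mnpq', a2, phi[0], phi[1])

def matricize_odd_even(A):
    """Rows index the odd modes (1, 3, ...), columns the even modes (2, 4, ...)."""
    odd = list(range(0, A.ndim, 2))   # 0-based axes 0, 2, ... = modes 1, 3, ...
    even = list(range(1, A.ndim, 2))
    n_rows = int(np.prod([A.shape[k] for k in odd]))
    return A.transpose(odd + even).reshape(n_rows, -1)

rng = np.random.default_rng(0)
M, r0, r1, Z = 2, 2, 2, 3
A_deep = ht_tensor_n4(rng.normal(size=(4, r0, M)),
                      rng.normal(size=(2, r1, r0)),
                      rng.normal(size=r1))
A_shallow = cp_tensor(rng.normal(size=Z), rng.normal(size=(4, Z, M)))

rank_deep = np.linalg.matrix_rank(matricize_odd_even(A_deep))
rank_shallow = np.linalg.matrix_rank(matricize_odd_even(A_shallow))
print(rank_deep, rank_shallow)  # generically 4 and 3
```

In this toy setting a CP form needs at least Z = 4 = M^{N/2} terms to reproduce a generic deep tensor. The abstract's claim is that gaps of this kind grow exponentially as N increases.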
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Numerical Analysis (cs.NA)
Cite as: arXiv:1509.05009 [cs.NE]
(or arXiv:1509.05009v1 [cs.NE] for this version)