Deep Residual Learning for Image Recognition

ResNet
Overview
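The paper mainly addresses the degradation problem of deep networks: simply stacking more layers to make a network deeper causes accuracy to drop instead of improve. To solve this, the authors designed the residual network. Instead of directly fitting the nonlinear mapping between input and output, the stacked layers learn the nonlinear mapping from the input to the residual (the output minus the input).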
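In the paper's notation, with H(x) the desired underlying mapping and F(x) the residual that the stacked layers are asked to fit, the reformulation can be written as follows. The last line is the paper's variant for the case where input and output dimensions differ, using a linear projection W_s on the shortcut.

```latex
\begin{aligned}
\mathcal{F}(x) &:= \mathcal{H}(x) - x \\
y &= \mathcal{F}(x, \{W_i\}) + x \\
y &= \mathcal{F}(x, \{W_i\}) + W_s x
\end{aligned}
```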
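In theory, making a network deeper by stacking more of the same layers does not have to give better results, but it should not give worse ones. Intuitively, if we add a branch that goes directly from a lower layer to a higher layer (an identity mapping), the deeper network can always reproduce the shallower network with otherwise the same structure, so it should do no worse. This branch is called a shortcut.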
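The argument above can be checked with a minimal NumPy sketch. It is not the paper's convolutional architecture: the two-layer fully connected residual branch, the function names (`residual_block`, `shallow_net`, `deeper_net`), and the layer sizes are all assumptions chosen for illustration. When the residual branch has zero weights, each added block reduces to the identity, so the deeper network reproduces the shallower one exactly.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F(x) = relu(x @ w1) @ w2 (illustrative form)."""
    residual = relu(x @ w1) @ w2   # residual branch F(x)
    return relu(residual + x)      # identity shortcut adds x back

def shallow_net(x, w_in):
    # Hypothetical shallow model: a single layer followed by ReLU.
    return relu(x @ w_in)

def deeper_net(x, w_in, blocks):
    # Same shallow model, followed by extra residual blocks.
    h = shallow_net(x, w_in)
    for w1, w2 in blocks:
        h = residual_block(h, w1, w2)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_in = rng.normal(size=(8, 16))

# Three added blocks whose residual branch is exactly zero.
zero_blocks = [(np.zeros((16, 16)), np.zeros((16, 16))) for _ in range(3)]

shallow_out = shallow_net(x, w_in)
deep_out = deeper_net(x, w_in, zero_blocks)

# The shallow output is already non-negative (post-ReLU), so
# relu(0 + h) == h and every added block acts as the identity.
outputs_match = np.allclose(shallow_out, deep_out)
print(outputs_match)
```

This only shows that such a solution exists for the deeper model. Whether an optimizer finds something at least as good is the empirical question the paper studies.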
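Later experiments revealed two roles of the shortcut:
First: it provides an identity mapping, which ensures that adding layers gives results at least no worse than before.
Second: it provides a reasonable preconditioning. The later experiments show that the learned residual functions generally have small responses, which suggests that the shortcut branch provides a good preconditioning. When ResNet is not very deep (for example, 18 layers), its results are about the same as those of the plain network, but it converges faster, which is also thanks to the shortcut.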
Paper link: https://arxiv.org/pdf/1512.03385.pdf
Highlights of the paper:
- a residual building block is composed of a shortcut branch and a deeper residual branch of a few stacked layers
- If the added layers can be constructed as identity mappings, a deeper model should have training error no greater than its shallower counterpart. The degradation problem suggests that the solvers might have difficulties in approximating identity mappings by multiple nonlinear layers.
- In real cases, it is unlikely that identity mappings are optimal, but the reformulation may help to precondition the problem. If the optimal function is closer to an identity mapping than to a zero mapping, it should be easier for the solver to find the perturbations with reference to an identity mapping than to learn the function as a new one. The experiments show that the learned residual functions in general have small responses, suggesting that identity mappings provide reasonable preconditioning.
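The preconditioning point in the last bullet can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not an experiment from the paper: the two-layer fully connected branch, the names `plain_layers` and `residual_layers`, and the weight scale `1e-2` are all hypothetical. With small weights, plain stacked layers sit near the zero mapping, while the same layers with a shortcut sit near the identity mapping, so a target close to identity is only a small perturbation away in the residual form.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def plain_layers(x, w1, w2):
    # Plain form: the stacked layers must produce H(x) directly.
    return relu(x @ w1) @ w2

def residual_layers(x, w1, w2):
    # Residual form: the same layers produce F(x), and H(x) = F(x) + x.
    return plain_layers(x, w1, w2) + x

rng = np.random.default_rng(0)
dim = 16
x = rng.normal(size=(32, dim))

# Small weights (assumed scale) stand in for a solver starting near zero.
scale = 1e-2
w1 = scale * rng.normal(size=(dim, dim))
w2 = scale * rng.normal(size=(dim, dim))

x_norm = np.linalg.norm(x)
# Distance of each form from the identity mapping on this batch.
plain_gap = np.linalg.norm(plain_layers(x, w1, w2) - x)
residual_gap = np.linalg.norm(residual_layers(x, w1, w2) - x)

print(f"relative gap to identity, plain:    {plain_gap / x_norm:.4f}")
print(f"relative gap to identity, residual: {residual_gap / x_norm:.4f}")
```

The plain form starts roughly a full `x_norm` away from the identity, while the residual form starts almost on top of it, which is the sense in which the shortcut gives the solver a better reference point.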