The Difficulty of Convergence
Gradient descent in neural networks easily converges to local optima, so why is it so widely used? - Zhihu
The saddest fate in the world of model alchemy is a network that converges onto a big plateau /(ㄒoㄒ)/~~
Possible Causes of Non-convergence
English original: My Neural Network isn't working! What should I do?
1. Not normalizing the data
2. Forgetting to check the inputs and outputs
3. Not preprocessing the data
4. Not applying regularization
5. Using too large a batch size
6. Using an incorrect learning rate
7. Using the wrong activation function on the output layer
8. Bad gradients in the network
9. Incorrectly initialized weights
10. A network that is too deep
11. The wrong number of hidden units
Additions from personal experience:
12. Wrong choice of optimizer; Adam is the most common default.
13. Data that is too random combined with an overly complex network (the sample space is too large while the number of samples is insufficient, and the network's function space is also large)
14. Learning rate too large. The network may be on the verge of converging, then suddenly head in the wrong direction in the next epoch and ultimately fail to converge.
15. Dirty data, e.g., NaNs in the data; this may be even more likely to occur. Examples from others online:
All of the above come down to three aspects: data, network, and optimization (the algorithm itself and the learning rate).
Troubleshooting order: check code logic first, then the data, then the model, then the hyperparameters.
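Item 12 above points to Adam as the usual default optimizer. As a reference, a single Adam update step for one scalar parameter can be sketched in plain Python (the hyperparameter defaults follow the standard Adam formulation; the function name is illustrative):

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight w given gradient g.

    m, v: running estimates of the first and second gradient moments;
    t:    1-based step count, used for bias correction."""
    m = b1 * m + (1 - b1) * g          # update biased first-moment estimate
    v = b2 * v + (1 - b2) * g * g      # update biased second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

In practice one would use a framework's built-in implementation (e.g. `torch.optim.Adam`); the sketch only illustrates why Adam's per-parameter step size adapts to the recent magnitude of that parameter's gradient.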
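The data-side items above (missing normalization, dirty data with NaNs) can be caught with a quick pass over the dataset before training starts. A minimal sketch in plain Python (the function names and the zero-std guard are illustrative choices):

```python
import math

def sanity_check(samples):
    """Flag NaN/Inf entries and compute per-feature mean/std over clean values."""
    dim = len(samples[0])
    bad = []                          # (row, col) positions holding NaN or Inf
    cols = [[] for _ in range(dim)]   # clean values collected per feature
    for i, row in enumerate(samples):
        for j, v in enumerate(row):
            if math.isnan(v) or math.isinf(v):
                bad.append((i, j))
            else:
                cols[j].append(v)
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return means, stds, bad

def normalize(row, means, stds):
    """Zero-mean, unit-variance scaling per feature (guarding against std == 0)."""
    return [(v - m) / (s or 1.0) for v, m, s in zip(row, means, stds)]
```

With NumPy the same checks are one-liners (`np.isnan(x).any()`, `(x - x.mean(0)) / x.std(0)`), but the idea is identical: locate bad entries first, then scale each feature to zero mean and unit variance.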
Possible Reasons a Neural Network Isn't Learning
Examples
1. Learning rate too large causing non-convergence
As shown in the figures (left: training image, middle: ground-truth label, right: prediction; the three figures correspond to epoch 3, epoch 4, and epoch 5):
```
2018-10-09 11:33:49,456 Epoch 3, Average loss: 0.6308, learning rate: 0.0010
2018-10-09 11:33:49,592 Verification error= 30.3%, loss= 0.6245
2018-10-09 11:33:51,573 Iter 80, Minibatch Loss= 0.6889, Training Accuracy= 0.5185, Minibatch error= 48.2%
2018-10-09 11:33:52,316 Iter 82, Minibatch Loss= 0.6997, Training Accuracy= 0.4691, Minibatch error= 53.1%
2018-10-09 11:33:53,057 Iter 84, Minibatch Loss= 0.6948, Training Accuracy= 0.4854, Minibatch error= 51.5%
2018-10-09 11:33:53,824 Iter 86, Minibatch Loss= 0.6720, Training Accuracy= 0.5637, Minibatch error= 43.6%
2018-10-09 11:33:54,558 Iter 88, Minibatch Loss= 0.6576, Training Accuracy= 0.6208, Minibatch error= 37.9%
2018-10-09 11:33:55,275 Iter 90, Minibatch Loss= 0.6773, Training Accuracy= 0.6436, Minibatch error= 35.6%
2018-10-09 11:33:56,019 Iter 92, Minibatch Loss= 0.6449, Training Accuracy= 0.6383, Minibatch error= 36.2%
2018-10-09 11:33:56,733 Iter 94, Minibatch Loss= 0.6382, Training Accuracy= 0.6466, Minibatch error= 35.3%
2018-10-09 11:33:57,415 Iter 96, Minibatch Loss= 0.6264, Training Accuracy= 0.6804, Minibatch error= 32.0%
2018-10-09 11:33:58,146 Iter 98, Minibatch Loss= 0.6272, Training Accuracy= 0.6811, Minibatch error= 31.9%
2018-10-09 11:33:58,313 Epoch 4, Average loss: 0.6405, learning rate: 0.0010
2018-10-09 11:33:58,443 Verification error= 24.0%, loss= 0.5926
2018-10-09 11:34:00,532 Iter 100, Minibatch Loss= 0.6068, Training Accuracy= 0.7254, Minibatch error= 27.5%
2018-10-09 11:34:01,303 Iter 102, Minibatch Loss= 0.5935, Training Accuracy= 0.7536, Minibatch error= 24.6%
2018-10-09 11:34:01,973 Iter 104, Minibatch Loss= 0.5758, Training Accuracy= 0.7771, Minibatch error= 22.3%
2018-10-09 11:34:02,677 Iter 106, Minibatch Loss= 0.5242, Training Accuracy= 0.8245, Minibatch error= 17.6%
2018-10-09 11:34:03,417 Iter 108, Minibatch Loss= 0.4082, Training Accuracy= 0.8660, Minibatch error= 13.4%
2018-10-09 11:34:04,067 Iter 110, Minibatch Loss= 0.6739, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:04,711 Iter 112, Minibatch Loss= 0.6728, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:05,375 Iter 114, Minibatch Loss= 0.6538, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:06,033 Iter 116, Minibatch Loss= 0.6632, Training Accuracy= 0.6212, Minibatch error= 37.9%
2018-10-09 11:34:06,668 Iter 118, Minibatch Loss= 0.6562, Training Accuracy= 0.6455, Minibatch error= 35.5%
2018-10-09 11:34:06,837 Epoch 5, Average loss: 0.6308, learning rate: 0.0010
2018-10-09 11:34:06,972 Verification error= 43.2%, loss= 0.6791
2018-10-09 11:34:09,214 Iter 120, Minibatch Loss= 0.6537, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:09,810 Iter 122, Minibatch Loss= 0.6550, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:10,500 Iter 124, Minibatch Loss= 0.6564, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:11,159 Iter 126, Minibatch Loss= 0.6584, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:11,823 Iter 128, Minibatch Loss= 0.6588, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:12,476 Iter 130, Minibatch Loss= 0.6561, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:13,102 Iter 132, Minibatch Loss= 0.6537, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:13,729 Iter 134, Minibatch Loss= 0.6520, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:14,319 Iter 136, Minibatch Loss= 0.6510, Training Accuracy= 0.6457, Minibatch error= 35.4%
2018-10-09 11:34:14,971 Iter 138, Minibatch Loss= 0.6509, Training Accuracy= 0.6457, Minibatch error= 35.4%
```
As the log shows, the network was close to converging by epoch 3 (verification error 30.3%, improving to 24.0% at epoch 4), but mid-way through epoch 4 it veered off (around Iter 110 the minibatch error jumps back up and then freezes at 35.4%, and the epoch-5 verification error rises to 43.2%). Continuing to train without changing the learning rate may simply repeat this cycle; after reducing the learning rate, the network converged gradually.
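One common remedy for this pattern is to shrink the learning rate once the validation error stops improving, instead of keeping it fixed at 0.0010. A sketch of such a schedule (the halving factor and the patience of 2 epochs are illustrative choices, not taken from the log above):

```python
def step_lr(lr, val_errors, factor=0.5, patience=2, min_lr=1e-6):
    """Halve lr when validation error has not improved for `patience`
    consecutive epochs. `val_errors` lists per-epoch validation errors,
    most recent last."""
    if len(val_errors) <= patience:
        return lr                      # not enough history yet
    best_before = min(val_errors[:-patience])
    if min(val_errors[-patience:]) >= best_before:
        lr = max(lr * factor, min_lr)  # no recent improvement: decay
    return lr
```

Frameworks ship equivalents (e.g. `torch.optim.lr_scheduler.ReduceLROnPlateau`); the point is that a smaller step makes the kind of overshoot seen around Iter 110 less likely.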