[Introduction to Machine Learning] 李宏毅 Machine Learning Notes - 10 (Tips for Deep Learning)
Recipe of Deep Learning
Poor performance on the training data
Deeper usually does not imply better
Vanishing Gradient Problem
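A rough numerical illustration (my own toy example, not from the notes): with sigmoid activations, backpropagation multiplies in one sigmoid derivative per layer, and that derivative is at most 0.25, so the gradient reaching the earliest layers shrinks roughly geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.0                                  # the point where sigmoid'(z) is largest
d = sigmoid(z) * (1.0 - sigmoid(z))      # sigmoid'(0) = 0.25
for n_layers in [1, 4, 8]:
    # best-case factor the gradient picks up after passing through n_layers sigmoids
    print(n_layers, d ** n_layers)       # 0.25, ~3.9e-03, ~1.5e-05
```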
ReLU (Rectified Linear Unit)
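A minimal sketch of the ReLU activation in NumPy (the function name and the example input below are my own, not from the notes):

```python
import numpy as np

def relu(z):
    """ReLU: keep positive inputs unchanged, zero out negative ones."""
    return np.maximum(0.0, z)

# The derivative is 1 where z > 0 and 0 where z <= 0, so active units pass
# the error signal through without shrinking it, which helps against the
# vanishing gradient problem above.
z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))  # [0.  0.  0.  1.5 3. ]
```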
ReLU - variant
Besides ReLU, are there any other activation functions?
So we use Maxout to let the network learn its activation function automatically from the training data
Maxout
ReLU is a special case of Maxout
More than ReLU
Maxout - Training
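A minimal sketch of a Maxout layer in NumPy, assuming k linear pieces per output unit (the names `maxout_layer`, `W`, `b`, and `k` are illustrative, not from the notes):

```python
import numpy as np

def maxout_layer(x, W, b, k):
    """Maxout: compute k linear pieces per output unit and keep the max,
    so the activation function itself is learned from the training data."""
    z = W @ x + b            # all linear pieces, shape (d_out * k,)
    z = z.reshape(-1, k)     # group every k pieces into one output unit
    return z.max(axis=1)     # piecewise-linear, learned activation

# ReLU as a special case: with k = 2 and one piece frozen at w = 0, b = 0,
# each unit computes max(w.x + b, 0), i.e. exactly ReLU.
x = np.array([1.0, -2.0])
W = np.array([[0.5, 0.3],    # trainable piece
              [0.0, 0.0]])   # piece fixed at zero
b = np.array([0.1, 0.0])
print(maxout_layer(x, W, b, k=2))  # [0.] — same as relu(0.5*1 + 0.3*(-2) + 0.1)
```

For training, only the piece that wins the max in each group receives a gradient on a given example, so effectively a thin, ReLU-like sub-network is updated; different examples activate and train different pieces.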
Adaptive Learning Rate
RMSProp
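A minimal sketch of one RMSProp update for a single parameter (the hyperparameter names `eta` and `alpha` and their values are illustrative assumptions):

```python
import numpy as np

def rmsprop_step(w, grad, sigma_sq, eta=0.001, alpha=0.9):
    """Divide the learning rate by a running root-mean-square of past gradients:
    recent gradients get weight (1 - alpha), older history gets weight alpha."""
    sigma_sq = alpha * sigma_sq + (1.0 - alpha) * grad ** 2
    w = w - eta * grad / (np.sqrt(sigma_sq) + 1e-8)   # epsilon avoids division by zero
    return w, sigma_sq

# On directions where gradients stay small the effective step size grows,
# and on steep directions it shrinks.
w, sigma_sq = 1.0, 0.0
for grad in [0.1, 0.1, 2.0, 0.05]:
    w, sigma_sq = rmsprop_step(w, grad, sigma_sq)
    print(round(w, 6), round(sigma_sq, 6))
```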
Hard to find optimal network parameters
Momentum (adding inertia to gradient descent)
So, after adding momentum:
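Below is a minimal sketch of that combined update (the toy loss and the values of `eta` and `lam` are my own assumptions):

```python
def momentum_step(theta, v, grad, eta=0.01, lam=0.9):
    """Movement = lam * previous movement - eta * current gradient.
    The previous movement acts like inertia, which can carry the parameters
    across plateaus and out of shallow local minima."""
    v = lam * v - eta * grad
    theta = theta + v
    return theta, v

# Example on L(theta) = theta**2, whose gradient is 2 * theta.
theta, v = 5.0, 0.0
for _ in range(3):
    theta, v = momentum_step(theta, v, grad=2.0 * theta)
    print(theta, v)
```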