cs231n

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture6.pdf

sigmoid

  • Saturated neurons “kill” the gradients
  • Sigmoid outputs are not zero-centered
  • exp() is a bit expensive to compute
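
A quick numpy sketch (mine, not from the slides) illustrating the first two points: at saturated inputs the local gradient σ(x)(1 − σ(x)) is essentially zero, and the outputs are always positive rather than zero-centered.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, 0.0, 10.0])
s = sigmoid(x)
local_grad = s * (1.0 - s)   # d(sigmoid)/dx

print(s)           # [~0.00005  0.5  ~0.99995] -> always positive, not zero-centered
print(local_grad)  # [~0.00005  0.25 ~0.00005] -> near zero where the neuron saturates
```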

tanh

  • Squashes numbers to range [-1,1]
  • Zero-centered (nice)
  • Saturated neurons “kill” the gradients
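
The same kind of sketch for tanh (again just an illustration, not from the slides): the outputs are zero-centered, but the local gradient 1 − tanh²(x) still dies at saturated inputs.

```python
import numpy as np

x = np.array([-10.0, 0.0, 10.0])
t = np.tanh(x)
local_grad = 1.0 - t**2    # d(tanh)/dx

print(t)           # [-1.  0.  1.]  -> outputs centered around zero
print(local_grad)  # [~0   1.  ~0]  -> but the gradient still vanishes at large |x|
```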

ReLU

  • Does not saturate (in the positive region)
  • Very computationally efficient
  • Converges much faster than sigmoid/tanh in practice (e.g. 6x)
  • Actually more biologically plausible than sigmoid
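
A matching sketch for ReLU (illustrative only): the forward pass is a simple threshold at zero, and the local gradient is a constant 1 wherever the input is positive, so it never saturates in that region.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-10.0, 0.0, 10.0])
r = relu(x)
local_grad = (x > 0).astype(x.dtype)   # 1 in the positive region, 0 elsewhere

print(r)           # [ 0.  0. 10.]
print(local_grad)  # [0. 0. 1.]  -> constant gradient of 1 for positive inputs
```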

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture7.pdf

  • Adam is a good default choice in most cases
  • If you can afford to do full-batch updates, then try out L-BFGS (and don’t forget to disable all sources of noise)
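
For reference, a minimal numpy sketch of the Adam update rule (my own paraphrase of the standard formulation; the defaults beta1=0.9, beta2=0.999, lr=1e-3 match the starting points suggested in the lecture):

```python
import numpy as np

def adam_step(w, dw, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum-style first moment + RMSProp-style second
    moment, both bias-corrected (t is the 1-based iteration count)."""
    m = beta1 * m + (1 - beta1) * dw
    v = beta2 * v + (1 - beta2) * (dw * dw)
    m_hat = m / (1 - beta1 ** t)   # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage on f(w) = ||w||^2 (hypothetical objective, just to show the call pattern)
w = np.random.randn(5)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 101):
    dw = 2 * w               # gradient of the toy objective
    w, m, v = adam_step(w, dw, m, v, t)
```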