Mathematical derivations and animated plots for several gradient descent variants (original English article)
https://ruder.io/optimizing-gradient-descent/index.html
Six gradient descent methods with code implementations
https://zhuanlan.zhihu.com/p/158813090#:~:text=%E4%BB%A5%E9%80%BB%E8%BE%91%E5%9B%9E%E5%BD%92%E4%B8%BA%E4%BE%8B%E4%BB%8B,dam%E5%8F%8A%E5%85%B6%E5%AE%9E%E7%8E%B0.
Comparison of gradient descent methods (SGD vs. BGD)
https://blog.csdn.net/u012328159/article/details/80252012
Detailed explanation of several common gradient descent optimization algorithms
https://blog.csdn.net/huwenxing0801/article/details/85627245