Vanishing/Exploding Gradients in Deep Learning

Consider an L-layer network. In the forward pass, for layer l:
z_l = w_l a_{l-1} + b_l
a_l = \delta(z_l)
In the backward pass, for layer l:
dz_l = w_{l+1} dz_{l+1} * \delta'(z_l)
dw_l = a_{l-1} dz_l
db_l = dz_l
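
Read literally, these per-layer formulas map onto numpy almost one for one. The following is a minimal sketch with hypothetical helpers forward_layer/backward_layer, using the same shape conventions as the demo code further down (w_l has shape (n_{l-1}, n_l), activations have shape (n_l, m) for a batch of m samples); it only illustrates the formulas, not the full training loop.

import numpy as np

def forward_layer(w_l, b_l, a_prev, act):
    # z_l = w_l a_{l-1} + b_l ;  a_l = delta(z_l)
    z_l = np.dot(w_l.T, a_prev) + b_l.T
    return z_l, act(z_l)

def backward_layer(w_next, dz_next, z_l, act_grad, a_prev, m):
    # dz_l = w_{l+1} dz_{l+1} * delta'(z_l)
    dz_l = np.dot(w_next, dz_next) * act_grad(z_l)
    # dw_l = a_{l-1} dz_l and db_l = dz_l, both averaged over the m samples
    dw_l = np.dot(a_prev, dz_l.T) / m
    db_l = np.sum(dz_l.T, axis=0) / m
    return dz_l, dw_l, db_l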

relu activation

With relu, for units that stay active (z > 0) we have \delta'(z) = 1 and a = z, so the quantities above become approximately:

a_{l-1} \approx w_1 w_2 ... w_{l-1} x
dz_l \approx w_L w_{L-1} ... w_{l+1} dz_L
dw_l \approx w_1 w_2 ... w_{l-1} w_{l+1} ... w_L x dz_L
db_l \approx w_{l+1} ... w_L dz_L

If |w_j| < 1, then as L grows, dz shrinks exponentially the closer a layer is to the input, and db shrinks exponentially along with it. The size of dw does not change exponentially from one layer to the next (the product above skips only w_l), but as L grows, dw at every layer shrinks exponentially.
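
A quick numeric illustration of this claim, assuming every |w_j| equals 0.8 (the same value the relu demo below uses) and taking |dz_L| = 1 for scale:

w_abs = 0.8                      # assumed |w_j| < 1, identical at every layer
L = 10
for l in (9, 5, 1):
    # dz_l ≈ w_L ... w_{l+1} dz_L  =>  |dz_l| ≈ |w|**(L - l)
    print(l, w_abs ** (L - l))   # roughly 0.8, 0.33, 0.13: exponential decay toward the input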

sigmoid activation

a_{l-1} = \delta(w_{l-1}( ... \delta(w_2 \delta(w_1 x + b_1) + b_2) ... ) + b_{l-1})
dz_l = w_L \delta'(z_{L-1}) w_{L-1} \delta'(z_{L-2}) ... w_{l+1} \delta'(z_l) dz_L
dw_l = a_{l-1} dz_l
db_l = dz_l

a_{l-1} always lies between 0 and 1, so it stays bounded and contributes no exponential factor of its own. If |w_j \delta'(z_j)| < 1, then as L grows, dz shrinks exponentially the closer a layer is to the input, and dw and db shrink exponentially along with it.
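
For sigmoid the shrinking is even faster, because \delta'(z) = \delta(z)(1 - \delta(z)) is at most 0.25 (reached at z = 0), so |w_j \delta'(z_j)| < 1 already holds whenever |w_j| < 4. A small sketch of this best case, assuming |w_j| = 0.8 as in the demos below and z_j near 0 where \delta' is largest (logistic/logistic_gradient are the same helpers the demos define):

import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

def logistic_gradient(z):
    a = logistic(z)
    return a * (1 - a)

print(logistic_gradient(0.0))               # 0.25, the maximum of delta'(z)
per_layer = 0.8 * logistic_gradient(0.0)    # best-case per-layer factor w * delta'(z) = 0.2
print(per_layer ** 9)                       # ~5.1e-07 after 9 layers, versus 0.8**9 ~ 0.13 for relu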

relu demo

Assume a 10-layer neural network with 2 nodes per layer; the output layer uses a sigmoid activator and every other layer uses relu.

import numpy as np
from numpy.linalg import cholesky
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def logistic(z):
    return 1 / (1 + np.exp(-z))

def logistic_gradient(z):
    a = logistic(z)
    return a * (1 - a)

def relu(z):
    return z * (z > 0)

def relu_gradient(z):
    return np.ones(z.shape) * (z > 0)

sampleNo = 10
mu = np.array([[2, 3]])
Sigma = np.array([[1, 1.5], [1.5, 3]])   # covariance matrix (symmetric positive definite)
R = cholesky(Sigma)
s = np.dot(np.random.randn(sampleNo, 2), R) + mu

mu2 = np.array([[7, 6]])
t = np.dot(np.random.randn(sampleNo, 2), R) + mu2

plt.plot(s[:,0],s[:,1],'+')
plt.plot(t[:,0],t[:,1],'*')
plt.xlabel("x")
plt.ylabel("y")
plt.show()

# construct the data
x = np.concatenate((s, t)).T
y1 = np.zeros(sampleNo).reshape(1,sampleNo)
y2 = np.ones(sampleNo).reshape(1,sampleNo)
y = np.concatenate((y1, y2), axis=1)
print(x.shape, y.shape)

# initialize the network
layer = 10
# use a weight slightly smaller than 1
w = {}
for i in range(1, layer):
    w_tmp = np.array([0.8, 0, 0, 0.8]).reshape(2,2)
    w[i] = w_tmp
w_out = np.ones(2).reshape(2,1)
w[layer] = w_out
# initialize b to 0
b = {}
for i in range(1, layer):
    b_tmp = np.zeros(2).reshape(1,2)
    b[i] = b_tmp
b_out = np.zeros(1).reshape(1,1)
b[layer] = b_out
# initialize the activators
act = {}
for i in range(1, layer):
    act[i] = relu
act[layer] = logistic

iter = 1
max_iter = 2
m = sampleNo * 2
alpha = 0.01
while iter < max_iter:
    # forward pass
    a = x
    a_dict = {}
    z_dict = {}
    a_dict[0] = a
    for i in range(1, layer+1):
        z = np.dot(w[i].T, a) + b[i].T
        a = act[i](z)
        a_dict[i] = a
        z_dict[i] = z
    # backward pass
    dz = {}
    dw = {}
    db = {}  
    for i in range(layer, 0, -1):
        if i == layer:
            dz[i] = a_dict[i] - y
        else:
            dz[i] = np.dot(w[i+1], dz[i+1]) * relu_gradient(z_dict[i])
        dw[i] = np.dot(a_dict[i - 1], dz[i].T) / m
        db[i] = np.sum(dz[i].T, axis = 0) / m
    # update the parameters
    for i in range(1, layer+1):
        w[i] = w[i] - alpha * dw[i]
        b[i] = b[i] - alpha * db[i]

    iter += 1

print("反向计算")
for i in range(layer, 0, -1):
    print("第%d层" % i)
    print("dz", dz[i][:,0])
    print("dw", dw[i])
    print("db", db[i])

backward pass
layer 10
dz [0.6525866]
dw [[0.00245394]
[0.07067444]]
db [0.25647431]
layer 9
dz [0.6525866 0.6525866]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.25647431 0.25647431]
layer 8
dz [0.52206928 0.52206928]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.20517945 0.20517945]
layer 7
dz [0.41765542 0.41765542]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.16414356 0.16414356]
layer 6
dz [0.33412434 0.33412434]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.13131485 0.13131485]
layer 5
dz [0.26729947 0.26729947]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.10505188 0.10505188]
layer 4
dz [0.21383958 0.21383958]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.0840415 0.0840415]
layer 3
dz [0.17107166 0.17107166]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.0672332 0.0672332]
layer 2
dz [0.13685733 0.13685733]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.05378656 0.05378656]
layer 1
dz [0.10948586 0.10948586]
dw [[0.00306743 0.00306743]
[0.08834304 0.08834304]]
db [0.04302925 0.04302925]
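
The printout matches the analysis above: from layer 9 downward each dz is 0.8 times the dz of the layer above it, while dw stays essentially the same at every layer. With the dz dictionary from the training loop still in scope, this can be checked directly:

print(dz[1][0, 0] / dz[2][0, 0])   # ≈ 0.8, the per-layer factor |w|
print(dz[1][0, 0] / dz[9][0, 0])   # ≈ 0.8**8 ≈ 0.17, exponential decay toward the input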

sigmoid demo

Assume a 10-layer neural network with 2 nodes per layer, where every layer uses a sigmoid activator.

# initialize the network
layer = 10
# use a weight slightly smaller than 1
w = {}
for i in range(1, layer):
    w_tmp = np.array([0.8, 0, 0, 0.8]).reshape(2,2)
    w[i] = w_tmp
w_out = np.ones(2).reshape(2,1)
w[layer] = w_out
# initialize b to 0
b = {}
for i in range(1, layer):
    b_tmp = np.zeros(2).reshape(1,2)
    b[i] = b_tmp
b_out = np.zeros(1).reshape(1,1)
b[layer] = b_out
# initialize the activators
act = {}
for i in range(1, layer+1):
    act[i] = logistic

iter = 1
max_iter = 2
m = sampleNo * 2
alpha = 0.01
while iter < max_iter:
    # forward pass
    a = x
    a_dict = {}
    z_dict = {}
    a_dict[0] = a
    for i in range(1, layer+1):
        z = np.dot(w[i].T, a) + b[i].T
        a = act[i](z)
        a_dict[i] = a
        z_dict[i] = z
    # backward pass
    dz = {}
    dw = {}
    db = {}  
    for i in range(layer, 0, -1):
        if i == layer:
            dz[i] = a_dict[i] - y
        else:
            # all layers here are sigmoid, so backprop through the logistic derivative
            dz[i] = np.dot(w[i+1], dz[i+1]) * logistic_gradient(z_dict[i])
        dw[i] = np.dot(a_dict[i - 1], dz[i].T) / m
        db[i] = np.sum(dz[i].T, axis = 0) / m
    # update the parameters
    for i in range(1, layer+1):
        w[i] = w[i] - alpha * dw[i]
        b[i] = b[i] - alpha * db[i]

    iter += 1

print("反向计算")
for i in range(layer, 0, -1):
    print("第%d层" % i)
    print("dz", dz[i][:,0])
    print("dw", dw[i])
    print("db", db[i])

backward pass
layer 10
dz [0.77621477]
dw [[0.17176994]
[0.17177003]]
db [0.27621479]
layer 9
dz [0.77621477 0.77621477]
dw [[0.17176998 0.17176998]
[0.17177045 0.17177045]]
db [0.27621479 0.27621479]
layer 8
dz [0.62097182 0.62097182]
dw [[0.13741614 0.13741614]
[0.13741813 0.13741813]]
db [0.22097183 0.22097183]
layer 7
dz [0.49677746 0.49677746]
dw [[0.10993356 0.10993356]
[0.10994206 0.10994206]]
db [0.17677747 0.17677747]
layer 6
dz [0.39742196 0.39742196]
dw [[0.08794963 0.08794963]
[0.08798579 0.08798579]]
db [0.14142197 0.14142197]
layer 5
dz [0.31793757 0.31793757]
dw [[0.07037154 0.07037154]
[0.07052532 0.07052532]]
db [0.11313758 0.11313758]
layer 4
dz [0.25435006 0.25435006]
dw [[0.05634747 0.05634747]
[0.05700205 0.05700205]]
db [0.09051006 0.09051006]
layer 3
dz [0.20348005 0.20348005]
dw [[0.04528946 0.04528946]
[0.04808773 0.04808773]]
db [0.07240805 0.07240805]
layer 2
dz [0.16278404 0.16278404]
dw [[0.0370514 0.0370514 ]
[0.04934335 0.04934335]]
db [0.05792644 0.05792644]
layer 1
dz [0.13022723 0.13022723]
dw [[-0.05082364 -0.05082364]
[ 0.06565071 0.06565071]]
db [0.04634115 0.04634115]

Binary classification

import numpy as np
from numpy.linalg import cholesky
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

sampleNo = 1000
mu = np.array([[2, 3]])
Sigma = np.array([[1, 1.5], [1.5, 3]])   # covariance matrix (symmetric positive definite)
R = cholesky(Sigma)
s = np.dot(np.random.randn(sampleNo, 2), R) + mu

mu2 = np.array([[17, 10]])
t = np.dot(np.random.randn(sampleNo, 2), R) + mu2

plt.plot(s[:,0],s[:,1],'+')
plt.plot(t[:,0],t[:,1],'*')
plt.xlabel("x")
plt.ylabel("y")
plt.show()
# construct the data
x = np.concatenate((s, t)).T
y1 = np.zeros(sampleNo).reshape(1,sampleNo)
y2 = np.ones(sampleNo).reshape(1,sampleNo)
y = np.concatenate((y1, y2), axis=1)

def logistic(z):
    return 1 / (1 + np.exp(-z))

def logistic_gradient(z):
    a = logistic(z)
    return a * (1 - a)

def relu(z):
    return z * (z > 0)

def relu_gradient(z):
    return np.ones(z.shape) * (z > 0)

# initialize the network
# use a weight slightly larger than 1
w = {}
for i in range(1, 10):
    w_tmp = np.array([1.1, 0, 0, 1.1]).reshape(2,2)
    w[i] = w_tmp
w_out = np.ones(2).reshape(2,1)
w[10] = w_out
# initialize b to 0
b = {}
for i in range(1, 10):
    b_tmp = np.zeros(2).reshape(1,2)
    b[i] = b_tmp
b_out = np.zeros(1).reshape(1,1)
b[10] = b_out
# initialize the activators
act = {}
for i in range(1, 10):
    act[i] = relu
act[10] = logistic

# training
iter = 1
max_iter = 50000
m = sampleNo * 2
alpha = 0.01
while iter < max_iter:
    # forward pass
    a = x
    a_dict = {}
    z_dict = {}
    a_dict[0] = a
    for i in range(1, 11):
        z = np.dot(w[i].T, a) + b[i].T
        a = act[i](z)
        a_dict[i] = a
        z_dict[i] = z
    # backward pass
    dz = {}
    dw = {}
    db = {}  
    for i in range(10, 0, -1):
        if i == 10:
            dz[i] = a_dict[i] - y
        else:
            dz[i] = np.dot(w[i+1], dz[i+1]) * relu_gradient(z_dict[i])
        dw[i] = np.dot(a_dict[i - 1], dz[i].T) / m
        db[i] = np.sum(dz[i].T, axis = 0) / m
    # update the parameters
    for i in range(1, 11):
        w[i] = w[i] - alpha * dw[i]
        b[i] = b[i] - alpha * db[i]

    iter += 1

# plot the results
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(s[:,0], s[:,1], y[y==0])
ax.scatter(t[:,0], t[:,1], y[y==1])

def predict(x):
    a = x
    for i in range(1, 11):
        z = np.dot(w[i].T, a) + b[i].T
        a = act[i](z)
    return a

x1_tmp = x2_tmp = np.linspace(-10, 30, 100)
x1_tmp, x2_tmp = np.meshgrid(x1_tmp, x2_tmp)
x_tmp = np.concatenate((x1_tmp.reshape(1, 10000), x2_tmp.reshape(1, 10000)))
y_tmp = predict(x_tmp)
y_tmp = y_tmp.reshape(100, 100)
ax.plot_surface(x1_tmp, x2_tmp, y_tmp)
ax.view_init(elev=30,azim=-120)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
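
As a quick usage check (a sketch; predict, x and y are the objects defined above), the trained network can also be scored on its own training data by thresholding the sigmoid output at 0.5:

pred = (predict(x) > 0.5).astype(int)   # predicted class labels, shape (1, 2*sampleNo)
accuracy = np.mean(pred == y)           # fraction of training points classified correctly
print("training accuracy:", accuracy)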

For comparison, a logistic regression model with only a single layer, trained with the same hyperparameters, gives the following result:

[figure: training result of the single-layer logistic regression model]