First, let's copy a piece of code:
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train)
global_step = tf.Variable(0, trainable=False)
initial_learning_rate = 0.1  # initial learning rate
learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step,
                                           decay_steps=10, decay_rate=0.9)
opt = tf.train.GradientDescentOptimizer(learning_rate)
add_global = global_step.assign_add(1)  # op that increments global_step by 1
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(learning_rate))
    for i in range(20):
        g, rate = sess.run([add_global, learning_rate])
        print(g, rate)
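initial_learning_rate: the initial learning rate, 0.1. The actual learning rate decays from this value as global_step increases.
global_step: a non-trainable variable that holds the global training step count. In this example it is incremented once per loop iteration, so it tracks the loop counter i (after each assign_add it equals i + 1) and could be replaced by it.
decay_rate: the decay rate, 0.9.
decay_steps: 10, meaning the learning rate is multiplied by 0.9 every 10 steps. With the default staircase=False the decay is applied continuously at every step rather than in jumps of 10.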
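Unofficial interpretation
The learning rate decreases as the number of training steps increases. Why does the learning rate need to decrease? I won't explain that here :).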
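It is equivalent to the following line of code, where learning_rate is the initial learning rate and ^ denotes exponentiation:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)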
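
To make the formula concrete, here is a minimal pure-Python sketch of the same computation. It does not use TensorFlow; the helper name exponential_decay_value is hypothetical and made up for this illustration. The staircase flag is assumed to mirror the TensorFlow option of the same name by using integer division for the exponent.

```python
def exponential_decay_value(initial_lr, global_step, decay_steps, decay_rate,
                            staircase=False):
    """Hypothetical helper: evaluate the decay formula in plain Python."""
    if staircase:
        # Integer division: the rate only drops once every decay_steps steps.
        exponent = global_step // decay_steps
    else:
        # True division: the rate shrinks a little at every step.
        exponent = global_step / decay_steps
    return initial_lr * decay_rate ** exponent


# Same settings as the example above: start at 0.1, multiply by 0.9 per 10 steps.
for step in (0, 5, 10, 20):
    print(step, exponential_decay_value(0.1, step, decay_steps=10, decay_rate=0.9))
```

At step 10 the exponent is exactly 1, so the result is the initial rate times 0.9. At step 20 it is the initial rate times 0.9 squared.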