Implementing Regularization in TensorFlow
Prerequisites
What is regularization?
In one sentence: regularization adds a weight penalty term to the loss function, which biases the network toward learning smaller weights, thereby suppressing overfitting and improving the model's ability to generalize. Common regularization methods include L1 regularization, L2 regularization, and Dropout. For the principles behind regularization and what it does, see the article 深度学习常用正则化方法.
This article implements L2 regularization. For completeness, the L2 regularization formula is given here:

$$J = J_0 + \frac{\lambda}{2}\sum_{w} w^2$$

where $J_0$ is the original cost function, $\frac{\lambda}{2}\sum_{w} w^2$ is the regularization loss term, $\lambda$ is the weight-decay factor, and $w$ ranges over the network weights.
TensorFlow helper functions
In TensorFlow, the computation graph (graph) uses collections to manage resources such as tensors (tensor) and variables (variable):
- tf.add_to_collection: adds a resource to a specified collection
- tf.get_collection: fetches the resources stored in a specified collection
- Example:
import tensorflow as tf

# step 1: construct variables
v_0 = tf.Variable(tf.constant([1.0, 2.0, 3.0]), name="v_0")
v_1 = tf.get_variable(shape=(), name="v_1")

# step 2: add the variables to a collection
tf.add_to_collection(name="variable", value=v_0)
tf.add_to_collection(name="variable", value=v_1)

init_op = tf.group(tf.global_variables_initializer(),
                   tf.local_variables_initializer())

with tf.Session() as sess:
    sess.run(init_op)
    # step 3: get the variables back from the collection
    for var in tf.get_collection(key="variable"):
        print('{0}: {1}'.format(var.op.name, var.eval()))
- Result:

v_0: [ 1.  2.  3.]
v_1: -1.4189265966415405
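Besides custom collections, TensorFlow maintains several built-in collections automatically. For instance, every tf.Variable is registered under tf.GraphKeys.GLOBAL_VARIABLES when it is created, and trainable variables are additionally registered under tf.GraphKeys.TRAINABLE_VARIABLES. A minimal sketch (the variable names here are illustrative):

import tensorflow as tf

# created variables are registered in built-in collections automatically
v = tf.Variable(tf.zeros(shape=(3,)), name="v")
w = tf.Variable(tf.zeros(shape=(3,)), name="w", trainable=False)

print([x.op.name for x in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)])     # ['v', 'w']
print([x.op.name for x in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)])  # ['v']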
Required environment
- software: tensorflow==1.14.0
- hardware: GTX 2060
What does L2 regularization actually do?
Theoretical calculation
Assume the weights are $w = [1.0, 2.0, 3.0]$ and the regularization factor is $\lambda = 0.00004$.
According to the formula, the regularization loss works out as shown below.
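Carrying the arithmetic through with these values:

$$\frac{\lambda}{2}\sum_{w} w^2 = \frac{0.00004}{2} \times (1^2 + 2^2 + 3^2) = 0.00002 \times 14 = 0.00028$$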
Code verification
Three methods are used here to compute the regularization loss. The first two are TensorFlow interfaces: method 1 is the low-level interface, and method 2 is a higher-level interface that was dropped in TensorFlow 2.0 together with tf.contrib. Method 3 is a custom implementation of the formula. The code is as follows:
import tensorflow as tf

weight_decay = 0.00004  # regularization weight factor
weight = tf.Variable(initial_value=tf.constant(value=[1.0, 2.0, 3.0]))  # weights
# use tensorflow interfaces
# method 1: low-level interface
weight_loss_1 = tf.nn.l2_loss(weight) * weight_decay
# method 2: tf.contrib interface (removed in TensorFlow 2.0)
weight_loss_2 = tf.contrib.layers.l2_regularizer(scale=weight_decay)(weight)
# method 3: custom implementation of the formula
custom_weight_loss = tf.reduce_sum(tf.multiply(weight, weight))
custom_weight_loss = 1. / 2 * weight_decay * custom_weight_loss
init_op = tf.group(tf.global_variables_initializer(),
                   tf.local_variables_initializer())
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    sess.run(init_op)
    print('method 1 result: {0}'.format(sess.run(weight_loss_1)))
    print('method 2 result: {0}'.format(sess.run(weight_loss_2)))
    print('method 3 result: {0}'.format(sess.run(custom_weight_loss)))
Result
method 1 result: 0.00028
method 2 result: 0.00028
method 3 result: 0.00028
The results match the theoretical derivation. The next question is how to introduce regularization into a network and complete training.
Introducing L2 regularization into a network
The sections above covered the basic concepts and usage of L2 regularization; this section shows how to introduce it when building a neural network. Using L2 regularization in a network involves three steps:
1. Compute the loss of each weight and add it to a collection
2. Fetch all the weight losses from the collection and sum them
3. Add the regularization loss to the original cost function to obtain the total loss
Step 1: three ways to collect the weight loss

- Using the tf.nn.l2_loss() interface and adding the loss to a collection manually

def get_weights_1(shape, weight_decay=0.0, dtype=tf.float32, trainable=True):
    """
    add weight regularization to loss collection
    Args:
        shape:
        weight_decay:
        dtype:
        trainable:
    Returns:
    """
    weight = tf.Variable(initial_value=tf.truncated_normal(shape=shape, stddev=0.01), name='Weights', dtype=dtype,
                         trainable=trainable)
    if weight_decay > 0:
        weight_loss = tf.nn.l2_loss(weight) * weight_decay
        # weight_loss = tf.nn.l2_loss(weight, name="weight_loss")
        # tf.add_to_collection(tf.GraphKeys.LOSSES, value=weight_loss)
        tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, value=weight_loss)
    else:
        pass
    return weight
This involves two steps:

- Compute the regularization loss:

weight_loss = tf.nn.l2_loss(weight) * weight_decay

- Add the regularization loss to a specific collection (here the built-in TensorFlow collection is used directly; a custom collection also works):

tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, value=weight_loss)
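A quick usage sketch (the shapes are illustrative; get_weights_1 is the function defined above) shows that one loss entry is collected per regularized weight:

import tensorflow as tf

w1 = get_weights_1(shape=[4, 8], weight_decay=0.00004)
w2 = get_weights_1(shape=[8, 2], weight_decay=0.00004)

# each call above appended one loss tensor to the built-in collection
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
print(len(reg_losses))  # 2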
- Using tf.contrib.layers.l2_regularizer and adding the loss to a collection manually

def get_weights_2(shape, weight_decay=0.0, dtype=tf.float32, trainable=True):
    """
    Args:
        shape:
        weight_decay:
        dtype:
        trainable:
    Returns:
    """
    weight = tf.Variable(initial_value=tf.truncated_normal(shape=shape, stddev=0.01), name='Weights', dtype=dtype,
                         trainable=trainable)
    if weight_decay > 0:
        weight_loss = tf.contrib.layers.l2_regularizer(weight_decay)(weight)
        # weight_loss = tf.nn.l2_loss(weight, name="weight_loss")
        # tf.add_to_collection("weight_loss", value=weight_loss)
        tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, value=weight_loss)
    else:
        pass
    return weight
This also involves two steps:

- Compute the regularization loss. Note that tf.contrib.layers.l2_regularizer(weight_decay)(weight) already multiplies by the scale factor, so it is numerically identical to tf.nn.l2_loss(weight) * weight_decay:

weight_loss = tf.contrib.layers.l2_regularizer(weight_decay)(weight)

- Add the regularization loss to a specific collection (again the built-in TensorFlow collection; a custom collection also works):

tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, value=weight_loss)
- Using tf.contrib.layers.l2_regularizer with the tf.get_variable interface

def get_weights_3(shape, weight_decay=0.0, dtype=tf.float32, trainable=True):
    """
    add weight to tf.get_variable
    Args:
        shape:
        weight_decay:
        dtype:
        trainable:
    Returns:
    """
    # create regularizer
    if weight_decay > 0:
        regularizer = tf.contrib.layers.l2_regularizer(weight_decay)
    else:
        regularizer = None
    weight = tf.get_variable(name='Weights', shape=shape, dtype=dtype, regularizer=regularizer,
                             trainable=trainable)
    return weight
This involves two steps:

- Create the regularizer:

regularizer = tf.contrib.layers.l2_regularizer(weight_decay)

- Pass the regularizer to tf.get_variable; tf.get_variable computes the regularization loss internally and adds it to the tf.GraphKeys.REGULARIZATION_LOSSES collection:

weight = tf.get_variable(name='Weights', shape=shape, dtype=dtype, regularizer=regularizer, trainable=trainable)
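A minimal sketch of this variant (shapes illustrative; get_weights_3 as defined above). Because tf.get_variable identifies variables by name, each weight must be created inside its own variable scope:

import tensorflow as tf

with tf.variable_scope('fc1'):
    w1 = get_weights_3(shape=[4, 8], weight_decay=0.00004)
with tf.variable_scope('fc2'):
    w2 = get_weights_3(shape=[8, 2], weight_decay=0.00004)

# the losses were added automatically when the variables were created
print(len(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)))  # 2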
Step 2: fetch the weight losses from the collection
There are two ways to fetch the collected weight losses:

- Via the tf.get_collection() interface, which supports traversing any collection, built-in or custom:

weight_loss_op = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
weight_loss_op = tf.add_n(weight_loss_op)
- Via the tf.losses.get_regularization_losses() interface, which only covers losses collected into the built-in REGULARIZATION_LOSSES collection:

weight_loss_op = tf.losses.get_regularization_losses()
weight_loss_op = tf.add_n(weight_loss_op)
Both methods perform the same two steps:
- fetch all the collected weight losses from the specified collection
- sum the weight-loss terms with tf.add_n(), which returns the total weight loss
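One practical caveat not handled above: if no weight was created with weight_decay > 0, the collection is empty and tf.add_n([]) raises an error. A defensive sketch:

reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
if reg_losses:
    weight_loss_op = tf.add_n(reg_losses)
else:
    # no regularized weights: use a zero constant so the total loss is still well defined
    weight_loss_op = tf.constant(0.0)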
Step 3: obtain the total loss function
with tf.variable_scope("loss"):
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=input_label_placeholder, name='entropy')
loss_op = tf.reduce_mean(input_tensor=cross_entropy, name='loss')
weight_loss_op = tf.losses.get_regularization_losses()
weight_loss_op = tf.add_n(weight_loss_op)
total_loss_op = loss_op + weight_loss_op
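Alternatively, TensorFlow 1.x also provides tf.losses.get_regularization_loss() (note: singular), which fetches the built-in collection and sums it in one call, returning a zero scalar when the collection is empty. A sketch of the same loss scope using it:

with tf.variable_scope("loss"):
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=input_label_placeholder, name='entropy')
    loss_op = tf.reduce_mean(input_tensor=cross_entropy, name='loss')
    # fetches tf.GraphKeys.REGULARIZATION_LOSSES and sums it internally
    weight_loss_op = tf.losses.get_regularization_loss()
    total_loss_op = loss_op + weight_loss_op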
Complete code example
Below, a complete model with three fully connected layers is built and trained for one iteration.
Full code
# @ File : tf_regularization.py
# @ Description: implement regularization based on tensorflow
# @ Author : Alex Chung
# @ Contact : yonganzhong@outlook.com
import tensorflow as tf
#+++++++++++++++++++++++++++++++++construct layers+++++++++++++++++++++++++++++++++
def fully_connected(input_op, scope, num_outputs, weight_decay=0.00004, is_activation=True, finetune=True):
"""
full connect layer
Args:
input_op:
scope:
num_outputs:
weight_decay:
is_activation:
        finetune:
Returns:
"""
# get feature num
shape = input_op.get_shape().as_list()
if len(shape) == 4:
size = shape[-1] * shape[-2] * shape[-3]
else:
size = shape[1]
    with tf.variable_scope(scope):
        flat_data = tf.reshape(tensor=input_op, shape=[-1, size], name='Flatten')
        weights = get_weights_1(shape=[size, num_outputs], weight_decay=weight_decay, trainable=finetune)
        # weights = get_weights_2(shape=[size, num_outputs], weight_decay=weight_decay, trainable=finetune)
        # weights = get_weights_3(shape=[size, num_outputs], weight_decay=weight_decay, trainable=finetune)
        biases = get_bias(shape=[num_outputs], trainable=finetune)
        if is_activation:
            # fused matmul + bias + relu
            return tf.nn.relu_layer(x=flat_data, weights=weights, biases=biases)
        else:
            return tf.nn.bias_add(value=tf.matmul(flat_data, weights), bias=biases)
def get_bias(shape, trainable=True):
"""
get bias
Args:
shape:
trainable:
Returns:
"""
bias = tf.get_variable(shape=shape, name='Bias', dtype=tf.float32, trainable=trainable)
return bias
def get_weights_1(shape, weight_decay=0.0, dtype=tf.float32, trainable=True):
"""
add weight regularization to loss collection
Args:
shape:
weight_decay:
dtype:
trainable:
Returns:
"""
weight = tf.Variable(initial_value=tf.truncated_normal(shape=shape, stddev=0.01), name='Weights', dtype=dtype,
trainable=trainable)
if weight_decay > 0:
weight_loss = tf.nn.l2_loss(weight) * weight_decay
# weight_loss = tf.nn.l2_loss(weight, name="weight_loss")
# tf.add_to_collection(tf.GraphKeys.LOSSES, value=weight_loss)
tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, value=weight_loss)
else:
pass
return weight
def get_weights_2(shape, weight_decay=0.0, dtype=tf.float32, trainable=True):
"""
Args:
shape:
weight_decay:
dtype:
trainable:
Returns:
"""
weight = tf.Variable(initial_value=tf.truncated_normal(shape=shape, stddev=0.01), name='Weights', dtype=dtype,
trainable=trainable)
if weight_decay > 0:
weight_loss = tf.contrib.layers.l2_regularizer(weight_decay)(weight)
# weight_loss = tf.nn.l2_loss(weight, name="weight_loss")
# tf.add_to_collection("weight_loss", value=weight_loss)
tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, value=weight_loss)
else:
pass
return weight
def get_weights_3(shape, weight_decay=0.0, dtype=tf.float32, trainable=True):
"""
add weight to tf.get_variable
Args:
shape:
weight_decay:
dtype:
trainable:
Returns:
"""
# create regularizer
if weight_decay > 0:
        regularizer = tf.contrib.layers.l2_regularizer(weight_decay)
else:
regularizer = None
weight = tf.get_variable(name='Weights', shape=shape, dtype=dtype, regularizer=regularizer,
trainable=trainable)
return weight
#+++++++++++++++++++++++++++++++++construct network+++++++++++++++++++++++++++++++++++
def model_nets(input_batch, num_classes=None, weight_decay=0.00004, scope="test_nets"):
"""
full connect network
Args:
input_batch:
num_classes:
weight_decay:
scope:
Returns:
"""
with tf.variable_scope(scope):
net = fully_connected(input_batch, num_outputs=128, weight_decay=weight_decay, scope='fc1')
net = fully_connected(net, num_outputs=32, weight_decay=weight_decay, scope='fc2')
net = fully_connected(net, num_outputs=num_classes, is_activation=False, weight_decay=weight_decay,
scope='logits')
        prob = tf.nn.softmax(net, name='prob')  # class probabilities, kept for inference
        # return the raw logits: tf.nn.softmax_cross_entropy_with_logits applies softmax
        # internally, so returning prob here would apply softmax twice
        return net
#++++++++++++++++++++++++++++++++execute train+++++++++++++++++++++++++++++++++
def main():
# parameter config
BATCH_SIZE = 10
DATA_LENGTH = 1024
NUM_CLASSES = 5
LEARNING_RATE = 0.001
WEIGHT_DECAY = 0.00004
# inference part
global_step = tf.train.get_or_create_global_step()
input_data_placeholder = tf.placeholder(dtype=tf.float32, shape=[None, DATA_LENGTH], name="input_data")
input_label_placeholder = tf.placeholder(dtype=tf.float32, shape=[None, NUM_CLASSES], name="input_label")
# inference part
logits = model_nets(input_batch=input_data_placeholder, num_classes=NUM_CLASSES, weight_decay=WEIGHT_DECAY)
# calculate loss part
with tf.variable_scope("loss"):
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=input_label_placeholder,
name='entropy')
loss_op = tf.reduce_mean(input_tensor=cross_entropy, name='loss')
weight_loss_op = tf.losses.get_regularization_losses()
weight_loss_op = tf.add_n(weight_loss_op)
total_loss_op = loss_op + weight_loss_op
# generate data and label
tf.random.set_random_seed(0)
data_batch = tf.Variable(tf.random_uniform(shape=(BATCH_SIZE, DATA_LENGTH), minval=0, maxval=1, dtype=tf.float32))
label_batch = tf.Variable(tf.random_uniform(shape=(BATCH_SIZE,), minval=1, maxval=NUM_CLASSES, dtype=tf.int32))
label_batch = tf.one_hot(label_batch, depth=NUM_CLASSES) # convert label to onehot
# initial variable and graph
init_op = tf.group(tf.global_variables_initializer(),
tf.local_variables_initializer())
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
sess.run(init_op)
input_data, input_label = sess.run([data_batch, label_batch])
print('regularization loss op:')
for var in tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES):
print(var.op.name, var.shape)
# training part
train_op = tf.train.GradientDescentOptimizer(learning_rate=LEARNING_RATE).minimize(loss=total_loss_op,
global_step=global_step)
feed_dict = {input_data_placeholder:input_data,
input_label_placeholder:input_label}
_, total_loss, loss, weight_loss = sess.run([train_op, total_loss_op, loss_op, weight_loss_op],
feed_dict=feed_dict)
print('loss:{0} weight_loss:{1} total_loss:{2}'.format(loss, weight_loss, total_loss))
if __name__ == "__main__":
main()
Execution result
regularization loss op:
test_nets/fc1/mul ()
test_nets/fc2/mul ()
test_nets/logits/mul ()
loss:1.649293303489685 weight_loss:0.0002091786591336131 total_loss:1.6495025157928467
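As noted earlier, tf.contrib (and with it tf.contrib.layers.l2_regularizer) was removed in TensorFlow 2.x, where the usual way to express the same weight penalty is through Keras regularizers. A minimal sketch, assuming TensorFlow 2.x (the layer sizes are illustrative); note that tf.keras.regularizers.l2(l) computes l * sum(w**2), without the 1/2 factor used by tf.nn.l2_loss:

import tensorflow as tf

weight_decay = 0.00004
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(weight_decay)),
    tf.keras.layers.Dense(5,
                          kernel_regularizer=tf.keras.regularizers.l2(weight_decay)),
])
model.build(input_shape=(None, 1024))
# model.losses holds one regularization tensor per regularized weight;
# Keras adds them to the training loss automatically during model.fit()
total_reg = tf.add_n(model.losses)
print(total_reg)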