Typical steps for training a neural network (a minimal end-to-end sketch follows this list):
- Define a neural network with learnable weights, and a loss function
- Iterate over the input dataset
- Compute the output with a forward pass (forward)
- Compute the loss with the loss function (Loss)
- Compute the gradient of every parameter via backpropagation
- Update the network's weights: weight = weight - learning_rate * gradient
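As a preview, here is a minimal, self-contained sketch of this loop. The tiny linear model, the random data, and the learning rate are illustrative assumptions only; the rest of this section builds the real network step by step.

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                    # illustrative tiny model (assumption)
criterion = nn.MSELoss()                   # loss function
optimizer = optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(8, 4)                 # a random mini-batch (assumption)
targets = torch.randn(8, 2)

for epoch in range(3):                     # iterate over the dataset
    optimizer.zero_grad()                  # gradients accumulate, so clear them first
    outputs = model(inputs)                # forward pass
    loss = criterion(outputs, targets)     # compute the loss
    loss.backward()                        # backpropagation: compute gradients
    optimizer.step()                       # weight = weight - learning_rate * gradient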
Step 1: define the neural network and run the forward pass (forward). You only need to define the forward function; the backward() method then computes the gradients automatically.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):  # a custom network must inherit from nn.Module
    def __init__(self):
        super().__init__()  # initialize the parent class
        # 1 input image channel, 6 output channels, 3x3 conv kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # fully-connected layers
        # 6x6 comes from a 32x32 input: conv 3x3 -> 30, pool /2 -> 15, conv 3x3 -> 13, pool /2 -> 6
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # the forward pass must be defined
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
print(net)

params = list(net.parameters())
print(len(params))
for param in params:
    print(param.size())

input = torch.randn(1, 32, 32)  # a single sample: channels x height x width
print(input.size())
input.unsqueeze_(0)  # build a mini-batch: nSamples x channels x height x width
print(input.size())
out = net(input)
print(out)
Output:
Net(
(conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear(in_features=576, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
10
torch.Size([6, 1, 3, 3])
torch.Size([6]) # bias
torch.Size([16, 6, 3, 3])
torch.Size([16]) # bias
torch.Size([120, 576])
torch.Size([120]) # bias
torch.Size([84, 120])
torch.Size([84]) # bias
torch.Size([10, 84])
torch.Size([10]) # bias
tensor([[-0.0017, -0.1201, 0.0749, -0.0046, -0.1258, -0.0141, -0.1134, 0.0689,
-0.0086, 0.0619]], grad_fn=<AddmmBackward>)
torch.nn only supports mini-batches. Even for a single sample you must call
input.unsqueeze_(0)  # input has shape nChannels x Height x Width
to add a batch dimension and obtain the mini-batch format nSamples x nChannels x Height x Width.
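The difference is easy to see on shapes; the 32x32 single-channel sample below is an illustrative assumption matching the example above:

sample = torch.randn(1, 32, 32)   # a single sample: nChannels x Height x Width
print(sample.shape)               # torch.Size([1, 32, 32])
batch = sample.unsqueeze(0)       # non-in-place variant of unsqueeze_(0)
print(batch.shape)                # torch.Size([1, 1, 32, 32])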
Step 2: define a loss function and run backpropagation. A loss function takes the pair (out, target) and computes a value that estimates how far out is from target. The nn package provides several different loss functions; a simple one is nn.MSELoss, which computes the mean squared error between its input and the target.
# construct a dummy target
target = torch.zeros(10)
target = target.view(1, -1)  # reshape to the same shape as out
print(target.shape, out.shape)
criterion = nn.MSELoss()     # mean squared error loss from torch.nn
loss = criterion(out, target)
loss1 = (out ** 2).mean()    # MSE computed by hand; equals MSELoss here because target is all zeros
print(loss, loss1)
# walk backwards through the chain of gradient functions
print(loss.grad_fn)                                            # MSELoss
print(loss.grad_fn.next_functions[0][0])                       # Linear (Addmm)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad (bias of the last Linear layer)

# Backpropagation. Gradients accumulate, so the gradient buffers must be zeroed manually.
net.zero_grad()              # zero the gradient buffers of all parameters
print(net.conv2.bias.grad)   # gradient of conv2.bias before backward (None)
loss.backward()              # run backpropagation
print(net.conv2.bias.grad)   # gradient of conv2.bias after backward
Output:
torch.Size([1, 10]) torch.Size([1, 10])
tensor(0.0061, grad_fn=<MseLossBackward>) tensor(0.0061, grad_fn=<MeanBackward0>)
<MseLossBackward object at 0x0000023F445B7E88>
<AddmmBackward object at 0x0000023F48396A88>
<AccumulateGrad object at 0x0000023F48396A88>
None
tensor([-0.0018, 0.0005, 0.0013, -0.0012, -0.0002, -0.0005, 0.0014, -0.0001,
0.0021, -0.0005, 0.0002, -0.0003, -0.0006, 0.0018, 0.0012, 0.0005])
Step 3, the final step: update the network's weights. Once the gradient of every parameter is available, the simplest possible update rule is:
weight = weight - learning_rate * gradient
# SGD implemented by hand
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
This simple way of updating the parameters is called SGD (Stochastic Gradient Descent). There are also more effective update rules, such as Nesterov-SGD, Adam, and RMSProp; the survey "An overview of gradient descent optimization algorithms" compares the various optimization algorithms.
Its conclusion: Adagrad, Adadelta, RMSprop, and Adam generally perform best and apply most broadly.
PyTorch's torch.optim module already implements Adadelta, Adagrad, Adam, SparseAdam, Adamax, ASGD, LBFGS, RMSprop, Rprop, and SGD, so you can simply pick one. Parameter-update code based on torch.optim:
import torch.optim as optim

# choose and create an optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in the training loop, repeat the following steps
optimizer.zero_grad()             # zero the gradients left over from the previous step
output = net(input)              # forward pass
loss = criterion(output, target) # compute the loss
loss.backward()                  # backpropagation
optimizer.step()                 # update the network's parameters
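Switching to one of the other optimizers mentioned above only changes the constructor call; the learning-rate values below are illustrative assumptions:

optimizer = optim.Adam(net.parameters(), lr=0.001)              # Adam
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)  # SGD with momentum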