Typical steps for training a neural network (a minimal end-to-end sketch follows this list):
- Define a neural network with learnable weights, and a loss function
- Iterate over the input dataset
- Compute the output with a forward pass (forward)
- Compute the loss with the loss function (Loss)
- Compute the gradient of every parameter via backpropagation
- Update the network's weights: weight = weight - learning_rate * gradient
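As a preview, here is a minimal, self-contained sketch of this loop. The tiny linear model, the random data, and the learning rate are illustrative assumptions only; the rest of this section builds the real network step by step.

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                    # illustrative tiny model (assumption)
criterion = nn.MSELoss()                   # loss function
optimizer = optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(8, 4)                 # a random mini-batch (assumption)
targets = torch.randn(8, 2)

for epoch in range(3):                     # iterate over the dataset
    optimizer.zero_grad()                  # gradients accumulate, so clear them first
    outputs = model(inputs)                # forward pass
    loss = criterion(outputs, targets)     # compute the loss
    loss.backward()                        # backpropagation: compute gradients
    optimizer.step()                       # weight = weight - learning_rate * gradient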
Step 1: define the neural network and run the forward pass (forward). You only need to define the forward function; the backward() method then computes the gradients automatically.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):  # a custom network must inherit from nn.Module
    def __init__(self):
        super().__init__()  # initialize the parent class
        # 1 input image channel, 6 output channels, 3x3 conv kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # fully-connected layers
        # 6x6 comes from a 32x32 input: conv 3x3 -> 30, pool /2 -> 15, conv 3x3 -> 13, pool /2 -> 6
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # the forward pass must be defined
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
print(net)

params = list(net.parameters())
print(len(params))
for param in params:
    print(param.size())

input = torch.randn(1, 32, 32)  # a single sample: channels x height x width
print(input.size())
input.unsqueeze_(0)  # build a mini-batch: nSamples x channels x height x width
print(input.size())
out = net(input)
print(out)
Output:
Net(
(conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear(in_features=576, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
10
torch.Size([6, 1, 3, 3])
torch.Size([6]) # bias
torch.Size([16, 6, 3, 3])
torch.Size([16]) # bias
torch.Size([120, 576])
torch.Size([120]) # bias
torch.Size([84, 120])
torch.Size([84]) # bias
torch.Size([10, 84])
torch.Size([10]) # bias
tensor([[-0.0017, -0.1201, 0.0749, -0.0046, -0.1258, -0.0141, -0.1134, 0.0689,
-0.0086, 0.0619]], grad_fn=<AddmmBackward>)
torch.nn only supports mini-batches. Even for a single sample you must call
input.unsqueeze_(0)  # input has shape nChannels x Height x Width
to add a batch dimension and obtain the mini-batch format nSamples x nChannels x Height x Width.
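The difference is easy to see on shapes; the 32x32 single-channel sample below is an illustrative assumption matching the example above:

sample = torch.randn(1, 32, 32)   # a single sample: nChannels x Height x Width
print(sample.shape)               # torch.Size([1, 32, 32])
batch = sample.unsqueeze(0)       # non-in-place variant of unsqueeze_(0)
print(batch.shape)                # torch.Size([1, 1, 32, 32])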
Step 2: define a loss function and run backpropagation. A loss function takes the pair (out, target) and computes a value that estimates how far out is from target. The nn package provides several different loss functions; a simple one is nn.MSELoss, which computes the mean squared error between its input and the target.
# construct a dummy target
target = torch.zeros(10)
target = target.view(1, -1)  # reshape to the same shape as out
print(target.shape, out.shape)
criterion = nn.MSELoss()     # mean squared error loss from torch.nn
loss = criterion(out, target)
loss1 = (out ** 2).mean()    # MSE computed by hand; equals MSELoss here because target is all zeros
print(loss, loss1)
# walk backwards through the chain of gradient functions
print(loss.grad_fn)                                            # MSELoss
print(loss.grad_fn.next_functions[0][0])                       # Linear (Addmm)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad (bias of the last Linear layer)

# Backpropagation. Gradients accumulate, so the gradient buffers must be zeroed manually.
net.zero_grad()              # zero the gradient buffers of all parameters
print(net.conv2.bias.grad)   # gradient of conv2.bias before backward (None)
loss.backward()              # run backpropagation
print(net.conv2.bias.grad)   # gradient of conv2.bias after backward
Output:
torch.Size([1, 10]) torch.Size([1, 10])
tensor(0.0061, grad_fn=<MseLossBackward>) tensor(0.0061, grad_fn=<MeanBackward0>)
<MseLossBackward object at 0x0000023F445B7E88>
<AddmmBackward object at 0x0000023F48396A88>
<AccumulateGrad object at 0x0000023F48396A88>
None
tensor([-0.0018, 0.0005, 0.0013, -0.0012, -0.0002, -0.0005, 0.0014, -0.0001,
0.0021, -0.0005, 0.0002, -0.0003, -0.0006, 0.0018, 0.0012, 0.0005])
Step 3, the final step: update the network's weights. Once the gradient of every parameter is available, the simplest possible update rule is:
weight = weight - learning_rate * gradient
# SGD implemented by hand
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
This simple way of updating the parameters is called SGD (Stochastic Gradient Descent). There are also more effective update rules, such as Nesterov-SGD, Adam, and RMSProp; the survey "An overview of gradient descent optimization algorithms" compares the various optimization algorithms.
Its conclusion: Adagrad, Adadelta, RMSprop, and Adam generally perform best and apply most broadly.
PyTorch's torch.optim module already implements Adadelta, Adagrad, Adam, SparseAdam, Adamax, ASGD, LBFGS, RMSprop, Rprop, and SGD, so you can simply pick one. Parameter-update code based on torch.optim:
import torch.optim as optim

# choose and create an optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in the training loop, repeat the following steps
optimizer.zero_grad()             # zero the gradients left over from the previous step
output = net(input)              # forward pass
loss = criterion(output, target) # compute the loss
loss.backward()                  # backpropagation
optimizer.step()                 # update the network's parameters
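Switching to one of the other optimizers mentioned above only changes the constructor call; the learning-rate values below are illustrative assumptions:

optimizer = optim.Adam(net.parameters(), lr=0.001)              # Adam
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)  # SGD with momentum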