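2. Getting Started with PyTorch and Vision Competitions

Task list

Task 1: PyTorch tensor computation and conversion to and from NumPy

Key points: basic PyTorch usage, tensor computation

Step 1: Set up a local Notebook environment, or use Tianchi DSW: https://dsw-dev.data.aliyun.com/#/
Step 2: Learn the basic PyTorch syntax and run the following code successfully
- Basic PyTorch tutorial: https://zhuanlan.zhihu.com/p/25572330
- Official tutorial: https://pytorch.org/tutorials/beginner/basics/intro.html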

c = np.ones((3,3))
d = torch.from_numpy(c)  # NumPy array -> tensor

Solution:

import torch
import numpy as np

c = np.ones((3,3))
d = torch.from_numpy(c)  # NumPy array -> tensor
e = d.numpy()  # tensor -> NumPy array
d,e
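
One detail about the conversion above is worth checking: on CPU, torch.from_numpy shares memory with the source array, and np.ones defaults to float64, so d is a float64 tensor. The sketch below illustrates both points; the names shared_value, d32 and copied_value are illustrative only and not part of the original solution.

```python
import numpy as np
import torch

c = np.ones((3, 3))              # float64 by default
d = torch.from_numpy(c)          # shares memory with c, dtype torch.float64
c[0, 0] = 5.0                    # an in-place change to the array...
shared_value = d[0, 0].item()    # ...is visible through the tensor

d32 = d.float()                  # casting to float32 makes an independent copy
c[0, 0] = 7.0
copied_value = d32[0, 0].item()  # the copy does not follow later changes to c

print(d.dtype, shared_value, copied_value)
```

This is why the Task 2 solution calls .float() before feeding the data to float32 parameters.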

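Task 2: Gradient computation and gradient descent

Key points: PyTorch gradient computation, stochastic gradient descent

Step 1: Learn how automatic differentiation works, https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html
Step 2: Learn how stochastic gradient descent works, https://www.cnblogs.com/BYRans/p/4700202.html
Step 3:
- Use NumPy to create data following y = 10*x + 4 + noise(0,1), where x is an arithmetic sequence from 0 to 100 with step 0.01
- Define w and b in PyTorch and fit the regression with stochastic gradient descent.

Solution:
Read the requirements carefully!
Given the task wording, I don't think nn.Linear() is appropriate, because it would not use the w and b we are asked to define. Instead we initialize the parameters with nn.Parameter(); for an introduction see https://blog.csdn.net/qq_28753373/article/details/104179354
Another pitfall: with SGD the learning rate must be set small here, otherwise the gradients explode. Adam does not run into this problem.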

import numpy as np
import torch.nn as nn
import torch
import torch.optim as optim

class MLP(nn.Module):

    def __init__(self, w, b):
        super(MLP, self).__init__()
        self.weight = nn.Parameter(w)  # register w as a learnable parameter via nn.Parameter()
        self.bias = nn.Parameter(b)

    def forward(self, x):
        out = torch.matmul(x, self.weight) + self.bias
        return out

w = torch.tensor([[1.]])
b = torch.tensor([1.])
x = np.arange(0, 100, 0.01)
noise = np.random.normal(0, 1, x.shape[0])  # Gaussian noise with mean 0 and standard deviation 1
y = 10 * x + 4 + noise

x = torch.from_numpy(x).float()  # from_numpy yields float64; cast to float32 to match w and b
y = torch.from_numpy(y).float()
x = torch.unsqueeze(x, dim=1)  # reshape [10000] to [10000, 1]
y = torch.unsqueeze(y, dim=1)  # reshape [10000] to [10000, 1]

model = MLP(w, b)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.0001)  # the learning rate must be small, otherwise the gradients explode

for epoch in range(100000):
    y_pre = model(x)
    loss = criterion(y_pre, y)
    if epoch % 1000 == 0:
        print("Epoch:{}, loss is {}".format(epoch, loss.item()))

    optimizer.zero_grad()  # reset gradients
    loss.backward()  # backpropagate to compute gradients
    optimizer.step()  # update parameters

print("w =", model.weight.item())
print("b =", model.bias.item())
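
The note about needing a small learning rate can be made quantitative. Because the loop feeds all 10000 points at every step, the optimizer performs plain gradient descent on a quadratic loss, which is stable only when the step size is below 2 divided by the largest eigenvalue of the loss Hessian. The sketch below, a side calculation under that assumption and not part of the original solution, estimates the limit with NumPy; the names X, hessian, lam_max and lr_limit are illustrative.

```python
import numpy as np

x = np.arange(0, 100, 0.01)
X = np.stack([x, np.ones_like(x)], axis=1)  # columns: x (for w) and a constant 1 (for b)
hessian = 2.0 * X.T @ X / len(x)            # Hessian of mean((X @ theta - y) ** 2)
lam_max = np.linalg.eigvalsh(hessian).max()
lr_limit = 2.0 / lam_max                    # full-batch gradient descent diverges above this

print(f"largest eigenvalue: {lam_max:.1f}, step-size limit: {lr_limit:.2e}")
```

The limit comes out at a few times 1e-4, which is consistent with lr=0.0001 working while values such as 0.01 blow up. The large eigenvalue comes from the scale of x, so rescaling x is the usual way to allow bigger steps.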

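Task 3: How PyTorch fully connected layers work and how to use them

Key points: fully connected networks
Step 1: Learn how fully connected networks work, https://blog.csdn.net/xiaodong_11/article/details/82015456

Step 2: Implement a fully connected layer in PyTorch with matrix multiplication

Step 3: Use the nn.Linear layer in PyTorch

Solution: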

import torch
import torch.nn as nn

class Mylinear(nn.Module):
    def __init__(self, in_features, out_features):
        super(Mylinear, self).__init__()
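        # torch.Tensor(rows, cols) would leave the memory uninitialized, so use explicit initial values
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))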

    def forward(self, x):
        out = x@self.weight.t()+self.bias
        return out

model = Mylinear(784, 10)
x = torch.rand(100, 784)
out = model(x)

for name, param in model.named_parameters():
    print('%14s : %s' % (name, param.shape))  # print each parameter's name and shape
    # print('%s' % param)  # uncomment to print the parameter values

import torch
import torch.nn as nn


class linear(nn.Module):
    def __init__(self, in_features, out_features):
        super(linear, self).__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        out = self.fc(x)
        return out


model = linear(784, 10)
x = torch.rand(100, 784)
out = model(x)

for name, param in model.named_parameters():
    print('%14s : %s' % (name, param.shape))  # print each parameter's name and shape
    # print('%s' % param)  # uncomment to print the parameter values
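
As a quick cross-check of the two implementations, the sketch below applies the matrix-multiplication formula to the weights of an nn.Linear layer and compares the result with the layer's own output. It is a minimal check, not part of the original solution; the names fc, manual, builtin and max_diff are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(784, 10)
x = torch.rand(100, 784)

# nn.Linear stores weight with shape (out_features, in_features), hence the transpose.
manual = x @ fc.weight.t() + fc.bias
builtin = fc(x)

max_diff = (manual - builtin).abs().max().item()
print(manual.shape, max_diff)
```

The two results should agree up to floating-point rounding, which confirms that nn.Linear computes x @ W^T + b.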

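Task 4: How PyTorch activation functions work and how to use them

Key points: activation functions

Step 1: Learn how activation functions work, https://zhuanlan.zhihu.com/p/88429934

Step 2: Implement the activation functions above by hand in PyTorch

Solution:

import numpy as np

# Note: these use Python's built-in max/min, so each function takes a scalar x, not a tensor.
def ELU(x, alpha=1.0, inplace=False):
    return max(0, x) + min(0, alpha * (np.exp(x) - 1))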


def LeakyReLU(x, negative_slope=0.01, inplace=False):
    return max(0, x) + negative_slope * min(0, x)


def PReLU(x, num_parameters=1, init=0.25):
    return max(0, x) + init * min(0, x)


def ReLU(x, inplace=False):
    return max(0, x)


def ReLU6(x, inplace=False):
    return min(max(0, x), 6)


def SELU(x,  inplace=False):
    alpha=1.6732632423543772848170429916717
    scale=1.0507009873554804934193349852946
    return scale * (max(0, x) + min(0, alpha * (np.exp(x) - 1)))


def CELU(x, alpha=1.00, inplace=False):
    return max(0, x) + min(0, alpha * (np.exp(x / alpha) - 1))


def Sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def LogSigmoid(x):
    return np.log(1.0 / (1.0 + np.exp(-x)))


def Tanh(x):
    # parentheses are required: (e^x - e^-x) / (e^x + e^-x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


def TanhShrink(x):
    return x - (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


def Softplus(x, beta=1, threshold=20):
    return (1.0 / beta) * (np.log(1 + np.exp(beta * x)))


def SoftShrink(x, lambd=0.5):
    if x > lambd:
        return x - lambd
    elif x < -lambd:
        return x + lambd
    return 0
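
The functions above operate on one scalar at a time. Since the task asks for PyTorch implementations, here is a minimal tensor-based sketch of a few of them, checked against the built-in versions in torch.nn.functional. The lowercase function names and the checks dictionary are illustrative and not part of the original solution.

```python
import torch
import torch.nn.functional as F

def relu(x):
    return torch.clamp(x, min=0)

def leaky_relu(x, negative_slope=0.01):
    return torch.where(x > 0, x, negative_slope * x)

def elu(x, alpha=1.0):
    return torch.where(x > 0, x, alpha * (torch.exp(x) - 1))

def sigmoid(x):
    return 1.0 / (1.0 + torch.exp(-x))

def tanh(x):
    return (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

def softplus(x, beta=1.0):
    return torch.log(1 + torch.exp(beta * x)) / beta

x = torch.linspace(-3, 3, 13)
checks = {
    "relu": torch.allclose(relu(x), F.relu(x)),
    "leaky_relu": torch.allclose(leaky_relu(x), F.leaky_relu(x, 0.01)),
    "elu": torch.allclose(elu(x), F.elu(x, alpha=1.0), atol=1e-6),
    "sigmoid": torch.allclose(sigmoid(x), torch.sigmoid(x), atol=1e-6),
    "tanh": torch.allclose(tanh(x), torch.tanh(x), atol=1e-6),
    "softplus": torch.allclose(softplus(x), F.softplus(x, beta=1.0), atol=1e-6),
}
print(checks)
```

These direct formulas are fine for moderate inputs; the built-in versions additionally guard against overflow for large |x|, which this sketch does not attempt.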

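Task 5: How PyTorch convolutional layers work and how to use them

Key points: convolutional layers

Step 1: Understand how convolutional layers work and how to use them

https://blog.csdn.net/qq_37385726/article/details/81739179
https://www.cnblogs.com/zhangxiann/p/13584415.html

Step 2: Compute the number of parameters of the following convolutional layer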

nn.Conv2d(            
        in_channels=1,            
        out_channels=32,            
        kernel_size=5,            
        stride=1,            
        padding=2
)

Solution:
Method 1: compute it with code

import torch.nn as nn
import torch  

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.conv1 = nn.Conv2d(
            in_channels=1,
            out_channels=32,
            kernel_size=5,
            stride=1,
            padding=2
        )

    def forward(self, x):
        return self.conv1(x)

model = net()
p = sum(map(lambda p: p.numel(), model.parameters()))
print(p)

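Method 2: direct calculation

Formula: parameters = kernel_h x kernel_w x in_channels x out_channels + out_channels (bias)

in_channels is 1, so one kernel has 5x5x1 parameters; out_channels = 32 means 32 kernels, and the bias adds another 32: 5x5x1x32 + 32 = 832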
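
The hand calculation can be wrapped in a small helper. The sketch below assumes a square kernel, groups=1 and dilation=1; both function names are hypothetical helpers, not PyTorch APIs, and the 28x28 input size is only an example.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a square-kernel Conv2d layer with groups=1."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size along one dimension, assuming dilation=1."""
    return (size + 2 * padding - kernel_size) // stride + 1


param_count = conv2d_param_count(1, 32, 5)
out_size = conv2d_output_size(28, 5, stride=1, padding=2)
print(param_count, out_size)
```

Note that stride and padding change the output size but not the parameter count.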

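Task 6: Common PyTorch loss functions and optimizers

Key points: loss functions, optimizers
Step 1: Learn the details of loss functions, https://www.cnblogs.com/wanghui-garcia/p/10862733.html
Step 2: Learn how to use optimizers, https://pytorch.org/docs/stable/optim.html
Step 3: Try different optimizers and learning rates, repeating the regression from Task 2

Loss MSE, optimizer SGD, learning rate 0.1
Loss MSE, optimizer SGD, learning rate 0.5
Loss MSE, optimizer SGD, learning rate 0.01

Solution:
Here we construct new x, y, w, b and noise so that SGD can optimize them easily. With the Task 2 data the results are terrible (the learning rate is too large, so the loss is already nan by the second epoch)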

import torch
import torch.nn as nn
import torch.optim as optim

class linear(nn.Module):
    def __init__(self, in_features, out_features):
        super(linear, self).__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        out = self.fc(x)
        return out


x = torch.randn(4, 3)
w = torch.randint(5, 10, size=(3, 1), dtype=torch.float)
b = torch.tensor(5.)
noise = torch.randn(4, 1)
y = x @ w + b + noise
res = {}

for lr in [0.5, 0.1, 0.01]:
    best_loss = float("inf")
    best_epoch = 0
    model = linear(3, 1)
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)  
    for epoch in range(100000):
        y_pre = model(x)
        loss = criterion(y_pre, y)
        if epoch % 1000 == 0:
            print("Epoch:{}, loss is {}".format(epoch, loss))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
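        if loss.item() < best_loss:  # a nan loss never compares as smaller, so diverged steps are skipped
            best_loss = loss.item()  # store a plain float rather than a tensor tied to the graph
            best_epoch = epoch
    res[lr] = {'loss': best_loss, 'epoch': best_epoch}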

print(res)
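
The Task 2 data diverges at these learning rates because x ranges up to 100. One common remedy, shown here as a sketch and not as the author's method, is to standardize x before training and map the fitted parameters back afterwards. The names x_std, w_hat and b_hat are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
np.random.seed(0)

x_np = np.arange(0, 100, 0.01)
y_np = 10 * x_np + 4 + np.random.normal(0, 1, x_np.shape[0])

x = torch.from_numpy(x_np).float().unsqueeze(1)
y = torch.from_numpy(y_np).float().unsqueeze(1)
mean, std = x.mean(), x.std()
x_std = (x - mean) / std                 # zero mean, unit variance

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    loss = criterion(model(x_std), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Map the fitted parameters back to the original x scale:
# y = w_s * (x - mean) / std + b_s  =>  w = w_s / std,  b = b_s - w_s * mean / std
w_hat = (model.weight / std).item()
b_hat = (model.bias - model.weight * mean / std).item()
print(f"w = {w_hat:.3f}, b = {b_hat:.3f}, loss = {loss.item():.3f}")
```

With standardized input the loss surface is well conditioned, so SGD with lr=0.1 should converge within a couple of hundred full-batch steps instead of producing nan.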

Task 7: PyTorch pooling and normalization layers

Key points: pooling layers, normalization layers

Step 1: Implement mean-pooling and max-pooling for 2D pooling in PyTorch code
https://pytorch.org/docs/stable/nn.html#pooling-layers
https://blog.csdn.net/shanglianlm/article/details/85313924
Step 2: Learn how normalization works, https://blog.csdn.net/qq_23981335/article/details/106572171

Solution:

import torch
import torch.nn as nn

x = torch.randn(10, 3, 32, 32)
avg = nn.AvgPool2d(3, 3)
print(avg(x).shape)

maxp = nn.MaxPool2d(7, 3)
print(maxp(x).shape)

Output:
torch.Size([10, 3, 10, 10])
torch.Size([10, 3, 9, 9])

x = torch.randint(10, size=(10, 3, 32, 32)).float()

bn = nn.BatchNorm2d(3)
gn = nn.GroupNorm(num_groups=1, num_channels=3)   # num_channels must be divisible by num_groups
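
The solution above uses the built-in pooling layers. To see what they compute, the sketch below implements mean- and max-pooling by hand with Tensor.unfold and compares the results with torch.nn.functional. It assumes a square window and no padding; pool2d_manual is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

def pool2d_manual(x, kernel_size, stride, mode="mean"):
    """Pool an (N, C, H, W) tensor with a square window and no padding."""
    # unfold slides a window along H, then along W: (N, C, H_out, W_out, k, k)
    windows = x.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
    if mode == "mean":
        return windows.mean(dim=(-1, -2))
    return windows.amax(dim=(-1, -2))

torch.manual_seed(0)
x = torch.randn(10, 3, 32, 32)
avg_out = pool2d_manual(x, 3, 3, "mean")
max_out = pool2d_manual(x, 7, 3, "max")
mean_ok = torch.allclose(avg_out, F.avg_pool2d(x, 3, 3), atol=1e-6)
max_ok = torch.equal(max_out, F.max_pool2d(x, 7, 3))
print(avg_out.shape, max_out.shape, mean_ok, max_ok)
```

The output sizes follow floor((H - k) / s) + 1, which gives 10 for the 3x3 window with stride 3 and 9 for the 7x7 window with stride 3, matching the shapes printed above.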

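Task 8: Building a VGG network with PyTorch

Key points: building networks

https://zhuanlan.zhihu.com/p/263527295

Step 1: Understand how the VGG network works.
Step 2: Build a VGG model with PyTorch.
Step 3: Print the feature-map size of every layer of the 11-layer VGG model, along with the parameter count.

Solution: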

# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary


class VGG(nn.Module):
    """
    VGG builder
    """

    def __init__(self, arch, num_classes=1000):
        super(VGG, self).__init__()
        self.in_channels = 3
        self.conv3_64 = self.__make_layer(64, arch[0])
        self.conv3_128 = self.__make_layer(128, arch[1])
        self.conv3_256 = self.__make_layer(256, arch[2])
        self.conv3_512a = self.__make_layer(512, arch[3])
        self.conv3_512b = self.__make_layer(512, arch[4])
        self.fc1 = nn.Linear(7 * 7 * 512, 4096)
        self.bn1 = nn.BatchNorm1d(4096)
        self.bn2 = nn.BatchNorm1d(4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, num_classes)

    def __make_layer(self, channels, num):
        layers = []
        for i in range(num):
            layers.append(nn.Conv2d(self.in_channels, channels, 3, stride=1, padding=1, bias=False))  # same padding
            layers.append(nn.BatchNorm2d(channels))
            layers.append(nn.ReLU())
            self.in_channels = channels
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv3_64(x)
        out = F.max_pool2d(out, 2)
        out = self.conv3_128(out)
        out = F.max_pool2d(out, 2)
        out = self.conv3_256(out)
        out = F.max_pool2d(out, 2)
        out = self.conv3_512a(out)
        out = F.max_pool2d(out, 2)
        out = self.conv3_512b(out)
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.fc2(out)
        out = self.bn2(out)
        out = F.relu(out)
        return self.fc3(out)


def VGG_11():
    return VGG([1, 1, 2, 2, 2], num_classes=1000)


def VGG_13():
    return VGG([2, 2, 2, 2, 2], num_classes=1000)  # 10 conv layers + 3 fully connected layers


def VGG_16():
    return VGG([2, 2, 3, 3, 3], num_classes=1000)


def VGG_19():
    return VGG([2, 2, 4, 4, 4], num_classes=1000)


def test():
    net = VGG_11()
    # net = VGG_13()
    # net = VGG_16()
    # net = VGG_19()
    summary(net, (3, 224, 224), device="cpu")


test()

Output

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,728
       BatchNorm2d-2         [-1, 64, 224, 224]             128
              ReLU-3         [-1, 64, 224, 224]               0
            Conv2d-4        [-1, 128, 112, 112]          73,728
       BatchNorm2d-5        [-1, 128, 112, 112]             256
              ReLU-6        [-1, 128, 112, 112]               0
            Conv2d-7          [-1, 256, 56, 56]         294,912
       BatchNorm2d-8          [-1, 256, 56, 56]             512
              ReLU-9          [-1, 256, 56, 56]               0
           Conv2d-10          [-1, 256, 56, 56]         589,824
      BatchNorm2d-11          [-1, 256, 56, 56]             512
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 512, 28, 28]       1,179,648
      BatchNorm2d-14          [-1, 512, 28, 28]           1,024
             ReLU-15          [-1, 512, 28, 28]               0
           Conv2d-16          [-1, 512, 28, 28]       2,359,296
      BatchNorm2d-17          [-1, 512, 28, 28]           1,024
             ReLU-18          [-1, 512, 28, 28]               0
           Conv2d-19          [-1, 512, 14, 14]       2,359,296
      BatchNorm2d-20          [-1, 512, 14, 14]           1,024
             ReLU-21          [-1, 512, 14, 14]               0
           Conv2d-22          [-1, 512, 14, 14]       2,359,296
      BatchNorm2d-23          [-1, 512, 14, 14]           1,024
             ReLU-24          [-1, 512, 14, 14]               0
           Linear-25                 [-1, 4096]     102,764,544
      BatchNorm1d-26                 [-1, 4096]           8,192
           Linear-27                 [-1, 4096]      16,781,312
      BatchNorm1d-28                 [-1, 4096]           8,192
           Linear-29                 [-1, 1000]       4,097,000
================================================================
Total params: 132,882,472
Trainable params: 132,882,472
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 170.10
Params size (MB): 506.91
Estimated Total Size (MB): 677.58
----------------------------------------------------------------
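
The total reported by torchsummary can be cross-checked by hand. The sketch below counts parameters for the VGG class defined above, assuming bias-free 3x3 convolutions each followed by BatchNorm2d, and BatchNorm1d after the first two fully connected layers. vgg_bn_param_count is a hypothetical helper name.

```python
def vgg_bn_param_count(arch, num_classes=1000):
    """Count parameters of the VGG variant above (bias-free convs with batch norm)."""
    total, in_ch = 0, 3
    for out_ch, repeats in zip([64, 128, 256, 512, 512], arch):
        for _ in range(repeats):
            total += 3 * 3 * in_ch * out_ch  # conv weights, bias=False
            total += 2 * out_ch              # BatchNorm2d weight and bias
            in_ch = out_ch
    fc_layers = [(7 * 7 * 512, 4096, True), (4096, 4096, True), (4096, num_classes, False)]
    for fan_in, fan_out, has_bn in fc_layers:
        total += fan_in * fan_out + fan_out  # Linear weight and bias
        if has_bn:
            total += 2 * fan_out             # BatchNorm1d weight and bias
    return total


vgg11_params = vgg_bn_param_count([1, 1, 2, 2, 2])
print(f"{vgg11_params:,}")
```

For the VGG-11 configuration this reproduces the 132,882,472 shown in the summary; the three fully connected layers account for the bulk of it.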

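Task 9: Building a ResNet network with PyTorch

Key points: building networks

https://zhuanlan.zhihu.com/p/263526658

Step 1: Understand how the ResNet network works.
Step 2: Build a ResNet model with PyTorch.
Step 3: Print the feature-map size of every layer of the ResNet 18 model, along with the parameter count.

Solution: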

# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary


class SE(nn.Module):

    def __init__(self, in_chnls, ratio):
        super(SE, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool2d((1, 1))
        self.compress = nn.Conv2d(in_chnls, in_chnls // ratio, 1, 1, 0)
        self.excitation = nn.Conv2d(in_chnls // ratio, in_chnls, 1, 1, 0)

    def forward(self, x):
        out = self.squeeze(x)
        out = self.compress(out)
        out = F.relu(out)
        out = self.excitation(out)
        return torch.sigmoid(out)  # F.sigmoid is deprecated in favor of torch.sigmoid


class BN_Conv2d(nn.Module):
    """
    BN_CONV, default activation is ReLU
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation=1, groups=1, bias=False, activation=True):
        super(BN_Conv2d, self).__init__()
        layers = [nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride,
                            padding=padding, dilation=dilation, groups=groups, bias=bias),
                  nn.BatchNorm2d(out_channels)]
        if activation:
            layers.append(nn.ReLU(inplace=True))
        self.seq = nn.Sequential(*layers)

    def forward(self, x):
        return self.seq(x)


class BasicBlock(nn.Module):
    """
    basic building block for ResNet-18, ResNet-34
    """
    message = "basic"

    def __init__(self, in_channels, out_channels, strides, is_se=False):
        super(BasicBlock, self).__init__()
        self.is_se = is_se
        self.conv1 = BN_Conv2d(in_channels, out_channels, 3, stride=strides, padding=1, bias=False)  # same padding
        self.conv2 = BN_Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False, activation=False)
        if self.is_se:
            self.se = SE(out_channels, 16)

        # fit input with residual output
        self.short_cut = nn.Sequential()
        if strides != 1:  # compare by value; "is not" tests object identity
            self.short_cut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=strides, padding=0, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        if self.is_se:
            coefficient = self.se(out)
            out = out * coefficient
        out = out + self.short_cut(x)
        return F.relu(out)


class ResNet(nn.Module):
    """
    ResNet builder (ResNet-18 / ResNet-34 with BasicBlock)
    """

    def __init__(self, block, groups, num_classes=1000):
        super(ResNet, self).__init__()
        self.channels = 64  # out channels from the first convolutional layer
        self.block = block

        self.conv1 = nn.Conv2d(3, self.channels, 7, stride=2, padding=3, bias=False)
        self.bn = nn.BatchNorm2d(self.channels)
        self.pool1 = nn.MaxPool2d(3, 2, 1)
        self.conv2_x = self._make_conv_x(channels=64, blocks=groups[0], strides=1, index=2)
        self.conv3_x = self._make_conv_x(channels=128, blocks=groups[1], strides=2, index=3)
        self.conv4_x = self._make_conv_x(channels=256, blocks=groups[2], strides=2, index=4)
        self.conv5_x = self._make_conv_x(channels=512, blocks=groups[3], strides=2, index=5)
        self.pool2 = nn.AvgPool2d(7)
        patches = 512 if self.block.message == "basic" else 512 * 4
        self.fc = nn.Linear(patches, num_classes)  # for 224 * 224 input size

    def _make_conv_x(self, channels, blocks, strides, index):
        """
        making convolutional group
        :param channels: output channels of the conv-group
        :param blocks: number of blocks in the conv-group
        :param strides: strides
        :return: conv-group
        """
        list_strides = [strides] + [1] * (blocks - 1)  # In conv_x groups, the first strides is 2, the others are ones.
        conv_x = nn.Sequential()
        for i in range(len(list_strides)):
            layer_name = str("block_%d_%d" % (index, i))  # when use add_module, the name should be difference.
            conv_x.add_module(layer_name, self.block(self.channels, channels, list_strides[i]))
            self.channels = channels if self.block.message == "basic" else channels * 4
        return conv_x

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(self.bn(out))
        out = self.pool1(out)
        out = self.conv2_x(out)
        out = self.conv3_x(out)
        out = self.conv4_x(out)
        out = self.conv5_x(out)
        out = self.pool2(out)
        out = out.view(out.size(0), -1)
        out = F.softmax(self.fc(out), dim=1)  # dim must be given explicitly; normalize over classes
        return out


def ResNet_18(num_classes=1000):
    return ResNet(block=BasicBlock, groups=[2, 2, 2, 2], num_classes=num_classes)


# def ResNet_34(num_classes=1000):
#     return ResNet(block=BasicBlock, groups=[3, 4, 6, 3], num_classes=num_classes)
#
# def ResNet_50(num_classes=1000):
#     return ResNet(block=BottleNeck, groups=[3, 4, 6, 3], num_classes=num_classes)
#
# def ResNet_101(num_classes=1000):
#     return ResNet(block=BottleNeck, groups=[3, 4, 23, 3], num_classes=num_classes)
#
# def ResNet_152(num_classes=1000):
#     return ResNet(block=BottleNeck, groups=[3, 8, 36, 3], num_classes=num_classes)

def test():
    net = ResNet_18()
    # net = ResNet_34()
    # net = ResNet_50()
    # net = ResNet_101()
    # net = ResNet_152()
    summary(net, (3, 224, 224), device='cpu')


test()

Output

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
         MaxPool2d-3           [-1, 64, 56, 56]               0
            Conv2d-4           [-1, 64, 56, 56]          36,864
       BatchNorm2d-5           [-1, 64, 56, 56]             128
              ReLU-6           [-1, 64, 56, 56]               0
         BN_Conv2d-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
        BN_Conv2d-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64, 56, 56]               0
        BN_Conv2d-15           [-1, 64, 56, 56]               0
           Conv2d-16           [-1, 64, 56, 56]          36,864
      BatchNorm2d-17           [-1, 64, 56, 56]             128
        BN_Conv2d-18           [-1, 64, 56, 56]               0
       BasicBlock-19           [-1, 64, 56, 56]               0
           Conv2d-20          [-1, 128, 28, 28]          73,728
      BatchNorm2d-21          [-1, 128, 28, 28]             256
             ReLU-22          [-1, 128, 28, 28]               0
        BN_Conv2d-23          [-1, 128, 28, 28]               0
           Conv2d-24          [-1, 128, 28, 28]         147,456
      BatchNorm2d-25          [-1, 128, 28, 28]             256
        BN_Conv2d-26          [-1, 128, 28, 28]               0
           Conv2d-27          [-1, 128, 28, 28]           8,192
      BatchNorm2d-28          [-1, 128, 28, 28]             256
       BasicBlock-29          [-1, 128, 28, 28]               0
           Conv2d-30          [-1, 128, 28, 28]         147,456
      BatchNorm2d-31          [-1, 128, 28, 28]             256
             ReLU-32          [-1, 128, 28, 28]               0
        BN_Conv2d-33          [-1, 128, 28, 28]               0
           Conv2d-34          [-1, 128, 28, 28]         147,456
      BatchNorm2d-35          [-1, 128, 28, 28]             256
        BN_Conv2d-36          [-1, 128, 28, 28]               0
       BasicBlock-37          [-1, 128, 28, 28]               0
           Conv2d-38          [-1, 256, 14, 14]         294,912
      BatchNorm2d-39          [-1, 256, 14, 14]             512
             ReLU-40          [-1, 256, 14, 14]               0
        BN_Conv2d-41          [-1, 256, 14, 14]               0
           Conv2d-42          [-1, 256, 14, 14]         589,824
      BatchNorm2d-43          [-1, 256, 14, 14]             512
        BN_Conv2d-44          [-1, 256, 14, 14]               0
           Conv2d-45          [-1, 256, 14, 14]          32,768
      BatchNorm2d-46          [-1, 256, 14, 14]             512
       BasicBlock-47          [-1, 256, 14, 14]               0
           Conv2d-48          [-1, 256, 14, 14]         589,824
      BatchNorm2d-49          [-1, 256, 14, 14]             512
             ReLU-50          [-1, 256, 14, 14]               0
        BN_Conv2d-51          [-1, 256, 14, 14]               0
           Conv2d-52          [-1, 256, 14, 14]         589,824
      BatchNorm2d-53          [-1, 256, 14, 14]             512
        BN_Conv2d-54          [-1, 256, 14, 14]               0
       BasicBlock-55          [-1, 256, 14, 14]               0
           Conv2d-56            [-1, 512, 7, 7]       1,179,648
      BatchNorm2d-57            [-1, 512, 7, 7]           1,024
             ReLU-58            [-1, 512, 7, 7]               0
        BN_Conv2d-59            [-1, 512, 7, 7]               0
           Conv2d-60            [-1, 512, 7, 7]       2,359,296
      BatchNorm2d-61            [-1, 512, 7, 7]           1,024
        BN_Conv2d-62            [-1, 512, 7, 7]               0
           Conv2d-63            [-1, 512, 7, 7]         131,072
      BatchNorm2d-64            [-1, 512, 7, 7]           1,024
       BasicBlock-65            [-1, 512, 7, 7]               0
           Conv2d-66            [-1, 512, 7, 7]       2,359,296
      BatchNorm2d-67            [-1, 512, 7, 7]           1,024
             ReLU-68            [-1, 512, 7, 7]               0
        BN_Conv2d-69            [-1, 512, 7, 7]               0
           Conv2d-70            [-1, 512, 7, 7]       2,359,296
      BatchNorm2d-71            [-1, 512, 7, 7]           1,024
        BN_Conv2d-72            [-1, 512, 7, 7]               0
       BasicBlock-73            [-1, 512, 7, 7]               0
        AvgPool2d-74            [-1, 512, 1, 1]               0
           Linear-75                 [-1, 1000]         513,000
================================================================
Total params: 11,689,512
Trainable params: 11,689,512
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 62.41
Params size (MB): 44.59
Estimated Total Size (MB): 107.58
----------------------------------------------------------------
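
In the summary, each stage that halves the resolution contains one extra 1x1 convolution (for example Conv2d-27 with 8,192 parameters). That is the projection shortcut, which reshapes the input so that it can be added to the block output. The sketch below reproduces that single step in isolation; main and shortcut are illustrative names and the batch size of 2 is arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 64, 56, 56)  # shaped like the output of the conv2_x stage
main = nn.Sequential(
    nn.Conv2d(64, 128, 3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.Conv2d(128, 128, 3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(128),
)
shortcut = nn.Sequential(
    nn.Conv2d(64, 128, 1, stride=2, padding=0, bias=False),  # 1x1 projection
    nn.BatchNorm2d(128),
)
out = torch.relu(main(x) + shortcut(x))  # shapes match, so the addition is valid
shortcut_params = sum(p.numel() for p in shortcut[0].parameters())
print(out.shape, shortcut_params)
```

Without the projection, the identity branch would still be 64x56x56 and could not be added to the 128x28x28 main branch.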

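Task 10: Fashion-MNIST classification with PyTorch

https://github.com/masoudrostami/Fashion-MNIST-using-PyTorch/blob/main/MNIST%20Fashion%20Project.ipynb

Step 1: Build a classification model with 4 convolutional layers + 2 fully connected layers.
Step 2: During training, record the training-set and test-set accuracy for every epoch.

Solution: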

from torch import nn, optim
from torchvision import transforms
import torchvision
import torch
from torch.utils.data import Dataset


# Different  classes in Fashion MNIST dataset
# classes = ('Tshirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
#            'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Download and load the training and test data
trainset = torchvision.datasets.FashionMNIST('data', download=True, train=True, transform=transform)
testset = torchvision.datasets.FashionMNIST('data', download=True, train=False, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)  # no need to shuffle for evaluation


class Fashion(nn.Module):
    def __init__(self):
        super(Fashion, self).__init__()
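        # ReLU between layers; without a nonlinearity the whole stack collapses into one linear map
        self.model = nn.Sequential(nn.Conv2d(1, 6, kernel_size=7, stride=1),    # 28 -> 22
                                   nn.ReLU(),
                                   nn.Conv2d(6, 16, kernel_size=7, stride=1),   # 22 -> 16
                                   nn.ReLU(),
                                   nn.Conv2d(16, 16, kernel_size=7, stride=1),  # 16 -> 10
                                   nn.ReLU(),
                                   nn.Conv2d(16, 32, kernel_size=7, stride=1),  # 10 -> 4
                                   nn.ReLU(),
                                   nn.Flatten(),
                                   nn.Linear(32 * 4 * 4, 128),
                                   nn.ReLU(),
                                   nn.Linear(128, 10),  # raw class scores (logits)
                                   )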

    def forward(self, x):
        out = self.model(x)
        return out


def get_acc(output, label):
    total = output.shape[0]
    _, pred_label = output.max(1)
    num_correct = (pred_label == label).sum().item()
    return num_correct / total


model = Fashion().to(device)
error = nn.CrossEntropyLoss().to(device)  # the model returns raw logits; nn.NLLLoss would need log-probabilities
learning_rate = 0.001  # Adam's usual default; 0.1 is generally too large for Adam
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

num_epochs = 50

for e in range(num_epochs):
    model.train()  # switch back to training mode after the previous epoch's model.eval()
    train_acc = 0
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        loss = error(logits, labels)

        optimizer.zero_grad()  # zero the gradients
        loss.backward()  # backward pass
        optimizer.step()

        train_acc += get_acc(logits, labels)

    print(f"Training Accuracy: {train_acc / len(trainloader) * 100:.2f}", end=" ")

    model.eval()
    with torch.no_grad():
        total = 0
        correct = 0
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            mx_index = torch.argmax(logits, dim=1)
            total += labels.numel()
            correct += (mx_index == labels).sum().item()
        print(f"Test Accuracy {correct / total * 100:.2f}")

With nn.NLLLoss applied to the raw outputs and Adam at lr=0.1, the results were rather puzzling; the settings above still need tuning...
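
One thing to keep in mind when choosing the loss: nn.NLLLoss expects log-probabilities, whereas a model that ends in nn.Linear returns raw scores. The sketch below, using random stand-in logits rather than real model outputs, shows that nn.CrossEntropyLoss equals nn.NLLLoss applied to log_softmax, and what nn.NLLLoss computes when fed raw logits directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)            # stand-in for model outputs
labels = torch.randint(0, 10, (8,))

ce = nn.CrossEntropyLoss()(logits, labels)
nll_on_log_probs = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)
nll_on_logits = nn.NLLLoss()(logits, labels)  # just -mean(logit of the true class)

print(ce.item(), nll_on_log_probs.item(), nll_on_logits.item())
```

The third value has no lower bound: the optimizer can push it toward minus infinity simply by inflating the logits, which is one plausible source of odd training curves.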

Task 11: Facial keypoint detection with PyTorch

Dataset: https://ai-contest-static.xfyun.cn/2021/7afa865e-5ac8-48ab-9966-d88bb33cdc15/%E4%BA%BA%E8%84%B8%E5%85%B3%E9%94%AE%E7%82%B9%E6%A3%80%E6%B5%8B%E6%8C%91%E6%88%98%E8%B5%9B_%E6%95%B0%E6%8D%AE%E9%9B%86.zip

https://gitee.com/coggle/competition-baseline/blob/master/competition/%E7%A7%91%E5%A4%A7%E8%AE%AF%E9%A3%9EAI%E5%BC%80%E5%8F%91%E8%80%85%E5%A4%A7%E8%B5%9B2021/%E4%BA%BA%E8%84%B8%E5%85%B3%E9%94%AE%E7%82%B9%E6%A3%80%E6%B5%8B%E6%8C%91%E6%88%98%E8%B5%9B/face-keypoint2.ipynb

Step 1: Build a model with 4 convolutional layers + 2 fully connected layers for keypoint regression.
Step 2: Use a pretrained resnet18 model for keypoint regression.

Solution:

import os, sys, codecs, glob
from PIL import Image, ImageDraw
import numpy as np
import pandas as pd
import time
from torchvision.models.resnet import resnet18
import torch
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error
import timm

torch.backends.cudnn.benchmark = False
import torchvision.models as models
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data.dataset import Dataset
import warnings

warnings.filterwarnings("ignore")


# Dataset that reads one sample at a time
class load_data(Dataset):
    def __init__(self, img, keypoint, transform=None):
        self.img = img
        self.transform = transform
        self.keypoint = keypoint

    def __getitem__(self, index):
        img = Image.fromarray(self.img[:, :, index]).convert('RGB')

        if self.transform is not None:
            img = self.transform(img)

        return img, self.keypoint[index] / 96.0

    def __len__(self):
        return self.img.shape[-1]


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.model = nn.Sequential(nn.Conv2d(3, 6, kernel_size=5, stride=2),
                                   nn.Conv2d(6, 16, kernel_size=5, stride=2),
                                   nn.Conv2d(16, 16, kernel_size=5, stride=2),
                                   nn.Conv2d(16, 32, kernel_size=5, stride=2),
                                   nn.Flatten(),
                                   nn.Linear(32 * 3 * 3, 784),
                                   nn.Linear(784, 8),
                                   )

    def forward(self, x):
        out = self.model(x)
        return out


class resNet(nn.Module):
    def __init__(self, model):
        super(resNet, self).__init__()
        self.model = nn.Sequential(*list(model.children())[:-1],  # [b, 512, 1, 1]; * unpacks the list of child modules into arguments
                                   nn.Flatten(),  # [b, 512, 1, 1] -> [b, 512*1*1]
                                   nn.Linear(512, 8))

    def forward(self, x):
        out = self.model(x)
        return out


class XunFeiNet(nn.Module):
    def __init__(self):
        super(XunFeiNet, self).__init__()
        self.model = timm.create_model('resnet18', num_classes=8, pretrained=True)

    def forward(self, img, labels=None):
        feat = self.model(img)
        return feat


def train(train_loader, model, criterion, optimizer, epoch):
    model.train()

    for i, (input, target) in enumerate(train_loader):
        input = input.cuda(non_blocking=True).float()
        target = target.cuda(non_blocking=True).float()

        output = model(input)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip after backward(), once gradients exist
        optimizer.step()

        if i % 200 == 0:
            print(loss.item())


def validate(val_loader, model):
    model.eval()

    val_feats = []
    with torch.no_grad():
        end = time.time()
        for i, (input, target) in enumerate(val_loader):
            input = input.cuda().float()
            target = target.cuda().float()
            output = model(input)
            val_feats.append(output.data.cpu().numpy())
    return val_feats


if __name__ == '__main__':
    train_df = pd.read_csv('人脸关键点检测挑战赛_数据集/train.csv')
    train_df = train_df.fillna(48)
    train_img = np.load('人脸关键点检测挑战赛_数据集/train.npy')
    test_img = np.load('人脸关键点检测挑战赛_数据集/test.npy')

    # single-sample reads -> batched reads
    train_loader = torch.utils.data.DataLoader(
        load_data(train_img[:, :, :-500], train_df.values[:-500],
                  transforms.Compose([
                      transforms.ToTensor(),
                  ])
                  ),
        batch_size=10, shuffle=True
    )

    val_loader = torch.utils.data.DataLoader(
        load_data(train_img[:, :, -500:], train_df.values[-500:],
                  transforms.Compose([
                      transforms.ToTensor(),
                  ])
                  ),
        batch_size=10, shuffle=False
    )

    test_loader = torch.utils.data.DataLoader(
        load_data(test_img, np.zeros((2049, 8)),
                  transforms.Compose([
                      transforms.ToTensor(),
                  ])
                  ),
        batch_size=10, shuffle=False
    )

    # model = Net().cuda()
    # model = resNet(resnet18(pretrained=True)).cuda()
    model = XunFeiNet().cuda()
    criterion = nn.MSELoss().cuda()
    optimizer = torch.optim.Adam(model.parameters(), 0.001)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.85)
    best_acc = 0.0

    for epoch in range(5):
        print('Epoch: ', epoch)

        train(train_loader, model, criterion, optimizer, epoch)

        val_feats = validate(val_loader, model)
        scheduler.step()

        val_feats = np.vstack(val_feats) * 96
        print('Val', mean_absolute_error(val_feats, train_df.values[-500:]))

    pred_tta = []
    pred = []
    with torch.no_grad():
        for t, (x, y) in enumerate(test_loader):
            x_var = x.cuda()
            y_var = y.cuda()
            scores = model(x_var)
            pred.append(scores.data.cpu().numpy())
    pred = np.concatenate(pred, 0)
    pred_tta.append(pred)

    pred = np.mean(pred_tta, axis=0)

    idx = 409
    xy = pred[idx].reshape(4, 2) * 96
    plt.scatter(xy[:, 0], xy[:, 1], c='r')
    plt.imshow(test_img[:, :, idx], cmap='gray')
    plt.show()

    # col = ['left_eye_center_x', 'left_eye_center_y', 'right_eye_center_x',
    #  'right_eye_center_y', 'nose_tip_x', 'nose_tip_y',
    #  'mouth_center_bottom_lip_x','mouth_center_bottom_lip_y']
    # pd.DataFrame(pred * 96, columns=col).to_csv('submit.csv', index=None)

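Net, resNet, and XunFeiNet refer, respectively, to a network built from scratch, a pretrained resnet18 adapted from torchvision.models, and the pretrained resnet18 from timm. Note: when running the code with a pretrained model, wait for the weights to finish downloading automatically!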
Net results:

Prediction on one test image

resNet results:

Prediction on one test image

XunFeiNet results:

Prediction on one test image

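Task 12: Build a generative adversarial network with PyTorch

Step 1: Learn how generative adversarial networks work, https://blog.csdn.net/DFCED/article/details/105175097
Step 2: Study the DCGAN code implementation, https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
Step 3: Take the face data from Task 11 (resized to 64*64), train a DCGAN on it, and generate faces.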
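
The 64*64 size in Step 3 is what the DCGAN tutorial's layer settings expect. As a quick sanity check, the sketch below applies the standard output-size formulas for nn.ConvTranspose2d and nn.Conv2d (assuming dilation 1 and no output padding) to the kernel/stride/padding triples used in the tutorial. The helper names deconv_out and conv_out are made up for this illustration and are not part of PyTorch.

```python
def deconv_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size, kernel, stride, padding):
    """Output size of a regular convolution (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) per layer, as in the DCGAN tutorial
gen_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
disc_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

gen_sizes = [1]  # the latent vector z enters as a 1x1 feature map
for k, s, p in gen_layers:
    gen_sizes.append(deconv_out(gen_sizes[-1], k, s, p))

disc_sizes = [64]  # the discriminator takes 64x64 images
for k, s, p in disc_layers:
    disc_sizes.append(conv_out(disc_sizes[-1], k, s, p))

print("generator:", gen_sizes)
print("discriminator:", disc_sizes)
```

The generator grows 1 -> 4 -> 8 -> 16 -> 32 -> 64 and the discriminator shrinks 64 -> 32 -> 16 -> 8 -> 4 -> 1, so each image ends in a single score. With the same layers, a 96*96 input would end at 3x3 instead of 1x1, which is why the resize matters.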

Answer:

from __future__ import print_function
# %matplotlib inline
import random
import torch.backends.cudnn as cudnn
import torch.utils.data
import torchvision.utils as vutils
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

from PIL import Image, ImageDraw
import numpy as np
import pandas as pd
torch.backends.cudnn.benchmark = False
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
from torch.utils.data.dataset import Dataset


class XunFeiDataset(Dataset):
    def __init__(self, img, keypoint, transform=None):
        self.img = img
        self.transform = transform
        self.keypoint = keypoint

    def __getitem__(self, index):
        img = Image.fromarray(self.img[:, :, index]).convert('RGB')

        if self.transform is not None:
            img = self.transform(img)

        return img, self.keypoint[index] / 96.0

    def __len__(self):
        return self.img.shape[-1]


# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)


# Generator Code
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)


# Discriminator Code
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)


if __name__ == '__main__':
    # Set random seed for reproducibility
    manualSeed = 999
    # manualSeed = random.randint(1, 10000) # use if you want new results
    print("Random Seed: ", manualSeed)
    random.seed(manualSeed)
    torch.manual_seed(manualSeed)

    # Root directory for dataset
    dataroot = "celeba/"  # not needed for the data loaded here
    # Number of workers for dataloader
    workers = 0
    # Batch size during training
    batch_size = 128
    # Spatial size of training images. All images will be resized to this
    #   size using a transformer.
    image_size = 64
    # Number of channels in the training images. For color images this is 3
    nc = 3
    # Size of z latent vector (i.e. size of generator input)
    nz = 100
    # Size of feature maps in generator
    ngf = 64
    # Size of feature maps in discriminator
    ndf = 64
    # Number of training epochs
    num_epochs = 5
    # Learning rate for optimizers
    lr = 0.0002
    # Beta1 hyperparam for Adam optimizers
    beta1 = 0.5
    # Number of GPUs available. Use 0 for CPU mode.
    ngpu = 1

    train_img = np.load('人脸关键点检测挑战赛_数据集/train.npy')
    test_img = np.load('人脸关键点检测挑战赛_数据集/test.npy')
    train_df = pd.read_csv('人脸关键点检测挑战赛_数据集/train.csv')
    train_df = train_df.fillna(48)

    dataloader = torch.utils.data.DataLoader(
        XunFeiDataset(train_img[:, :, :-500], train_df.values[:-500],
                      transform=transforms.Compose([
                          transforms.Resize([64, 64]),
                          transforms.ToTensor(),
                          transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                      ])
                      ), batch_size=batch_size, shuffle=True, num_workers=workers)
    device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

    # Visualize a batch of training images
    real_batch = next(iter(dataloader))
    plt.figure(figsize=(8, 8))
    plt.axis("off")
    plt.title("Training Images")
    plt.imshow(
        np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(), (1, 2, 0)))
    plt.show()

    # Create the generator
    netG = Generator(ngpu).to(device)

    # Handle multi-gpu if desired
    if (device.type == 'cuda') and (ngpu > 1):
        netG = nn.DataParallel(netG, list(range(ngpu)))

    # Apply the weights_init function to randomly initialize all weights
    #  to mean=0, stdev=0.02.
    netG.apply(weights_init)

    # Print the model
    print(netG)

    # Create the Discriminator
    netD = Discriminator(ngpu).to(device)

    # Handle multi-gpu if desired
    if (device.type == 'cuda') and (ngpu > 1):
        netD = nn.DataParallel(netD, list(range(ngpu)))

    # Apply the weights_init function to randomly initialize all weights
    #  to mean=0, stdev=0.02.
    netD.apply(weights_init)

    # Print the model
    print(netD)

    # Initialize BCELoss function
    criterion = nn.BCELoss()

    # Create batch of latent vectors that we will use to visualize
    #  the progression of the generator
    fixed_noise = torch.randn(64, nz, 1, 1, device=device)

    # Establish convention for real and fake labels during training
    real_label = 1.
    fake_label = 0.

    # Setup Adam optimizers for both G and D
    optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
    optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

    # Training Loop

    # Lists to keep track of progress
    img_list = []
    G_losses = []
    D_losses = []
    iters = 0

    print("Starting Training Loop...")
    # For each epoch
    for epoch in range(num_epochs):
        # For each batch in the dataloader
        for i, data in enumerate(dataloader, 0):

            ############################
            # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
            ###########################
            ## Train with all-real batch
            netD.zero_grad()
            # Format batch
            real_cpu = data[0].to(device)
            b_size = real_cpu.size(0)
            label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
            # Forward pass real batch through D
            output = netD(real_cpu).view(-1)
            # Calculate loss on all-real batch
            errD_real = criterion(output, label)
            # Calculate gradients for D in backward pass
            errD_real.backward()
            D_x = output.mean().item()

            ## Train with all-fake batch
            # Generate batch of latent vectors
            noise = torch.randn(b_size, nz, 1, 1, device=device)
            # Generate fake image batch with G
            fake = netG(noise)
            label.fill_(fake_label)
            # Classify all fake batch with D
            output = netD(fake.detach()).view(-1)
            # Calculate D's loss on the all-fake batch
            errD_fake = criterion(output, label)
            # Calculate the gradients for this batch, accumulated (summed) with previous gradients
            errD_fake.backward()
            D_G_z1 = output.mean().item()
            # Compute error of D as sum over the fake and the real batches
            errD = errD_real + errD_fake
            # Update D
            optimizerD.step()

            ############################
            # (2) Update G network: maximize log(D(G(z)))
            ###########################
            netG.zero_grad()
            label.fill_(real_label)  # fake labels are real for generator cost
            # Since we just updated D, perform another forward pass of all-fake batch through D
            output = netD(fake).view(-1)
            # Calculate G's loss based on this output
            errG = criterion(output, label)
            # Calculate gradients for G
            errG.backward()
            D_G_z2 = output.mean().item()
            # Update G
            optimizerG.step()

            # Output training stats
            if i % 50 == 0:
                print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                      % (epoch, num_epochs, i, len(dataloader),
                         errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

            # Save Losses for plotting later
            G_losses.append(errG.item())
            D_losses.append(errD.item())

            # Check how the generator is doing by saving G's output on fixed_noise
            if (iters % 500 == 0) or ((epoch == num_epochs - 1) and (i == len(dataloader) - 1)):
                with torch.no_grad():
                    fake = netG(fixed_noise).detach().cpu()
                img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

            iters += 1

    plt.figure(figsize=(10, 5))
    plt.title("Generator and Discriminator Loss During Training")
    plt.plot(G_losses, label="G")
    plt.plot(D_losses, label="D")
    plt.xlabel("iterations")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()

    # %%capture
    fig = plt.figure(figsize=(8, 8))
    plt.axis("off")
    ims = [[plt.imshow(np.transpose(i, (1, 2, 0)), animated=True)] for i in img_list]
    ani = animation.ArtistAnimation(fig, ims, interval=1000, repeat_delay=1000, blit=True)

    HTML(ani.to_jshtml())

    # Grab a batch of real images from the dataloader
    real_batch = next(iter(dataloader))

    # Plot the real images
    plt.figure(figsize=(15, 15))
    plt.subplot(1, 2, 1)
    plt.axis("off")
    plt.title("Real Images")
    plt.imshow(
        np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(), (1, 2, 0)))

    # Plot the fake images from the last epoch
    plt.subplot(1, 2, 2)
    plt.axis("off")
    plt.title("Fake Images")
    plt.imshow(np.transpose(img_list[-1], (1, 2, 0)))
    plt.show()

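Note: with a small number of epochs the results are poor. D(G(z)) stays small and D(x) dominates, meaning the generated images are not good enough to fool the discriminator. We tested epoch=100, and some of the generated faces were still fairly poor. With epoch=1000 the results were also mediocre; GAN generation is still something of a black art.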
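
To make the remark about D(G(z)) concrete, here is a small sketch of the per-element binary cross-entropy that nn.BCELoss computes, evaluated for a few hypothetical discriminator outputs. The function name bce and the sample probabilities are illustrative assumptions, not values measured in the runs above.

```python
import math


def bce(p, target):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


# Hypothetical values of D(G(z)): the discriminator's score for a fake image
for d_gz in (0.01, 0.1, 0.5):
    loss_g = bce(d_gz, 1.0)       # generator wants fakes scored as real
    loss_d_fake = bce(d_gz, 0.0)  # discriminator wants fakes scored as fake
    print(f"D(G(z))={d_gz:.2f}  Loss_G={loss_g:.3f}  Loss_D(fake)={loss_d_fake:.3f}")
```

When D(G(z)) is close to 0, Loss_G is large while the discriminator's loss on fakes is close to 0, which is the situation where the discriminator has the upper hand. At D(G(z)) = 0.5 both terms equal ln 2, the point where the discriminator can no longer separate fake images from real ones.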

epoch=100