1. Why ResNet?
Every great invention solves a pain point, and ResNet solved the problem that deep networks are hard to train. The original paper calls this the degradation problem. What is degradation? As the number of layers grows, both the training loss and the test loss keep increasing. This is different from overfitting, whose telltale sign is a training loss that keeps shrinking while the test loss rises. In the figure below, comparing 20-layer and 56-layer networks, the 56-layer one has higher training and test loss than the 20-layer one. These are ordinary stacked architectures, which the paper calls "plain" networks.
Why does degradation happen? The paper does not give a definitive answer. My guess is that it has to do with how gradients propagate through the network: batch norm can rescue vanishing and exploding gradients, but for networks that are too deep, batch norm alone is presumably not enough to save the day. (No mathematical proof here, just intuition.)
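To make that intuition concrete, here is a toy experiment of my own (not from the paper): stack 50 small layers with and without skip connections and compare how much gradient reaches the input. The layer width and the tanh nonlinearity are arbitrary choices for illustration.

import torch
import torch.nn as nn

torch.manual_seed(0)
layers = [nn.Linear(64, 64) for _ in range(50)]  # a deliberately deep stack

def plain_forward(x):
    for layer in layers:
        x = torch.tanh(layer(x))
    return x

def residual_forward(x):
    for layer in layers:
        x = x + torch.tanh(layer(x))  # the "+ x" shortcut carries gradient straight through
    return x

for name, f in [('plain', plain_forward), ('residual', residual_forward)]:
    x = torch.randn(8, 64, requires_grad=True)
    f(x).sum().backward()
    print(name, 'grad norm at input:', x.grad.norm().item())
# the plain stack typically shows a far smaller input gradient than the residual one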
2. What does ResNet do?
What ResNet does is simple, yet, like the Pythagorean theorem, simple and remarkable. It merely adds an identity mapping on top of the plain network, which the paper calls identity mapping. For the record, an identity mapping is just f(x) = x.
Intuitively, it means taking x as-is, no matter what mapping is applied elsewhere. In ResNet, the identity mapping is the curved arrow on the right of the block diagram:
In a plain network, weight layers are stacked one on top of another to fit the desired underlying mapping H(x). For various reasons this mapping is hard to fit directly, so ResNet fits F(x) = H(x) - x instead: the stacked layers learn the residual F(x), and the block outputs F(x) + x. When the residual tends to 0, the block reduces to the identity mapping H(x) = x; when it does not, the block fits new features.
You may notice that in a convolutional network the output dimensions of F(x) may not match those of x. The authors propose two ways to handle this (a minimal sketch follows the list):
1. When the dimensions match, the two are added directly.
2. When they do not, x first goes through a matrix transform (implemented as a 1x1 convolution).
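A minimal sketch of the two cases with toy tensors; the shapes are taken from ResNet18's conv2_x/conv3_x stages, and the residual branches here are random stand-ins for F(x):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

# Case 1: F(x) has the same dimensions as x -- add directly
F_x = torch.randn(1, 64, 56, 56)
y = F_x + x                                   # [1, 64, 56, 56]

# Case 2: dimensions differ (64 -> 128 channels, 56 -> 28 spatially) --
# project x with a strided 1x1 convolution before adding
proj = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)
F_x2 = torch.randn(1, 128, 28, 28)
y2 = F_x2 + proj(x)                           # [1, 128, 28, 28]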
ResNet's network architecture:
Compared with the 34-layer plain network, ResNet adds shortcut connections: solid lines where the dimensions match, dashed lines where they do not, in which case a dimension transform or zero-padding is needed (a padding sketch follows).
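For the dashed connections, the zero-padding alternative (option A in the paper) keeps the shortcut parameter-free: subsample spatially, then pad the extra channels with zeros. A rough sketch:

import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 56, 56)                   # e.g. the output of conv2_x
shortcut = x[:, :, ::2, ::2]                     # stride-2 subsampling -> [1, 64, 28, 28]
# F.pad's tuple runs from the last dim backwards: (W, W, H, H, C_front, C_back)
shortcut = F.pad(shortcut, (0, 0, 0, 0, 0, 64))  # zero-pad channels -> [1, 128, 28, 28]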
3. How well does ResNet work?
The left plot shows plain networks: the 18-layer network's training and test errors are both lower than the 34-layer's almost everywhere. The right plot shows ResNet: the 34-layer network's training and test errors are both lower than the 18-layer's, neatly solving the degradation problem.
The structure of ResNet18
ResNet18 is called 18 because it has 18 weight layers: conv1 (1) + conv2_x through conv5_x (4 stages * 2 blocks * 2 convs = 16) + fc (1) = 18. The first 17 are convolutional layers; the last is a fully connected layer. (A quick sanity check follows.)
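To verify that count against torchvision's resnet18: the three 1x1 convolutions inside the dashed (downsample) shortcuts are conventionally not counted as layers, so they are excluded here.

import torch.nn as nn
from torchvision import models

model = models.resnet18()
convs = [n for n, m in model.named_modules()
         if isinstance(m, nn.Conv2d) and 'downsample' not in n]
fcs = [n for n, m in model.named_modules() if isinstance(m, nn.Linear)]
print(len(convs), '+', len(fcs), '=', len(convs) + len(fcs))  # 17 + 1 = 18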
Let's walk through how the dimensions change across the whole network. First, the input is a four-dimensional tensor:
batch_size: batch size, shown as 1 in the figure;
channel: channels; color images are usually RGB, i.e. 3 channels;
height: image height;
width: image width.
Then comes one convolution followed by one pooling operation; in_out denotes the input and output channels, k the kernel, s the stride, p the padding. How is each dimension of the output computed?
batch_size = 1, unchanged;
channel = 64, i.e. conv1's number of output channels;
For the convolution:
H = W = 112, from the formula floor((H + 2*p - k)/s) + 1, with fractions rounded down: floor((224 + 2*3 - 7)/2) + 1 = 112.
For the pooling:
H = W = 56, from the same formula (note the max pool here uses padding = 1, and PyTorch rounds down by default, i.e. ceil_mode=False): floor((112 + 2*1 - 3)/2) + 1 = 56.
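The same arithmetic as a tiny helper (my own convenience function; it just mirrors the formula above):

def conv_out(h, k, s, p):
    # floor((H + 2p - k) / s) + 1, the default for both Conv2d and MaxPool2d
    return (h + 2 * p - k) // s + 1

print(conv_out(224, k=7, s=2, p=3))  # conv1:   224 -> 112
print(conv_out(112, k=3, s=2, p=1))  # maxpool: 112 -> 56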
Next come stages conv2_x through conv5_x. Each stage repeats its block twice, and each block contains 2 convolutional layers, so one stage performs 4 convolutions. The figure below shows stage 1 in detail; every two convolutions are bridged by an identity map, i.e. the paper's shortcut.
Stages layer2 through layer4 work much like layer1, only with different parameters. The output dimensions of the stages are: layer1 [1, 64, 56, 56], layer2 [1, 128, 28, 28], layer3 [1, 256, 14, 14], layer4 [1, 512, 7, 7].
After these 17 convolutions come pooling and flatten, and finally a fully connected layer.
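One way to check all of these shapes at once is to hang forward hooks on each stage. A sketch using torchvision's resnet18 (the hand-written model below behaves identically):

import torch
from torchvision import models

model = models.resnet18()
for name in ['conv1', 'maxpool', 'layer1', 'layer2', 'layer3', 'layer4', 'avgpool']:
    getattr(model, name).register_forward_hook(
        lambda m, inp, out, name=name: print(name, list(out.shape)))
model(torch.randn(1, 3, 224, 224))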
Here is concise code for ResNet18:
import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url
import ssl
ssl._create_default_https_context = ssl._create_unverified_context  # skip SSL verification when downloading weights
model_urls = {
'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
}
def conv3x3(in_planes, out_planes, stride=1, padding=1):
    # no bias: each conv is followed by a BN layer, whose affine shift makes
    # the conv bias redundant and saves parameters
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=padding, bias=False)

def conv1x1(in_planes, out_planes, stride=1):
    # 1x1 conv used by the dashed shortcuts to match dimensions; bias omitted
    # for the same reason (a BN layer follows)
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride,
                     bias=False)
class BasicBlock(nn.Module):
    expansion = 1  # channel multiplier from block input to output (1 for BasicBlock)
    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None):
        # downsample: projection that matches x's dimensions so F(x) + x is valid
        # norm_layer: batch normalization layer
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d  # default to standard BN when not customized
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
        identity = x  # keep x for the shortcut
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)  # adjust x's dimensions so F(x) + x is well-defined
        out += identity
        out = self.relu(out)  # add first, then activate
return out
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000, norm_layer=None):
super(ResNet, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2d
self._norm_layer = norm_layer
self.inplanes = 64
# conv1 in ppt figure
self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = norm_layer(self.inplanes)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output (1,1) == global average pooling
        self.fc = nn.Linear(512 * block.expansion, num_classes)
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
    def _make_layer(self, block, planes, blocks, stride=1):
        # builds one stage (layer1..layer4)
        # block: block type (BasicBlock / Bottleneck)
        # blocks: number of blocks in the stage
norm_layer = self._norm_layer
downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            # dimensions change: build a projection shortcut
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),  # adjusts both spatial (H x W) and channel dims
                norm_layer(planes * block.expansion)
            )
layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, norm_layer))  # the first block may downsample; handle it separately
        self.inplanes = planes * block.expansion  # track the channel count entering the next block (see the ResNet table)
        for _ in range(1, blocks):  # start at 1: the first block was handled above
            layers.append(block(self.inplanes, planes, norm_layer=norm_layer))
        # the * unpacks the list into separate positional arguments
        return nn.Sequential(*layers)  # chain the blocks into one stage
def forward(self, x):
# x.shape = [1, 3, 224, 224]
x = self.conv1(x)
# x.shape = [1, 64, 112, 112]
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
# x.shape = [1, 64, 56, 56]
x = self.layer1(x)
# x.shape = [1, 64, 56, 56]
x = self.layer2(x)
# x.shape = [1, 128, 28, 28]
x = self.layer3(x)
# x.shape = [1, 256, 14, 14]
x = self.layer4(x)
# x.shape = [1, 512, 7, 7]
x = self.avgpool(x)
# x.shape = [1, 512, 1, 1]
x = torch.flatten(x, 1)
# x.shape = [1, 512]
x = self.fc(x)
# x.shape = [1, 1000]
return x
def _resnet(arch, block, layers, pretrained, progress, **kwargs):
model = ResNet(block, layers, **kwargs)
if pretrained:
        print('downloading weights from', model_urls[arch])
state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
model.load_state_dict(state_dict)
return model
def resnet18(pretrained=False, progress=True, **kwargs):
r"""ResNet-18 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress,
**kwargs)
model = resnet18(pretrained=True)
model.eval()

# cross-check against the official torchvision resnet18 loaded via torch.hub
model2 = torch.hub.load('pytorch/vision:v0.4.2', 'resnet18', pretrained=True)
model2.eval()

# both models were loaded from the same checkpoint, so their weights should match
for k in model.state_dict():
    assert torch.equal(model.state_dict()[k], model2.state_dict()[k])
# Download an example image
import urllib.request
url, filename = ("https://upload-images.jianshu.io/upload_images/21240497-b656d52c99d3b17b.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)  # a PIL Image, mode=RGB
print(input_image)
preprocess = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
print('input_tensor: ', input_tensor.shape)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
print('add dim of batch: ', input_batch.shape)
# move the input and model to GPU for speed if available
if torch.cuda.is_available():
input_batch = input_batch.to('cuda')
model.to('cuda')
with torch.no_grad():
output = model(input_batch)
print('output:',output.shape)
result = torch.nn.functional.softmax(output[0], dim=0)  # class probabilities
print(result.argmax())  # index of the predicted ImageNet class
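To turn the argmax index into a readable label, one can look it up in the class list used by the official PyTorch hub examples (URL assumed reachable):

import urllib.request

url = 'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
urllib.request.urlretrieve(url, 'imagenet_classes.txt')
with open('imagenet_classes.txt') as f:
    categories = [line.strip() for line in f]
print(categories[result.argmax().item()])  # e.g. a dog breed for dog.jpg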