ResNet Paper and Code Walkthrough

1. Why does ResNet exist?

Every great invention solves a pain point, and what ResNet solves is the difficulty of training deep networks. The paper calls this the degradation problem. What is degradation? As the number of layers increases, both the training loss and the test loss keep growing. This is different from overfitting, whose telltale sign is training loss that keeps falling while test loss rises. In the paper's figure, both the training loss and the test loss of a 56-layer network are higher than those of a 20-layer one. These are ordinary architectures, which the paper calls "plain" networks.

Why does degradation happen? The paper gives no definitive answer. My guess is that it relates to how gradients propagate through the network: batch norm can rescue vanishing and exploding gradients, but for very deep networks even batch norm is presumably not enough. I haven't worked out a mathematical proof.

2. What does ResNet do?

What ResNet does is simple, yet, like the Pythagorean theorem, simple and extraordinary. It merely adds identity mappings on top of the plain architecture, which the paper calls identity mapping. To spell out what an identity mapping means: it is the map that returns its input unchanged, f(x) = x.

Put plainly, you just take x as it is, regardless of what mapping the other branch computes. In ResNet, the identity mapping is the curved shortcut arrow on the right side of the residual block diagram.

In a plain architecture, weight layers are stacked one after another with the goal of fitting some desired mapping H(x). Because that mapping is often hard to fit directly, ResNet instead fits the residual F(x) = H(x) - x: the stacked layers learn F(x), so when the residual tends to 0 the block reduces to the identity mapping H(x) = x, and when it is nonzero the block fits new features on top of the identity.
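In code, the whole idea fits in a couple of lines (a minimal sketch; F here is a stand-in for the stacked weight layers, not the actual implementation shown later):

def residual_block(x, F):
    # the stacked layers learn the residual F(x) = H(x) - x
    return F(x) + x  # identity shortcut gives H(x) = F(x) + x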
You may notice that in a convolutional network the dimensions of F(x) may not match those of x. The author proposes two ways to handle this:
1. When the dimensions match, the two are added directly.

2. When the dimensions differ, x first undergoes a matrix transform (a projection); the formulas below make both cases concrete.
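In the paper's notation, these two cases are written as:

y = F(x, {W_i}) + x          (Eqn. 1: dimensions match)
y = F(x, {W_i}) + W_s x      (Eqn. 2: W_s is a projection that matches the dimensions)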

The ResNet architecture:

Compared with the 34-layer plain network, ResNet adds shortcut connections: solid lines where the dimensions already match, dashed lines where they don't, in which case a dimension transform or padding is needed.
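Concretely, a dashed shortcut can be realized as a strided 1x1 convolution followed by batch norm, which is exactly what the downsample module in the code below does. A minimal sketch for the conv2_x to conv3_x transition (64 to 128 channels, spatial size halved):

import torch.nn as nn

# dashed shortcut: halve H and W with stride 2 while raising channels 64 -> 128
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)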

3. How well does ResNet work?

The left plot shows plain networks: the 18-layer network's training and test error are almost both lower than the 34-layer's. The right plot shows ResNets: the 34-layer network's training and test error are both lower than the 18-layer's, neatly resolving the degradation problem.

The structure of ResNet-18

ResNet-18 is so named because it has 18 weight layers: conv1 (1) + conv2_x through conv5_x (4 stages x 2 blocks x 2 convolutions = 16) + fc (1) = 18. The first 17 layers are convolutional and the last is fully connected. (The 1x1 convolutions on the dashed shortcuts are not counted.)

Let's trace how the tensor dimensions change through the network. First, the input is four-dimensional:
batch_size: the batch size, shown as 1 in the figure;
channel: the number of channels, 3 for RGB color images;
height: the image height;
width: the image width.
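For example, a batch holding a single 224x224 RGB image is a tensor of shape [1, 3, 224, 224]:

import torch

x = torch.randn(1, 3, 224, 224)  # [batch_size, channel, height, width]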

Next comes one convolution followed by max pooling. In the figure, in_out denotes the input and output channels, k the kernel size, s the stride, and p the padding. How is each output dimension computed?
batch_size = 1 stays unchanged;
channel = 64, i.e. conv1's output channels;
For the convolution (k=7, s=2, p=3):
H = W = 112, from the formula [(H + 2p - k)/s] + 1, rounding the division down: [(224 + 6 - 7)/2] + 1 = 111 + 1 = 112.
For the max pooling (k=3, s=2, p=1), PyTorch applies the same floor formula:
H = W = 56, since [(112 + 2 - 3)/2] + 1 = 55 + 1 = 56.
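These numbers are easy to verify with a small helper (not part of the model code):

import math

def out_size(h, k, s, p):
    # PyTorch's output-size formula for Conv2d and MaxPool2d (floor rounding)
    return math.floor((h + 2 * p - k) / s) + 1

print(out_size(224, k=7, s=2, p=3))  # conv1:   112
print(out_size(112, k=3, s=2, p=1))  # maxpool: 56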

Next come the conv2_x through conv5_x stages. Each stage repeats its block twice, and each block contains 2 convolutional layers, so one stage performs 4 convolutions. In the first stage, each pair of convolutions is bridged by an identity mapping (identity map), i.e. the shortcut from the paper.

layer_2 through layer_4 work much like layer_1, only with different parameters; the output dimensions of each stage are listed below:
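For a 1x3x224x224 input, the stage outputs are (matching the shape comments in the forward pass below):
layer1 (conv2_x): [1, 64, 56, 56]
layer2 (conv3_x): [1, 128, 28, 28]
layer3 (conv4_x): [1, 256, 14, 14]
layer4 (conv5_x): [1, 512, 7, 7]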

After these 17 convolutions come pooling, a flatten, and finally a fully connected layer.

Here is a condensed implementation of resnet18:

import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
}


def conv3x3(in_planes, out_planes, stride=1, padding=1):
    # bias=False: a BN layer follows, whose learned shift makes the conv bias redundant
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=padding, bias=False)


def conv1x1(in_planes, out_planes, stride=1):
    # bias=False for the same reason: the BN layer right after supplies the shift, saving parameters
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1  # how much the channel count grows after passing through the block

    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None):
        # downsample: adjusts the shortcut's dimensions so it can be added to the residual branch
        # norm_layer: the batch-normalization layer to use
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d  # default to the standard BN layer if none is given
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x  # keep x for the shortcut

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)  # adjust x's dimensions so F(x) + x is well-defined

        out += identity
        out = self.relu(out)  # add first, then activate

        return out
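
# Quick sanity check of a single block (hypothetical usage, not in the original):
#   blk = BasicBlock(64, 64)
#   y = blk(torch.randn(1, 64, 56, 56))  # shape preserved: [1, 64, 56, 56]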


class ResNet(nn.Module):
    def __init__(self, block, layers, num_class=1000, norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64

        # conv1 in the architecture figure
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size (1, 1) is equivalent to global average pooling (GAP)
        self.fc = nn.Linear(512 * block.expansion, num_class)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        # Builds one stage (conv2_x .. conv5_x).
        # block: the block type (BasicBlock here; Bottleneck in deeper ResNets)
        # blocks: the number of blocks in this stage
        norm_layer = self._norm_layer
        downsample = None

        if stride != 1 or self.inplanes != planes * block.expansion:
            # the shortcut's dimensions need adjusting
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),  # adjusts both the spatial (H x W) and channel dimensions
                norm_layer(planes * block.expansion)
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, norm_layer))  # the first block may change dimensions, so it is handled separately
        self.inplanes = planes * block.expansion  # track the channel count entering the next block (see the ResNet table in the paper)
        for _ in range(1, blocks):  # start at 1: the first block was handled above
            layers.append(block(self.inplanes, planes, norm_layer=norm_layer))
        return nn.Sequential(*layers)  # the * unpacks the list into separate arguments; Sequential chains the blocks into one stage

    def forward(self, x):
        # x.shape = [1, 3, 224, 224]
        x = self.conv1(x)
        # x.shape = [1, 64, 112, 112]
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        # x.shape = [1, 64, 56, 56]

        x = self.layer1(x)
        # x.shape = [1, 64, 56, 56]
        x = self.layer2(x)
        # x.shape = [1, 128, 28, 28]
        x = self.layer3(x)
        # x.shape = [1, 256, 14, 14]
        x = self.layer4(x)
        # x.shape = [1, 512, 7, 7]

        x = self.avgpool(x)
        # x.shape = [1, 512, 1, 1]
        x = torch.flatten(x, 1)
        # x.shape = [1, 512]
        x = self.fc(x)
        # x.shape = [1, 1000]

        return x


def _resnet(arch, block, layers, pretrained, progress, **kwargs):
    model = ResNet(block, layers, **kwargs)
    if pretrained:
        print('***', model_urls[arch])
        state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
        model.load_state_dict(state_dict)
    return model


def resnet18(pretrained=False, progress=True, **kwargs):
    r"""ResNet-18 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress,
                   **kwargs)



model = resnet18(pretrained=True)
model.eval()

# For comparison, load the official torchvision resnet18 via torch.hub.
model2 = torch.hub.load('pytorch/vision:v0.4.2', 'resnet18', pretrained=True)
model2.eval()

# Inspecting model.state_dict() and model2.state_dict() shows both carry the same pretrained weights.

# Download an example image
import urllib.request

url, filename = ("https://upload-images.jianshu.io/upload_images/21240497-b656d52c99d3b17b.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)

# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms

input_image = Image.open(filename)  # load the example image with PIL
print(input_image)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
print('input_tensor: ', input_tensor.shape)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model
print('add dim of batch: ', input_batch.shape)

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
print('output:', output.shape)
result = torch.nn.functional.softmax(output[0], dim=0)  # convert logits to class probabilities

print('predicted class index:', result.argmax().item())
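# Optionally map the index to a human-readable label. This assumes the ImageNet
# label list from the official PyTorch hub tutorial is still hosted at this URL:
labels_url = 'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
labels = urllib.request.urlopen(labels_url).read().decode().splitlines()
top = result.argmax().item()
print('predicted label:', labels[top], 'with probability', result[top].item())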
