1. Why ResNet?
Every great invention solves a pain point, and ResNet solved the problem that deep networks are hard to train. The original paper calls this the degradation problem. What is degradation? As the number of layers grows, both the training loss and the test loss keep increasing. This is different from overfitting, whose telltale sign is a training loss that keeps shrinking while the test loss rises. In the figure below, comparing 20-layer and 56-layer networks, the 56-layer one has higher training and test loss than the 20-layer one. These are ordinary stacked architectures, which the paper calls "plain" networks.
Why does degradation happen? The paper does not give a definitive answer. My guess is that it has to do with how gradients propagate through the network: batch norm can rescue vanishing and exploding gradients, but for networks that are too deep, batch norm alone is presumably not enough to save the day. (No mathematical proof here, just intuition.)
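To make that intuition concrete, here is a toy experiment of my own (not from the paper): stack 50 small layers with and without skip connections and compare how much gradient reaches the input. The layer width and the tanh nonlinearity are arbitrary choices for illustration.

import torch
import torch.nn as nn

torch.manual_seed(0)
layers = [nn.Linear(64, 64) for _ in range(50)]  # a deliberately deep stack

def plain_forward(x):
    for layer in layers:
        x = torch.tanh(layer(x))
    return x

def residual_forward(x):
    for layer in layers:
        x = x + torch.tanh(layer(x))  # the "+ x" shortcut carries gradient straight through
    return x

for name, f in [('plain', plain_forward), ('residual', residual_forward)]:
    x = torch.randn(8, 64, requires_grad=True)
    f(x).sum().backward()
    print(name, 'grad norm at input:', x.grad.norm().item())
# the plain stack typically shows a far smaller input gradient than the residual one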
2. What does ResNet do?
What ResNet does is simple, yet, like the Pythagorean theorem, simple and remarkable. It merely adds an identity mapping on top of the plain network, which the paper calls identity mapping. For the record, an identity mapping is just f(x) = x.
Intuitively, it means taking x as-is, no matter what mapping is applied elsewhere. In ResNet, the identity mapping is the curved arrow on the right of the block diagram:
In a plain network, weight layers are stacked one on top of another to fit the desired underlying mapping H(x). For various reasons this mapping is hard to fit directly, so ResNet fits F(x) = H(x) - x instead: the stacked layers learn the residual F(x), and the block outputs F(x) + x. When the residual tends to 0, the block reduces to the identity mapping H(x) = x; when it does not, the block fits new features.
You may notice that in a convolutional network the output dimensions of F(x) may not match those of x. The authors propose two ways to handle this (a minimal sketch follows the list):
1. When the dimensions match, the two are added directly.
2. When they do not, x first goes through a matrix transform (implemented as a 1x1 convolution).
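A minimal sketch of the two cases with toy tensors; the shapes are taken from ResNet18's conv2_x/conv3_x stages, and the residual branches here are random stand-ins for F(x):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

# Case 1: F(x) has the same dimensions as x -- add directly
F_x = torch.randn(1, 64, 56, 56)
y = F_x + x                                   # [1, 64, 56, 56]

# Case 2: dimensions differ (64 -> 128 channels, 56 -> 28 spatially) --
# project x with a strided 1x1 convolution before adding
proj = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)
F_x2 = torch.randn(1, 128, 28, 28)
y2 = F_x2 + proj(x)                           # [1, 128, 28, 28]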
ResNet's network architecture:
Compared with the 34-layer plain network, ResNet adds shortcut connections: solid lines where the dimensions match, dashed lines where they do not, in which case a dimension transform or zero-padding is needed (a padding sketch follows).
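For the dashed connections, the zero-padding alternative (option A in the paper) keeps the shortcut parameter-free: subsample spatially, then pad the extra channels with zeros. A rough sketch:

import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 56, 56)                   # e.g. the output of conv2_x
shortcut = x[:, :, ::2, ::2]                     # stride-2 subsampling -> [1, 64, 28, 28]
# F.pad's tuple runs from the last dim backwards: (W, W, H, H, C_front, C_back)
shortcut = F.pad(shortcut, (0, 0, 0, 0, 0, 64))  # zero-pad channels -> [1, 128, 28, 28]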
3. How well does ResNet work?
The left plot shows plain networks: the 18-layer network's training and test errors are both lower than the 34-layer's almost everywhere. The right plot shows ResNet: the 34-layer network's training and test errors are both lower than the 18-layer's, neatly solving the degradation problem.
The structure of ResNet18
ResNet18 is called 18 because it has 18 weight layers: conv1 (1) + conv2_x through conv5_x (4 stages * 2 blocks * 2 convs = 16) + fc (1) = 18. The first 17 are convolutional layers; the last is a fully connected layer. (A quick sanity check follows.)
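To verify that count against torchvision's resnet18: the three 1x1 convolutions inside the dashed (downsample) shortcuts are conventionally not counted as layers, so they are excluded here.

import torch.nn as nn
from torchvision import models

model = models.resnet18()
convs = [n for n, m in model.named_modules()
         if isinstance(m, nn.Conv2d) and 'downsample' not in n]
fcs = [n for n, m in model.named_modules() if isinstance(m, nn.Linear)]
print(len(convs), '+', len(fcs), '=', len(convs) + len(fcs))  # 17 + 1 = 18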
Let's walk through how the dimensions change across the whole network. First, the input is a four-dimensional tensor:
batch_size: batch size, shown as 1 in the figure;
channel: channels; color images are usually RGB, i.e. 3 channels;
height: image height;
width: image width.
Then comes one convolution followed by one pooling operation; in_out denotes the input and output channels, k the kernel, s the stride, p the padding. How is each dimension of the output computed?
batch_size = 1, unchanged;
channel = 64, i.e. conv1's number of output channels;
For the convolution:
H = W = 112, from the formula floor((H + 2*p - k)/s) + 1, with fractions rounded down: floor((224 + 2*3 - 7)/2) + 1 = 112.
For the pooling:
H = W = 56, from the same formula (note the max pool here uses padding = 1, and PyTorch rounds down by default, i.e. ceil_mode=False): floor((112 + 2*1 - 3)/2) + 1 = 56.
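The same arithmetic as a tiny helper (my own convenience function; it just mirrors the formula above):

def conv_out(h, k, s, p):
    # floor((H + 2p - k) / s) + 1, the default for both Conv2d and MaxPool2d
    return (h + 2 * p - k) // s + 1

print(conv_out(224, k=7, s=2, p=3))  # conv1:   224 -> 112
print(conv_out(112, k=3, s=2, p=1))  # maxpool: 112 -> 56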
Next come stages conv2_x through conv5_x. Each stage repeats its block twice, and each block contains 2 convolutional layers, so one stage performs 4 convolutions. The figure below shows stage 1 in detail; every two convolutions are bridged by an identity map, i.e. the paper's shortcut.
Stages layer2 through layer4 work much like layer1, only with different parameters. The output dimensions of the stages are: layer1 [1, 64, 56, 56], layer2 [1, 128, 28, 28], layer3 [1, 256, 14, 14], layer4 [1, 512, 7, 7].
After these 17 convolutions come pooling and flatten, and finally a fully connected layer.
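One way to check all of these shapes at once is to hang forward hooks on each stage. A sketch using torchvision's resnet18 (the hand-written model below behaves identically):

import torch
from torchvision import models

model = models.resnet18()
for name in ['conv1', 'maxpool', 'layer1', 'layer2', 'layer3', 'layer4', 'avgpool']:
    getattr(model, name).register_forward_hook(
        lambda m, inp, out, name=name: print(name, list(out.shape)))
model(torch.randn(1, 3, 224, 224))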
Here is concise code for ResNet18:
import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url
import ssl
ssl._create_default_https_context = ssl._create_unverified_context  # skip SSL verification when downloading weights
model_urls = {
'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
}
def conv3x3(in_planes, out_planes, stride=1, padding=1):
    # no bias: each conv is followed by a BN layer, whose affine shift makes
    # the conv bias redundant and saves parameters
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=padding, bias=False)

def conv1x1(in_planes, out_planes, stride=1):
    # 1x1 conv used by the dashed shortcuts to match dimensions; bias omitted
    # for the same reason (a BN layer follows)
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride,
                     bias=False)
class BasicBlock(nn.Module):
    expansion = 1  # channel multiplier from block input to output (1 for BasicBlock)
    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None):
        # downsample: projection that matches x's dimensions so F(x) + x is valid
        # norm_layer: batch normalization layer
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d  # default to standard BN when not customized
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x):
        identity = x  # keep x for the shortcut
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)  # adjust x's dimensions so F(x) + x is well-defined
        out += identity
        out = self.relu(out)  # add first, then activate
return out
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000, norm_layer=None):
super(ResNet, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2d
self._norm_layer = norm_layer
self.inplanes = 64
# conv1 in ppt figure
self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = norm_layer(self.inplanes)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output (1,1) == global average pooling
        self.fc = nn.Linear(512 * block.expansion, num_classes)
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
    def _make_layer(self, block, planes, blocks, stride=1):
        # builds one stage (layer1..layer4)
        # block: block type (BasicBlock / Bottleneck)
        # blocks: number of blocks in the stage
norm_layer = self._norm_layer
downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            # dimensions change: build a projection shortcut
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),  # adjusts both spatial (H x W) and channel dims
                norm_layer(planes * block.expansion)
            )
layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, norm_layer))  # the first block may downsample; handle it separately
        self.inplanes = planes * block.expansion  # track the channel count entering the next block (see the ResNet table)
        for _ in range(1, blocks):  # start at 1: the first block was handled above
            layers.append(block(self.inplanes, planes, norm_layer=norm_layer))
        # the * unpacks the list into separate positional arguments
        return nn.Sequential(*layers)  # chain the blocks into one stage
def forward(self, x):
# x.shape = [1, 3, 224, 224]
x = self.conv1(x)
# x.shape = [1, 64, 112, 112]
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
# x.shape = [1, 64, 56, 56]
x = self.layer1(x)
# x.shape = [1, 64, 56, 56]
x = self.layer2(x)
# x.shape = [1, 128, 28, 28]
x = self.layer3(x)
# x.shape = [1, 256, 14, 14]
x = self.layer4(x)
# x.shape = [1, 512, 7, 7]
x = self.avgpool(x)
# x.shape = [1, 512, 1, 1]
x = torch.flatten(x, 1)
# x.shape = [1, 512]
x = self.fc(x)
# x.shape = [1, 1000]
return x
def _resnet(arch, block, layers, pretrained, progress, **kwargs):
model = ResNet(block, layers, **kwargs)
if pretrained:
        print('downloading weights from', model_urls[arch])
state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
model.load_state_dict(state_dict)
return model
def resnet18(pretrained=False, progress=True, **kwargs):
r"""ResNet-18 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress,
**kwargs)
model = resnet18(pretrained=True)
model.eval()

# cross-check against the official torchvision resnet18 loaded via torch.hub
model2 = torch.hub.load('pytorch/vision:v0.4.2', 'resnet18', pretrained=True)
model2.eval()

# both models were loaded from the same checkpoint, so their weights should match
for k in model.state_dict():
    assert torch.equal(model.state_dict()[k], model2.state_dict()[k])
# Download an example image
import urllib.request
url, filename = ("https://upload-images.jianshu.io/upload_images/21240497-b656d52c99d3b17b.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)  # a PIL Image, mode=RGB
print(input_image)
preprocess = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
print('input_tensor: ', input_tensor.shape)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
print('add dim of batch: ', input_batch.shape)
# move the input and model to GPU for speed if available
if torch.cuda.is_available():
input_batch = input_batch.to('cuda')
model.to('cuda')
with torch.no_grad():
output = model(input_batch)
print('output:',output.shape)
result = torch.nn.functional.softmax(output[0], dim=0)  # class probabilities
print(result.argmax())  # index of the predicted ImageNet class
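To turn the argmax index into a readable label, one can look it up in the class list used by the official PyTorch hub examples (URL assumed reachable):

import urllib.request

url = 'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
urllib.request.urlretrieve(url, 'imagenet_classes.txt')
with open('imagenet_classes.txt') as f:
    categories = [line.strip() for line in f]
print(categories[result.argmax().item()])  # e.g. a dog breed for dog.jpg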