Implementing MobileNetV1 in PyTorch for Image Classification

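This lab shows how to use PyTorch on Ascend hardware to train the classic MobileNetV1 model to classify images from the public CIFAR10 dataset. It covers the key features of MobileNetV1, an analysis of its network architecture, and a hands-on walkthrough of the model code.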

The lab is organized as follows:

* Introduction to MobileNetV1

* MobileNetV1 architecture analysis

* Depthwise separable convolution

* MobileNetV1 code implementation analysis

* Classifying CIFAR with MobileNetV1 in practice

Introduction to MobileNetV1

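Convolutional neural networks (CNNs) are widely used in computer vision and have achieved strong results. In pursuit of higher classification accuracy, models have become ever deeper and more complex; the deep residual network (ResNet), for example, reaches 152 layers. In real-world settings such as mobile or embedded devices, however, such large and complex models are hard to deploy.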

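First, these models are too large and run into memory limits. Second, these settings demand low latency, that is, fast responses: imagine what could happen if the pedestrian-detection system of a self-driving car were slow. Small, efficient CNN models are therefore essential in these scenarios.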

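MobileNetV1 is a lightweight CNN designed for mobile and embedded devices. The main innovation of the paper [1] is to replace standard convolutions with depthwise separable convolutions (a depthwise convolution followed by a pointwise convolution), which greatly reduces the parameter count and computation at the cost of a small drop in accuracy.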

MobileNetV1 architecture analysis

Depthwise separable convolution

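The basic building block of MobileNet is the depthwise separable convolution, shown on the right-hand side of the figure below. A depthwise separable convolution is a factorized convolution: it splits a standard convolution into two smaller operations, a depthwise convolution and a pointwise convolution.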

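A depthwise convolution differs from a standard convolution. In a standard convolution, every kernel spans all input channels; in a depthwise convolution, each input channel gets its own kernel (one kernel per channel), so the operation works channel by channel, at the depth level. A pointwise convolution is simply an ordinary convolution with a 1x1 kernel, which mixes information across channels.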
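To make the one-kernel-per-channel idea concrete, here is a minimal PyTorch sketch comparing the weight shapes of a standard convolution, a depthwise convolution (`groups` equal to the number of input channels), and a 1x1 pointwise convolution. The channel counts 3 and 8 are illustrative values chosen to match the worked example further down; the last check confirms that one depthwise output channel depends only on its own input channel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

M, N = 3, 8  # input channels, output channels (illustrative values)

# Standard conv: each of the N kernels spans all M input channels
standard = nn.Conv2d(M, N, kernel_size=3, padding=1, bias=False)
# Depthwise conv: groups=M gives one single-channel 3x3 kernel per input channel
depthwise = nn.Conv2d(M, M, kernel_size=3, padding=1, groups=M, bias=False)
# Pointwise conv: an ordinary convolution with 1x1 kernels that mixes channels
pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)

print(standard.weight.shape)   # torch.Size([8, 3, 3, 3])
print(depthwise.weight.shape)  # torch.Size([3, 1, 3, 3])
print(pointwise.weight.shape)  # torch.Size([8, 3, 1, 1])

x = torch.randn(1, M, 5, 5)
y = pointwise(depthwise(x))
print(y.shape)  # torch.Size([1, 8, 5, 5])

# Channel 0 of the depthwise output is computed from input channel 0 alone
y0 = F.conv2d(x[:, 0:1], depthwise.weight[0:1], padding=1)
print(torch.allclose(depthwise(x)[:, 0:1], y0, atol=1e-5))  # True
```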

Why depthwise separable convolution reduces parameters and computation

For an input of shape F×F×M convolved with N kernels of size K×K, producing an F×F×N output, a standard convolution needs (K×K)×M×N×F×F multiplications.
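Splitting the same operation into a K×K depthwise step (one kernel per input channel) and a 1×1 pointwise step, each producing an F×F output, gives the cost and the ratio to the standard convolution below:

$$
\underbrace{K \cdot K \cdot M \cdot F \cdot F}_{\text{depthwise}} \;+\; \underbrace{M \cdot N \cdot F \cdot F}_{\text{pointwise}},
\qquad
\frac{K^2 M F^2 + M N F^2}{K^2 M N F^2} = \frac{1}{N} + \frac{1}{K^2}.
$$

With K = 2 and N = 8 the ratio is 1/8 + 1/4 = 3/8, which matches 900 / 2400 in the example that follows. The weight counts, K·K·M + M·N versus K·K·M·N, have the same ratio. For the 3×3 kernels used in MobileNetV1 the second term is 1/9, so for large N the separable version needs roughly 8 to 9 times fewer multiplications.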

Multiplications for a standard convolution (K = 2, M = 3, N = 8, F = 5): 2×2×3×8×5×5 = 2400

Multiplications for a depthwise separable convolution: 2×2×3×5×5 + 3×8×5×5 = 300 + 600 = 900
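Both counts can be reproduced with two small helpers. This is a sketch that assumes, as in the formula, an F×F output for every step; `conv_mults` and `dsc_mults` are hypothetical names used only for illustration.

```python
def conv_mults(K, M, N, F):
    """Multiplications of a standard KxK conv: M input channels, N output channels, FxF output."""
    return K * K * M * N * F * F


def dsc_mults(K, M, N, F):
    """Multiplications of a KxK depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = K * K * M * F * F  # one KxK kernel per input channel
    pointwise = M * N * F * F      # 1x1 kernels mixing M channels into N
    return depthwise + pointwise


standard = conv_mults(K=2, M=3, N=8, F=5)
separable = dsc_mults(K=2, M=3, N=8, F=5)
print(standard, separable)   # 2400 900
print(separable / standard)  # 0.375, i.e. 1/N + 1/K**2 = 1/8 + 1/4
```

For a typical MobileNetV1 layer with 3x3 kernels and N = 512, `dsc_mults(3, 512, 512, 4) / conv_mults(3, 512, 512, 4)` equals 1/512 + 1/9, about 0.113.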

```python
import torch
import torch.nn as nn


# Depthwise separable convolution (DSC) = depthwise convolution + pointwise convolution
class DSCconv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super(DSCconv, self).__init__()
        # Depthwise convolution (DW + BN + ReLU6): groups=in_ch gives one 3x3 kernel per input channel
        self.depthConv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                      padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU6(inplace=True))
        # Pointwise convolution (PW + BN + ReLU6): 1x1 kernels mix channels and set the output width
        self.pointConv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU6(inplace=True))

    def forward(self, x):
        x = self.depthConv(x)
        x = self.pointConv(x)
        return x
```
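As a quick check on the savings, the sketch below counts the convolution weights of a 32-to-64-channel 3x3 layer built as a standard convolution versus a depthwise plus pointwise pair (illustrative sizes, with stride 2 to also show downsampling). BatchNorm layers are left out on purpose so that only convolution weights are compared; `n_params` is a hypothetical helper.

```python
import torch
import torch.nn as nn

in_ch, out_ch = 32, 64  # illustrative channel sizes


def n_params(module):
    # Total number of learnable parameters in a module
    return sum(p.numel() for p in module.parameters())


standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False)
separable = nn.Sequential(
    # depthwise: one 3x3 kernel per input channel, stride 2 halves the spatial size
    nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=2, padding=1, groups=in_ch, bias=False),
    # pointwise: 1x1 kernels map 32 channels to 64
    nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
)

print(n_params(standard))   # 18432 = 3*3*32*64
print(n_params(separable))  # 2336  = 3*3*32 + 32*64

x = torch.randn(2, in_ch, 32, 32)
print(standard(x).shape, separable(x).shape)  # both torch.Size([2, 64, 16, 16])
```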

MobileNetV1 code implementation analysis

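MobileNetV1 is a simple, straight-through network: it stacks layers one after another with no shortcut connections. Its only secret weapon is depthwise separable convolution, and even that is not original, since Xception had already used it. MobileNetV1 earned its paper because it turned attention to mobile AI, making it possible for mobile devices to run simple AI models smoothly.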

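The whole network is implemented as the MobileNetV1 class. 'self.entry' is the initial standard 3x3 convolution; 'self.stage1', 'self.stage2', and 'self.stage3' correspond to the modules in the red boxes in the figure; and 'self.avgpool', 'self.linear', and 'self.softmax' correspond to the last three output layers in the figure.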

```python
class MobileNetV1(nn.Module):
    def __init__(self, input_channel=3, num_classes=100):
        super(MobileNetV1, self).__init__()
        self.num_classes = num_classes
        # Entry: standard 3x3 conv. Stride 1 here, whereas the paper's first layer uses
        # stride 2 on 224x224 inputs; with 32x32 CIFAR images the last feature map is 2x2
        self.entry = nn.Sequential(
            nn.Conv2d(input_channel, 32, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU6(inplace=True))
        # Stage 1
        self.stage1 = nn.Sequential(
            DSCconv(32, 64, 1),
            DSCconv(64, 128, 2),
            DSCconv(128, 128, 1),
            DSCconv(128, 256, 2),
            DSCconv(256, 256, 1))
        # Stage 2
        self.stage2 = nn.Sequential(
            DSCconv(256, 512, 2),
            DSCconv(512, 512, 1),
            DSCconv(512, 512, 1),
            DSCconv(512, 512, 1),
            DSCconv(512, 512, 1),
            DSCconv(512, 512, 1))
        # Stage 3
        self.stage3 = nn.Sequential(
            DSCconv(512, 1024, 2),
            DSCconv(1024, 1024, 1))
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # torch.Size([batch, 1024, 1, 1])
        self.linear = nn.Linear(in_features=1024, out_features=num_classes)
        # Apply only at inference time to turn logits into probabilities;
        # nn.CrossEntropyLoss already applies log-softmax internally
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.entry(x)
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        # Return raw logits: feeding softmax outputs into nn.CrossEntropyLoss
        # would apply softmax twice, which usually hurts training
        out = self.linear(x)
        return out
```
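Because the network is a plain stack, the feature-map size can be traced by hand. The sketch below applies the usual output-size formula floor((F + 2p - k) / s) + 1 with k = 3 and p = 1 to the strides listed in the MobileNetV1 class (the 1x1 pointwise convolutions keep the size). `out_size` and `trace` are hypothetical helpers, and the 224x224, stride-2 run is included only for comparison with the paper's ImageNet setting.

```python
def out_size(f, stride, k=3, p=1):
    # Output spatial size of a conv with kernel k, padding p and the given stride
    return (f + 2 * p - k) // stride + 1


def trace(f, entry_stride):
    sizes = [out_size(f, entry_stride)]  # entry 3x3 conv
    # Depthwise strides of the DSCconv blocks in stage1, stage2 and stage3
    block_strides = [1, 2, 1, 2, 1] + [2, 1, 1, 1, 1, 1] + [2, 1]
    for s in block_strides:
        sizes.append(out_size(sizes[-1], s))  # pointwise 1x1 conv keeps the size
    return sizes


print(trace(32, entry_stride=1))       # spatial size after the entry conv and each of the 13 blocks
print(trace(32, entry_stride=1)[-1])   # 2: a 2x2 map reaches the average pool on CIFAR
print(trace(224, entry_stride=2)[-1])  # 7: the 7x7 map of the paper's 224x224 setting
```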

Classifying CIFAR with MobileNetV1 in practice

With the network built above, we can now use it to train on the CIFAR10 dataset.

Import the Ascend NPU libraries. The transfer_to_npu module enables automatic migration of the model to Ascend.

```python
import torch_npu
from torch_npu.contrib import transfer_to_npu
```
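The main code later hard-codes device = 'npu'. If the same script should also run on machines without an Ascend NPU, one option is a small fallback helper. This is only a sketch: `pick_device` is a hypothetical name, and it assumes that importing torch_npu makes `torch.npu.is_available()` available, as the torch_npu plugin does.

```python
import torch


def pick_device():
    """Return 'npu' if an Ascend NPU is usable, else 'cuda', else 'cpu' (hypothetical helper)."""
    try:
        import torch_npu  # noqa: F401  registers the torch.npu namespace when installed
        if torch.npu.is_available():
            return 'npu'
    except ImportError:
        pass  # torch_npu is not installed on this machine
    if torch.cuda.is_available():
        return 'cuda'
    return 'cpu'


device = pick_device()
print(device)
```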

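torchvision bundles popular datasets, model architectures, and common image transforms for computer vision. It includes the CIFAR dataset used in this lab, so we import it to download the data. tqdm shows a progress bar during training, so the whole training process is easy to follow.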

```python
import torchvision
import torchvision.transforms as transforms
from tqdm import tqdm
```

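Data preprocessing: the training images are randomly transformed (cropped, flipped, and so on) to increase data diversity, which helps prevent overfitting and improves the model's generalization.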

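The Compose method from torchvision's transforms module chains random cropping (RandomCrop), horizontal flipping (RandomHorizontalFlip), conversion to a tensor (ToTensor), and normalization (Normalize), which standardizes each channel of the tensor with the given mean and standard deviation. The test set gets only ToTensor and Normalize, with no random augmentation.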

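```python
# Training: random crop (with 4-pixel padding) and horizontal flip for augmentation,
# then conversion to a tensor and per-channel normalization
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

# Test: no augmentation, only tensor conversion and the same normalization
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
```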
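Normalize computes (x - mean) / std separately for each channel, and the mean and std tuples passed to it above are precomputed per-channel statistics. The sketch below shows one way to compute such statistics from images already scaled to [0, 1] by ToTensor, using a small random tensor as a stand-in for the real CIFAR10 training images; `channel_stats` and `normalize` are hypothetical helpers.

```python
import torch


def channel_stats(images):
    """Per-channel mean and std of an (N, C, H, W) tensor of images in [0, 1]."""
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std


def normalize(images, mean, std):
    # Same arithmetic as transforms.Normalize, broadcast over N, H and W
    return (images - mean[None, :, None, None]) / std[None, :, None, None]


torch.manual_seed(0)
fake_images = torch.rand(64, 3, 32, 32)  # stand-in for CIFAR10 images after ToTensor
mean, std = channel_stats(fake_images)
normed = normalize(fake_images, mean, std)
print(normed.mean(dim=(0, 2, 3)), normed.std(dim=(0, 2, 3)))  # close to 0 and 1 per channel
```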

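The CIFAR10 dataset contains 60000 color images of size 32×32 in 10 classes, with 6000 images per class. There are 50000 training images, split into 5 training batches of 10000 images each, and 10000 test images that form a single test batch. The test batch contains exactly 1000 randomly selected images from each class; the remaining images, in random order, make up the training batches. A single training batch does not necessarily contain the same number of images from each class, but across all training batches each class has 5000 images.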

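Dataset loading: torchvision ships several common open-source datasets, including CIFAR. Here the torchvision dataset class downloads CIFAR10 to the specified path under the working directory; if the files are already there, they pass the integrity check and are not downloaded again.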

```python
# download=True fetches the archive only if it is missing; torchvision extracts it
# into a 'cifar-10-batches-py' folder inside root
trainset = torchvision.datasets.CIFAR10(root='./dataset/cifar-10-batches-py', train=True,
                                        download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./dataset/cifar-10-batches-py', train=False,
                                       download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
```
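len(trainloader) is the number of mini-batches per epoch, which the OneCycleLR scheduler in the main routine uses as steps_per_epoch. A quick sketch with a dummy TensorDataset of the same size as the CIFAR10 training split (50000 samples; the tensor contents are placeholders) shows what that number is for batch_size=128, and that the last batch is smaller unless drop_last=True.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset with as many samples as the CIFAR10 training split
dummy = TensorDataset(torch.zeros(50000, 1), torch.zeros(50000, dtype=torch.long))
loader = DataLoader(dummy, batch_size=128, shuffle=True)

print(len(loader), math.ceil(50000 / 128))  # 391 391
last_batch_size = 50000 - (len(loader) - 1) * 128
print(last_batch_size)  # 80
```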

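Training module: trains the model for the given epoch 'epoch'. It first calls 'net.train()' to put the model in training mode. For each mini-batch it resets the gradients to zero ('zero_grad()'), computes the loss, backpropagates to obtain the gradients, updates the parameters with SGD, advances the learning-rate scheduler, and accumulates the loss, printing the average every 100 iterations. It returns the average loss of the epoch.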

```python
def train(epoch):
    net.train()  # training behavior (BatchNorm uses batch statistics)
    train_loss = 0.0
    epoch_loss = 0.0
    for batch_idx, (inputs, targets) in enumerate(tqdm(trainloader)):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()  # clear gradients from the previous step
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()        # compute gradients
        optimizer.step()       # update parameters
        lr_scheduler.step()    # OneCycleLR is stepped once per batch
        train_loss += loss.item()
        epoch_loss += loss.item()
        if batch_idx % 100 == 99:  # print the average loss every 100 iterations
            print(f'[Epoch {epoch + 1}, Iteration {batch_idx + 1}] loss: {train_loss / 100:.3f}')
            train_loss = 0.0
    return epoch_loss / len(trainloader)
```
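In train(), criterion(outputs, targets) expects outputs to be raw logits, because nn.CrossEntropyLoss applies log-softmax internally. The sketch below, with made-up logits, checks that identity and shows what happens if softmax probabilities are passed in instead: a second softmax is applied, and even confident, correct predictions keep a large loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[4.0, 0.0, 0.0], [0.0, 5.0, 0.0]])  # made-up scores: 2 samples, 3 classes
targets = torch.tensor([0, 1])

ce = criterion(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))  # True: CrossEntropyLoss = log_softmax + NLLLoss

# Passing probabilities applies softmax twice; with 3 classes this loss
# can never drop below log(e + 2) - 1, about 0.55, however good the prediction
double = criterion(F.softmax(logits, dim=1), targets)
print(ce.item(), double.item())
```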

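Test module: after each training epoch, the latest model is evaluated on the test set prepared earlier, which has 1000 images per class. The function returns the accuracy as a percentage.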

```python
def test():
    net.eval()  # inference behavior (BatchNorm uses running statistics)
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients needed for evaluation
        for batch_idx, (inputs, targets) in enumerate(tqdm(testloader)):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, targets)
            test_loss += loss.item()
            _, predicted = outputs.max(1)  # index of the largest logit = predicted class
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
    return 100 * correct / total
```
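The accuracy bookkeeping in test() relies on outputs.max(1), which returns the largest value and its index along the class dimension; the index is the predicted class. A toy run with made-up logits for three samples:

```python
import torch

outputs = torch.tensor([[2.0, 1.0, 0.1],   # predicts class 0
                        [0.3, 0.2, 3.0],   # predicts class 2
                        [0.5, 2.5, 0.0]])  # predicts class 1
targets = torch.tensor([0, 1, 1])

_, predicted = outputs.max(1)                 # index of the max logit in each row
correct = predicted.eq(targets).sum().item()  # number of matching predictions
accuracy = 100 * correct / targets.size(0)
print(predicted.tolist(), correct, f'{accuracy:.2f}%')  # [0, 2, 1] 2 66.67%
```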

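Main routine: this part trains the model on the chosen dataset (CIFAR). It sets the hardware device to the Ascend NPU (device = 'npu'), uses the cross-entropy loss 'CrossEntropyLoss()' as the loss function, and uses SGD with weight decay as the optimizer, with the learning rate controlled by a OneCycleLR scheduler.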

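```python
import torch.optim as optim

device = 'npu'
net = MobileNetV1(num_classes=10)
net = net.to(device)
criterion = nn.CrossEntropyLoss()
# OneCycleLR overrides this lr: it starts at max_lr / div_factor = 0.1 / 25 = 0.004
optimizer = optim.SGD(net.parameters(), lr=0.1, weight_decay=5e-4)
# epochs must match the number of training epochs in the loop below (60): with more
# loop epochs OneCycleLR raises an error, with fewer the learning rate never finishes annealing
lr_scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, steps_per_epoch=len(trainloader), epochs=60,
    div_factor=25, final_div_factor=10000, pct_start=0.3)
```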
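To see what the scheduler actually does, the sketch below runs OneCycleLR with the same max_lr, div_factor, final_div_factor, and pct_start on a single dummy parameter, but with a toy length of 10 epochs × 10 steps so the numbers are easy to read. The learning rate starts at max_lr / div_factor regardless of the lr given to SGD, peaks at max_lr after pct_start of the steps, and ends at max_lr / (div_factor × final_div_factor).

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter
optimizer = torch.optim.SGD([param], lr=1.0, weight_decay=5e-4)  # this lr is overridden
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, steps_per_epoch=10, epochs=10,  # toy schedule: 100 steps
    div_factor=25, final_div_factor=10000, pct_start=0.3)

lrs = [optimizer.param_groups[0]['lr']]  # lr used for the first step
for _ in range(99):
    optimizer.step()  # no gradients here, so the parameter is left unchanged
    scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])

print(f'start {lrs[0]:.4f}, peak {max(lrs):.4f} at step {lrs.index(max(lrs))}, end {lrs[-1]:.2e}')
# start 0.0040, peak 0.1000 at step 29, end 4.00e-07
```

In the real setup the peak falls at 30% of epochs × len(trainloader) steps. If the training loop stops well before the scheduled number of epochs (for example after 60 of 150), the learning rate is still close to its peak when training ends.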

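Training and testing run for 60 epochs here; you can raise or lower this as needed, keeping the epochs argument of the OneCycleLR scheduler equal to it. The test accuracy of every epoch is printed; comment out the print statements if you do not need them.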

```python
for epoch in range(60):
    epoch_loss = train(epoch)
    test_accuracy = test()
    print(f'\nTest accuracy for MobileNetV1 at epoch {epoch + 1}: {test_accuracy:.2f}%')
    print(f'Epoch loss for MobileNetV1 at epoch {epoch + 1}: {epoch_loss:.3f}')
```

Reference

[1] Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications[J]. arXiv preprint arXiv:1704.04861, 2017.
