What is interesting about GoogLeNet is its auxiliary classifiers, along with convolution factorization, an idea VGG had in effect already introduced. This topic walks through GoogLeNet's classification techniques and implements training with multiple auxiliary classifiers; the results on the ImageNet2012 dataset are decent (I only trained on 4 classes of images, without shuffling).
To keep up with the latest techniques, books alone are no longer enough; you have to read the latest academic papers.
About the GoogLeNet network
-
In 2014, GoogLeNet and VGG were the two champions of that year's ImageNet challenge (ILSVRC14): GoogLeNet took first place and VGG second. What the two architectures have in common is that they are much deeper than their predecessors.
- VGG inherits much of the framework of LeNet and AlexNet, while GoogLeNet makes bolder experiments with the network structure.
- When memory or computing resources are limited, GoogLeNet is the better choice; judging by the results, GoogLeNet's performance is also superior.
- Although GoogLeNet is 22 layers deep, it is much smaller than AlexNet and VGG;
- GoogLeNet has about 5 million parameters; AlexNet has roughly 12 times as many, and VGG in turn about 3 times as many as AlexNet.
GoogLeNet is a deep network designed by Google. Why is it spelled "GoogLeNet" rather than "GoogleNet"? Reportedly as a tribute to "LeNet", hence the name "GoogLeNet".
-
Reference:
https://my.oschina.net/u/876354/blog/1637819
GoogLeNet's design ideas
- The core of GoogLeNet is the Inception module, whose design was proposed and then progressively refined over 4 versions.
- Inception V1
- Inception V2
- Inception V3
- Inception V4
The core idea of Inception
- The core idea of Inception is stacking parallel branches;
- Design a sparse network structure that can nevertheless produce dense data;
- Improve the network's performance while keeping the use of computing resources efficient.
-
An Inception module is a stack of 3 convolutions + 1 pooling operation:
- Convolutions: 1x1, 3x3, 5x5;
- Pooling: 3x3 max pooling;
- Every convolution is activated with ReLU;
- The outputs are concatenated, which increases the number of output channels (the depth), as the small example below illustrates.
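As a quick illustration (branch shapes borrowed from Inception(3a) below), concatenating the four branch outputs along the channel dimension adds up their channel counts while keeping the spatial size:

import torch

# outputs of the four hypothetical branches for one 28x28 feature map
a = torch.randn(1, 64, 28, 28)    # 1x1 branch
b = torch.randn(1, 128, 28, 28)   # 3x3 branch
c = torch.randn(1, 32, 28, 28)    # 5x5 branch
d = torch.randn(1, 32, 28, 28)    # pooling branch
# dim=1 is the channel dimension: 64 + 128 + 32 + 32 = 256
print(torch.cat([a, b, c, d], dim=1).shape)  # torch.Size([1, 256, 28, 28])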
-
Analysis of the Inception design
- It increases the width of the network:
- the sparse structure formed by the 3 convolutions can extract every detail of the input and increases the network's representational power;
- It reduces overfitting:
- the pooling operation shrinks the spatial size and reduces overfitting (somewhat like Dropout, except it is not random);
- It increases the network's adaptability to scale:
- three convolutions of different sizes make the network more robust to objects at different scales.
Inception V1
Inception V1 design
- V1 optimizes the basic Inception module above by adding a 1x1 convolution.
- Make sure to understand the concept of stacking.
-
Analysis of the 1x1 convolution
- It provides an extra ReLU, adding non-linearity.
- Because the branch outputs are concatenated, the depth (channel count) grows layer by layer; a 1x1 convolution can be inserted to reduce the depth, which lowers the cost of the subsequent 3x3 and 5x5 convolutions.
-
Note:
- Make sure you understand why adding a 1x1 convolution reduces the amount of computation.
- Assume (for illustration; the shape matches the Inception(3a) input) an input of 28x28x192 and a 5x5 convolution kernel with 32 output channels, stride 1.
- Without the 1x1 convolution:
- weight parameters: 5x5x192x32 = 153,600;
- With a 1x1 convolution (assume 32 channels):
- weight parameters: 1x1x192x32 + 5x5x32x32 = 6,144 + 25,600 = 31,744;
- Conclusion:
- with the 1x1 convolution, the computation drops to roughly 1/5 of the original (see the check below).
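A quick sanity check of the arithmetic above; a sketch using the assumed shapes (28x28x192 input, 5x5 kernel, 32 output channels, 32-channel 1x1 reduction):

# parameter counts for the two variants (assumed shapes, see the note above)
c_in, c_out, c_reduce = 192, 32, 32

no_reduce = 5 * 5 * c_in * c_out                                  # 153,600 weights
with_reduce = 1 * 1 * c_in * c_reduce + 5 * 5 * c_reduce * c_out  # 6,144 + 25,600 = 31,744

print(no_reduce, with_reduce, with_reduce / no_reduce)  # ratio is about 0.21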
The 22-layer GoogLeNet network structure
The GoogLeNet implementation in Torch
- Torch ships with a GoogLeNet implementation.
- Judging from the source code and the printed structure, the official implementation is Inception V1 with BatchNorm2d added, i.e. an enhanced V1.
from torchvision.models import GoogLeNet
GoogLeNet?
Init signature:
GoogLeNet(
    num_classes=1000,
    aux_logits=True,
    transform_input=False,
    init_weights=True,
    blocks=None,
)
Docstring:
Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing to nest them in
a tree structure. You can assign the submodules as regular attributes::
    import torch.nn as nn
    import torch.nn.functional as F
    class Model(nn.Module):
        def __init__(self):
            super(Model, self).__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)
        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will have their
parameters converted too when you call :meth:`to`, etc.
Init docstring: Initializes internal Module state, shared by both nn.Module and ScriptModule.
File: c:\program files\python36\lib\site-packages\torchvision\models\googlenet.py
Type: type
Subclasses: QuantizableGoogLeNet
aux_logits controls the two intermediate classifier outputs; see the small demo below.
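A minimal demo of the flag; note this is hedged, since the exact return type of the forward pass varies across torchvision versions (recent ones return a GoogLeNetOutputs namedtuple in training mode):

import torch
from torchvision.models import GoogLeNet

net = GoogLeNet(num_classes=1000, aux_logits=True)
x = torch.randn(2, 3, 224, 224)

net.train()
out = net(x)   # training mode: main logits plus the two auxiliary logits
net.eval()
out = net(x)   # eval mode: only the main logits, shape [2, 1000]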
-
The input to the GoogLeNet network is still 3x224x224.
from torchvision.models import GoogLeNet
from torchsummary import summary
net = GoogLeNet()
# print(net)
# print("=========================================================")
# print the network structure
print(summary(net,input_size=(3, 224, 224), device='cpu'))
# print("=========================================================")
# # print the network structure
# print(summary(net.cuda(),input_size=(3, 224, 224)))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 112, 112] 9,408
BatchNorm2d-2 [-1, 64, 112, 112] 128
BasicConv2d-3 [-1, 64, 112, 112] 0
MaxPool2d-4 [-1, 64, 56, 56] 0
Conv2d-5 [-1, 64, 56, 56] 4,096
BatchNorm2d-6 [-1, 64, 56, 56] 128
BasicConv2d-7 [-1, 64, 56, 56] 0
Conv2d-8 [-1, 192, 56, 56] 110,592
BatchNorm2d-9 [-1, 192, 56, 56] 384
BasicConv2d-10 [-1, 192, 56, 56] 0
MaxPool2d-11 [-1, 192, 28, 28] 0
Conv2d-12 [-1, 64, 28, 28] 12,288
BatchNorm2d-13 [-1, 64, 28, 28] 128
BasicConv2d-14 [-1, 64, 28, 28] 0
Conv2d-15 [-1, 96, 28, 28] 18,432
BatchNorm2d-16 [-1, 96, 28, 28] 192
BasicConv2d-17 [-1, 96, 28, 28] 0
Conv2d-18 [-1, 128, 28, 28] 110,592
BatchNorm2d-19 [-1, 128, 28, 28] 256
BasicConv2d-20 [-1, 128, 28, 28] 0
Conv2d-21 [-1, 16, 28, 28] 3,072
BatchNorm2d-22 [-1, 16, 28, 28] 32
BasicConv2d-23 [-1, 16, 28, 28] 0
Conv2d-24 [-1, 32, 28, 28] 4,608
BatchNorm2d-25 [-1, 32, 28, 28] 64
BasicConv2d-26 [-1, 32, 28, 28] 0
MaxPool2d-27 [-1, 192, 28, 28] 0
Conv2d-28 [-1, 32, 28, 28] 6,144
BatchNorm2d-29 [-1, 32, 28, 28] 64
BasicConv2d-30 [-1, 32, 28, 28] 0
Inception-31 [-1, 256, 28, 28] 0
Conv2d-32 [-1, 128, 28, 28] 32,768
BatchNorm2d-33 [-1, 128, 28, 28] 256
BasicConv2d-34 [-1, 128, 28, 28] 0
Conv2d-35 [-1, 128, 28, 28] 32,768
BatchNorm2d-36 [-1, 128, 28, 28] 256
BasicConv2d-37 [-1, 128, 28, 28] 0
Conv2d-38 [-1, 192, 28, 28] 221,184
BatchNorm2d-39 [-1, 192, 28, 28] 384
BasicConv2d-40 [-1, 192, 28, 28] 0
Conv2d-41 [-1, 32, 28, 28] 8,192
BatchNorm2d-42 [-1, 32, 28, 28] 64
BasicConv2d-43 [-1, 32, 28, 28] 0
Conv2d-44 [-1, 96, 28, 28] 27,648
BatchNorm2d-45 [-1, 96, 28, 28] 192
BasicConv2d-46 [-1, 96, 28, 28] 0
MaxPool2d-47 [-1, 256, 28, 28] 0
Conv2d-48 [-1, 64, 28, 28] 16,384
BatchNorm2d-49 [-1, 64, 28, 28] 128
BasicConv2d-50 [-1, 64, 28, 28] 0
Inception-51 [-1, 480, 28, 28] 0
MaxPool2d-52 [-1, 480, 14, 14] 0
Conv2d-53 [-1, 192, 14, 14] 92,160
BatchNorm2d-54 [-1, 192, 14, 14] 384
BasicConv2d-55 [-1, 192, 14, 14] 0
Conv2d-56 [-1, 96, 14, 14] 46,080
BatchNorm2d-57 [-1, 96, 14, 14] 192
BasicConv2d-58 [-1, 96, 14, 14] 0
Conv2d-59 [-1, 208, 14, 14] 179,712
BatchNorm2d-60 [-1, 208, 14, 14] 416
BasicConv2d-61 [-1, 208, 14, 14] 0
Conv2d-62 [-1, 16, 14, 14] 7,680
BatchNorm2d-63 [-1, 16, 14, 14] 32
BasicConv2d-64 [-1, 16, 14, 14] 0
Conv2d-65 [-1, 48, 14, 14] 6,912
BatchNorm2d-66 [-1, 48, 14, 14] 96
BasicConv2d-67 [-1, 48, 14, 14] 0
MaxPool2d-68 [-1, 480, 14, 14] 0
Conv2d-69 [-1, 64, 14, 14] 30,720
BatchNorm2d-70 [-1, 64, 14, 14] 128
BasicConv2d-71 [-1, 64, 14, 14] 0
Inception-72 [-1, 512, 14, 14] 0
Conv2d-73 [-1, 128, 4, 4] 65,536
BatchNorm2d-74 [-1, 128, 4, 4] 256
BasicConv2d-75 [-1, 128, 4, 4] 0
Linear-76 [-1, 1024] 2,098,176
Linear-77 [-1, 1000] 1,025,000
InceptionAux-78 [-1, 1000] 0
Conv2d-79 [-1, 160, 14, 14] 81,920
BatchNorm2d-80 [-1, 160, 14, 14] 320
BasicConv2d-81 [-1, 160, 14, 14] 0
Conv2d-82 [-1, 112, 14, 14] 57,344
BatchNorm2d-83 [-1, 112, 14, 14] 224
BasicConv2d-84 [-1, 112, 14, 14] 0
Conv2d-85 [-1, 224, 14, 14] 225,792
BatchNorm2d-86 [-1, 224, 14, 14] 448
BasicConv2d-87 [-1, 224, 14, 14] 0
Conv2d-88 [-1, 24, 14, 14] 12,288
BatchNorm2d-89 [-1, 24, 14, 14] 48
BasicConv2d-90 [-1, 24, 14, 14] 0
Conv2d-91 [-1, 64, 14, 14] 13,824
BatchNorm2d-92 [-1, 64, 14, 14] 128
BasicConv2d-93 [-1, 64, 14, 14] 0
MaxPool2d-94 [-1, 512, 14, 14] 0
Conv2d-95 [-1, 64, 14, 14] 32,768
BatchNorm2d-96 [-1, 64, 14, 14] 128
BasicConv2d-97 [-1, 64, 14, 14] 0
Inception-98 [-1, 512, 14, 14] 0
Conv2d-99 [-1, 128, 14, 14] 65,536
BatchNorm2d-100 [-1, 128, 14, 14] 256
BasicConv2d-101 [-1, 128, 14, 14] 0
Conv2d-102 [-1, 128, 14, 14] 65,536
BatchNorm2d-103 [-1, 128, 14, 14] 256
BasicConv2d-104 [-1, 128, 14, 14] 0
Conv2d-105 [-1, 256, 14, 14] 294,912
BatchNorm2d-106 [-1, 256, 14, 14] 512
BasicConv2d-107 [-1, 256, 14, 14] 0
Conv2d-108 [-1, 24, 14, 14] 12,288
BatchNorm2d-109 [-1, 24, 14, 14] 48
BasicConv2d-110 [-1, 24, 14, 14] 0
Conv2d-111 [-1, 64, 14, 14] 13,824
BatchNorm2d-112 [-1, 64, 14, 14] 128
BasicConv2d-113 [-1, 64, 14, 14] 0
MaxPool2d-114 [-1, 512, 14, 14] 0
Conv2d-115 [-1, 64, 14, 14] 32,768
BatchNorm2d-116 [-1, 64, 14, 14] 128
BasicConv2d-117 [-1, 64, 14, 14] 0
Inception-118 [-1, 512, 14, 14] 0
Conv2d-119 [-1, 112, 14, 14] 57,344
BatchNorm2d-120 [-1, 112, 14, 14] 224
BasicConv2d-121 [-1, 112, 14, 14] 0
Conv2d-122 [-1, 144, 14, 14] 73,728
BatchNorm2d-123 [-1, 144, 14, 14] 288
BasicConv2d-124 [-1, 144, 14, 14] 0
Conv2d-125 [-1, 288, 14, 14] 373,248
BatchNorm2d-126 [-1, 288, 14, 14] 576
BasicConv2d-127 [-1, 288, 14, 14] 0
Conv2d-128 [-1, 32, 14, 14] 16,384
BatchNorm2d-129 [-1, 32, 14, 14] 64
BasicConv2d-130 [-1, 32, 14, 14] 0
Conv2d-131 [-1, 64, 14, 14] 18,432
BatchNorm2d-132 [-1, 64, 14, 14] 128
BasicConv2d-133 [-1, 64, 14, 14] 0
MaxPool2d-134 [-1, 512, 14, 14] 0
Conv2d-135 [-1, 64, 14, 14] 32,768
BatchNorm2d-136 [-1, 64, 14, 14] 128
BasicConv2d-137 [-1, 64, 14, 14] 0
Inception-138 [-1, 528, 14, 14] 0
Conv2d-139 [-1, 128, 4, 4] 67,584
BatchNorm2d-140 [-1, 128, 4, 4] 256
BasicConv2d-141 [-1, 128, 4, 4] 0
Linear-142 [-1, 1024] 2,098,176
Linear-143 [-1, 1000] 1,025,000
InceptionAux-144 [-1, 1000] 0
Conv2d-145 [-1, 256, 14, 14] 135,168
BatchNorm2d-146 [-1, 256, 14, 14] 512
BasicConv2d-147 [-1, 256, 14, 14] 0
Conv2d-148 [-1, 160, 14, 14] 84,480
BatchNorm2d-149 [-1, 160, 14, 14] 320
BasicConv2d-150 [-1, 160, 14, 14] 0
Conv2d-151 [-1, 320, 14, 14] 460,800
BatchNorm2d-152 [-1, 320, 14, 14] 640
BasicConv2d-153 [-1, 320, 14, 14] 0
Conv2d-154 [-1, 32, 14, 14] 16,896
BatchNorm2d-155 [-1, 32, 14, 14] 64
BasicConv2d-156 [-1, 32, 14, 14] 0
Conv2d-157 [-1, 128, 14, 14] 36,864
BatchNorm2d-158 [-1, 128, 14, 14] 256
BasicConv2d-159 [-1, 128, 14, 14] 0
MaxPool2d-160 [-1, 528, 14, 14] 0
Conv2d-161 [-1, 128, 14, 14] 67,584
BatchNorm2d-162 [-1, 128, 14, 14] 256
BasicConv2d-163 [-1, 128, 14, 14] 0
Inception-164 [-1, 832, 14, 14] 0
MaxPool2d-165 [-1, 832, 7, 7] 0
Conv2d-166 [-1, 256, 7, 7] 212,992
BatchNorm2d-167 [-1, 256, 7, 7] 512
BasicConv2d-168 [-1, 256, 7, 7] 0
Conv2d-169 [-1, 160, 7, 7] 133,120
BatchNorm2d-170 [-1, 160, 7, 7] 320
BasicConv2d-171 [-1, 160, 7, 7] 0
Conv2d-172 [-1, 320, 7, 7] 460,800
BatchNorm2d-173 [-1, 320, 7, 7] 640
BasicConv2d-174 [-1, 320, 7, 7] 0
Conv2d-175 [-1, 32, 7, 7] 26,624
BatchNorm2d-176 [-1, 32, 7, 7] 64
BasicConv2d-177 [-1, 32, 7, 7] 0
Conv2d-178 [-1, 128, 7, 7] 36,864
BatchNorm2d-179 [-1, 128, 7, 7] 256
BasicConv2d-180 [-1, 128, 7, 7] 0
MaxPool2d-181 [-1, 832, 7, 7] 0
Conv2d-182 [-1, 128, 7, 7] 106,496
BatchNorm2d-183 [-1, 128, 7, 7] 256
BasicConv2d-184 [-1, 128, 7, 7] 0
Inception-185 [-1, 832, 7, 7] 0
Conv2d-186 [-1, 384, 7, 7] 319,488
BatchNorm2d-187 [-1, 384, 7, 7] 768
BasicConv2d-188 [-1, 384, 7, 7] 0
Conv2d-189 [-1, 192, 7, 7] 159,744
BatchNorm2d-190 [-1, 192, 7, 7] 384
BasicConv2d-191 [-1, 192, 7, 7] 0
Conv2d-192 [-1, 384, 7, 7] 663,552
BatchNorm2d-193 [-1, 384, 7, 7] 768
BasicConv2d-194 [-1, 384, 7, 7] 0
Conv2d-195 [-1, 48, 7, 7] 39,936
BatchNorm2d-196 [-1, 48, 7, 7] 96
BasicConv2d-197 [-1, 48, 7, 7] 0
Conv2d-198 [-1, 128, 7, 7] 55,296
BatchNorm2d-199 [-1, 128, 7, 7] 256
BasicConv2d-200 [-1, 128, 7, 7] 0
MaxPool2d-201 [-1, 832, 7, 7] 0
Conv2d-202 [-1, 128, 7, 7] 106,496
BatchNorm2d-203 [-1, 128, 7, 7] 256
BasicConv2d-204 [-1, 128, 7, 7] 0
Inception-205 [-1, 1024, 7, 7] 0
AdaptiveAvgPool2d-206 [-1, 1024, 1, 1] 0
Dropout-207 [-1, 1024] 0
Linear-208 [-1, 1000] 1,025,000
================================================================
Total params: 13,004,888
Trainable params: 13,004,888
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 94.25
Params size (MB): 49.61
Estimated Total Size (MB): 144.43
----------------------------------------------------------------
None
Hand-implementing the GoogLeNet network structure
GoogLeNet layer-by-layer parameter details
- ReLU and BatchNorm2d are omitted below; neither operation changes the data size or format.
-
Input image
- The image is expected to be de-centered (normalized to zero mean)
- Format: 3x224x224
-
Convolution layer 1
- Input: 3x224x224
- Kernel: 7x7x64 (stride 2, padding 3)
- Output: 64x112x112
- Max-pooling kernel: 3x3 (stride 2, ceil mode)
- Output: 64x56x56
-
Convolution layer 2
- Input: 64x56x56
- Kernel: 1x1x64 (stride 1)
- Output: 64x56x56
- Kernel: 3x3x192 (stride 1, padding 1)
- Output: 192x56x56
- Max-pooling kernel: 3x3 (stride 2, ceil mode)
- Output: 192x28x28
- These shapes can be verified with the small helper below.
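A small sketch applying the standard output-size formula to the stem layers above (the helper name is mine, not from the original text):

import math

def out_size(size, kernel, stride=1, padding=0, ceil_mode=False):
    # floor/ceil((size + 2*padding - kernel) / stride) + 1
    op = math.ceil if ceil_mode else math.floor
    return op((size + 2 * padding - kernel) / stride) + 1

s = out_size(224, kernel=7, stride=2, padding=3)      # conv-1   -> 112
s = out_size(s, kernel=3, stride=2, ceil_mode=True)   # max pool -> 56
s = out_size(s, kernel=1)                             # 1x1 conv -> 56
s = out_size(s, kernel=3, padding=1)                  # 3x3 conv -> 56
s = out_size(s, kernel=3, stride=2, ceil_mode=True)   # max pool -> 28
print(s)  # 28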
-
Inception(3a)
- Input: 192x28x28
- Branch 1
- Kernel: 1x1x64
- Output: 64x28x28
- Branch 2
- Kernel: 1x1x96
- Kernel: 3x3x128 (padding 1)
- Output: 128x28x28
- Branch 3
- Kernel: 1x1x16
- Kernel: 3x3x32 (padding 1; the paper uses 5x5 in this branch, but this implementation, like torchvision's, uses 3x3)
- Output: 32x28x28
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x32
- Output: 32x28x28
- Output: 256x28x28 (64 + 128 + 32 + 32)
-
Inception(3b)
- Input: 256x28x28
- Branch 1
- Kernel: 1x1x128
- Output: 128x28x28
- Branch 2
- Kernel: 1x1x128
- Kernel: 3x3x192 (padding 1)
- Output: 192x28x28
- Branch 3
- Kernel: 1x1x32
- Kernel: 3x3x96 (padding 1)
- Output: 96x28x28
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x64
- Output: 64x28x28
- Output: 480x28x28 (128 + 192 + 96 + 64)
- Max-pooling kernel: 3x3 (stride 2, ceil mode)
- Output: 480x14x14
-
Inception(4a)
- Input: 480x14x14
- Branch 1
- Kernel: 1x1x192
- Output: 192x14x14
- Branch 2
- Kernel: 1x1x96
- Kernel: 3x3x208 (padding 1)
- Output: 208x14x14
- Branch 3
- Kernel: 1x1x16
- Kernel: 3x3x48 (padding 1)
- Output: 48x14x14
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x64
- Output: 64x14x14
- Output: 512x14x14 (192 + 208 + 48 + 64)
-
Auxiliary classifier 1
- Input: 512x14x14
- AdaptiveAvgPool2d: (4, 4)
- Output: 512x4x4
- Kernel: 1x1x128
- Output: 128x4x4 (flattened to 2048)
- Fully connected: 2048 -> 1024
- Fully connected: 1024 -> num_classes, where num_classes is the number of classes;
- Output: num_classes
-
Inception(4b)
- Input: 512x14x14
- Branch 1
- Kernel: 1x1x160
- Output: 160x14x14
- Branch 2
- Kernel: 1x1x112
- Kernel: 3x3x224 (padding 1)
- Output: 224x14x14
- Branch 3
- Kernel: 1x1x24
- Kernel: 3x3x64 (padding 1)
- Output: 64x14x14
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x64
- Output: 64x14x14
- Output: 512x14x14 (160 + 224 + 64 + 64)
-
Inception(4c)
- Input: 512x14x14
- Branch 1
- Kernel: 1x1x128
- Output: 128x14x14
- Branch 2
- Kernel: 1x1x128
- Kernel: 3x3x256 (padding 1)
- Output: 256x14x14
- Branch 3
- Kernel: 1x1x24
- Kernel: 3x3x64 (padding 1)
- Output: 64x14x14
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x64
- Output: 64x14x14
- Output: 512x14x14 (128 + 256 + 64 + 64)
-
Inception(4d)
- Input: 512x14x14
- Branch 1
- Kernel: 1x1x112
- Output: 112x14x14
- Branch 2
- Kernel: 1x1x144
- Kernel: 3x3x288 (padding 1)
- Output: 288x14x14
- Branch 3
- Kernel: 1x1x32
- Kernel: 3x3x64 (padding 1)
- Output: 64x14x14
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x64
- Output: 64x14x14
- Output: 528x14x14 (112 + 288 + 64 + 64)
-
Auxiliary classifier 2
- Input: 528x14x14
- AdaptiveAvgPool2d: (4, 4)
- Output: 528x4x4
- Kernel: 1x1x128
- Output: 128x4x4 (flattened to 2048)
- Fully connected: 2048 -> 1024
- Fully connected: 1024 -> num_classes, where num_classes is the number of classes;
- Output: num_classes
-
Inception(4e)
- Input: 528x14x14
- Branch 1
- Kernel: 1x1x256
- Output: 256x14x14
- Branch 2
- Kernel: 1x1x160
- Kernel: 3x3x320 (padding 1)
- Output: 320x14x14
- Branch 3
- Kernel: 1x1x32
- Kernel: 3x3x128 (padding 1)
- Output: 128x14x14
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x128
- Output: 128x14x14
- Output: 832x14x14 (256 + 320 + 128 + 128)
- Max-pooling kernel: 2x2 (stride 2, ceil mode)
- Output: 832x7x7
-
Inception(5a)
- Input: 832x7x7
- Branch 1
- Kernel: 1x1x256
- Output: 256x7x7
- Branch 2
- Kernel: 1x1x160
- Kernel: 3x3x320 (padding 1)
- Output: 320x7x7
- Branch 3
- Kernel: 1x1x32
- Kernel: 3x3x128 (padding 1)
- Output: 128x7x7
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x128
- Output: 128x7x7
- Output: 832x7x7 (256 + 320 + 128 + 128)
-
Inception(5b)
- Input: 832x7x7
- Branch 1
- Kernel: 1x1x384
- Output: 384x7x7
- Branch 2
- Kernel: 1x1x192
- Kernel: 3x3x384 (padding 1)
- Output: 384x7x7
- Branch 3
- Kernel: 1x1x48
- Kernel: 3x3x128 (padding 1)
- Output: 128x7x7
- Branch 4
- Pooling kernel: 3x3 max pooling (stride 1, padding 1)
- Kernel: 1x1x128
- Output: 128x7x7
- Output: 1024x7x7 (384 + 384 + 128 + 128)
- AdaptiveAvgPool2d: (1, 1)
- Output: 1024x1x1
-
Fully connected layer
- Dropout(0.2), then Linear: 1024 -> num_classes
- num_classes is the number of classes
-
Basic convolution wrapper
- The key enhancement to Inception V1 is BatchNorm2d; every convolution is followed by the two operations below, so we wrap them together:
- BatchNorm2d
- ReLU
import torch
from torch.nn import Conv2d, BatchNorm2d, Module, ReLU
class YQConv2d(Module):
    # constructor: set up Conv2d, BatchNorm2d and ReLU
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        super(YQConv2d, self).__init__()
        # convolution layer
        self.conv = Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
        # BatchNorm2d
        self.bn = BatchNorm2d(out_channels, eps=0.001)
        # activation
        self.relu = ReLU(inplace=True)

    def forward(self, x):
        y_ = self.conv(x)
        y_ = self.bn(y_)
        y_ = self.relu(y_)
        return y_
- Structure of the convolution wrapper
from torchsummary import summary
conv = YQConv2d(3, 64)
# print the network structure
print(summary(conv,input_size=(3, 224, 224), device='cpu'))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 192
BatchNorm2d-2 [-1, 64, 224, 224] 128
ReLU-3 [-1, 64, 224, 224] 0
================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 73.50
Params size (MB): 0.00
Estimated Total Size (MB): 74.08
----------------------------------------------------------------
None
Inception wrapper
- Using the convolution wrapper above as the core unit, wrap the Inception module.
import torch
from torch.nn import Module, Sequential, MaxPool2d
class YQInception(Module):
    # constructor: parameters (output channel counts) for the 4 branches
    def __init__(self, in_channels, ch1x1, ch3x3_1, ch3x3_2, ch5x5_1, ch5x5_2, ch_pool):
        super(YQInception, self).__init__()
        # YQConv2d(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        # branch 1
        self.branch_1 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch1x1, kernel_size=1, stride=1, padding=0)
        )
        # branch 2
        self.branch_2 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch3x3_1, kernel_size=1, stride=1, padding=0),
            YQConv2d(in_channels=ch3x3_1, out_channels=ch3x3_2, kernel_size=3, stride=1, padding=1)
        )
        # branch 3 (the "5x5" branch, implemented with a 3x3 kernel as in torchvision)
        self.branch_3 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch5x5_1, kernel_size=1, stride=1, padding=0),
            YQConv2d(in_channels=ch5x5_1, out_channels=ch5x5_2, kernel_size=3, stride=1, padding=1)
        )
        # branch 4
        self.branch_4 = Sequential(
            MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
            YQConv2d(in_channels=in_channels, out_channels=ch_pool, kernel_size=1, stride=1, padding=0)
        )

    def forward(self, x):
        b_y1 = self.branch_1(x)
        b_y2 = self.branch_2(x)
        b_y3 = self.branch_3(x)
        b_y4 = self.branch_4(x)
        y_ = torch.cat([b_y1, b_y2, b_y3, b_y4], 1)  # dim=1: concatenate along the channel dimension, so the branch depths add up
        return y_
- Print the network structure
from torchsummary import summary
conv = YQInception(3, 64, 96, 128, 16, 32, 32)  # the final output depth is 64 + 128 + 32 + 32 = 256
# print the network structure
print(summary(conv,input_size=(3, 224, 224), device='cpu'))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 192
BatchNorm2d-2 [-1, 64, 224, 224] 128
ReLU-3 [-1, 64, 224, 224] 0
YQConv2d-4 [-1, 64, 224, 224] 0
Conv2d-5 [-1, 96, 224, 224] 288
BatchNorm2d-6 [-1, 96, 224, 224] 192
ReLU-7 [-1, 96, 224, 224] 0
YQConv2d-8 [-1, 96, 224, 224] 0
Conv2d-9 [-1, 128, 224, 224] 110,592
BatchNorm2d-10 [-1, 128, 224, 224] 256
ReLU-11 [-1, 128, 224, 224] 0
YQConv2d-12 [-1, 128, 224, 224] 0
Conv2d-13 [-1, 16, 224, 224] 48
BatchNorm2d-14 [-1, 16, 224, 224] 32
ReLU-15 [-1, 16, 224, 224] 0
YQConv2d-16 [-1, 16, 224, 224] 0
Conv2d-17 [-1, 32, 224, 224] 4,608
BatchNorm2d-18 [-1, 32, 224, 224] 64
ReLU-19 [-1, 32, 224, 224] 0
YQConv2d-20 [-1, 32, 224, 224] 0
MaxPool2d-21 [-1, 3, 224, 224] 0
Conv2d-22 [-1, 32, 224, 224] 96
BatchNorm2d-23 [-1, 32, 224, 224] 64
ReLU-24 [-1, 32, 224, 224] 0
YQConv2d-25 [-1, 32, 224, 224] 0
================================================================
Total params: 116,560
Trainable params: 116,560
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 564.65
Params size (MB): 0.44
Estimated Total Size (MB): 565.67
----------------------------------------------------------------
None
Auxiliary classifier wrapper
- The auxiliary classifier has 3 trainable layers in total (one convolution and two fully connected layers).
import torch
from torch.nn import Module, AdaptiveAvgPool2d, Linear, ReLU, Dropout
class YQAuxClassifier(Module):
    def __init__(self, in_channels, num_classes):
        super(YQAuxClassifier, self).__init__()
        # pooling / convolution / fully connected / fully connected
        self.pool = AdaptiveAvgPool2d((4, 4))
        self.conv = YQConv2d(in_channels=in_channels, out_channels=128, kernel_size=1, stride=1, padding=0)
        self.fc_1 = Linear(2048, 1024)
        self.relu = ReLU(inplace=True)
        self.drop = Dropout(p=0.7, inplace=False)
        self.fc_2 = Linear(1024, num_classes)

    def forward(self, x):
        y_ = self.pool(x)
        y_ = self.conv(y_)
        y_ = torch.flatten(y_, 1)  # reshape from the convolution output to the fully connected input
        y_ = self.fc_1(y_)
        y_ = self.relu(y_)
        y_ = self.drop(y_)
        y_ = self.fc_2(y_)
        return y_
- Visualize the auxiliary classifier
from torchsummary import summary
classifier = YQAuxClassifier(512, 10)  # 512 input channels, 10 output classes
# print the network structure
print(summary(classifier, input_size=(512, 14, 14), device='cpu'))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
AdaptiveAvgPool2d-1 [-1, 512, 4, 4] 0
Conv2d-2 [-1, 128, 4, 4] 65,536
BatchNorm2d-3 [-1, 128, 4, 4] 256
ReLU-4 [-1, 128, 4, 4] 0
YQConv2d-5 [-1, 128, 4, 4] 0
Linear-6 [-1, 1024] 2,098,176
ReLU-7 [-1, 1024] 0
Dropout-8 [-1, 1024] 0
Linear-9 [-1, 10] 10,250
================================================================
Total params: 2,174,218
Trainable params: 2,174,218
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.38
Forward/backward pass size (MB): 0.15
Params size (MB): 8.29
Estimated Total Size (MB): 8.83
----------------------------------------------------------------
None
GoogLeNet implementation
- Implemented in the following stages:
- the first few convolutions
- the Inception blocks (including the two auxiliary classifiers)
- the classifier
import torch
from torch.nn import Module, AdaptiveAvgPool2d, Linear, Dropout, MaxPool2d
class YQGoogLeNet(Module):
    def __init__(self, num_classes=1000):
        super(YQGoogLeNet, self).__init__()
        # define the layers
        self.conv_1 = YQConv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
        self.pool_1 = MaxPool2d(3, stride=2, ceil_mode=True)
        self.conv_2 = YQConv2d(in_channels=64, out_channels=64, kernel_size=1, stride=1, padding=0)
        self.conv_3 = YQConv2d(in_channels=64, out_channels=192, kernel_size=3, stride=1, padding=1)
        self.pool_2 = MaxPool2d(3, stride=2, ceil_mode=True)
        self.ince_3a = YQInception(192, 64, 96, 128, 16, 32, 32)
        self.ince_3b = YQInception(256, 128, 128, 192, 32, 96, 64)
        self.pool_3 = MaxPool2d(3, stride=2, ceil_mode=True)
        self.ince_4a = YQInception(480, 192, 96, 208, 16, 48, 64)
        self.ince_4b = YQInception(512, 160, 112, 224, 24, 64, 64)
        self.ince_4c = YQInception(512, 128, 128, 256, 24, 64, 64)
        self.ince_4d = YQInception(512, 112, 144, 288, 32, 64, 64)
        self.ince_4e = YQInception(528, 256, 160, 320, 32, 128, 128)
        self.pool_4 = MaxPool2d(2, stride=2, ceil_mode=True)
        self.ince_5a = YQInception(832, 256, 160, 320, 32, 128, 128)
        self.ince_5b = YQInception(832, 384, 192, 384, 48, 128, 128)
        # the two auxiliary classifiers
        self.auxi_1 = YQAuxClassifier(512, num_classes)
        self.auxi_2 = YQAuxClassifier(528, num_classes)
        # final classification layers
        self.pool_5 = AdaptiveAvgPool2d((1, 1))
        self.drop = Dropout(0.2)
        self.full = Linear(1024, num_classes)

    def forward(self, x):
        # -----------------------------
        y_ = self.conv_1(x)
        y_ = self.pool_1(y_)
        # -----------------------------
        y_ = self.conv_2(y_)
        y_ = self.conv_3(y_)
        y_ = self.pool_2(y_)
        # -----------------------------
        y_ = self.ince_3a(y_)
        y_ = self.ince_3b(y_)
        y_ = self.pool_3(y_)
        # -----------------------------
        y_ = self.ince_4a(y_)
        # -----------------------------
        a1 = self.auxi_1(y_)
        # -----------------------------
        y_ = self.ince_4b(y_)
        y_ = self.ince_4c(y_)
        y_ = self.ince_4d(y_)
        # -----------------------------
        a2 = self.auxi_2(y_)
        # -----------------------------
        y_ = self.ince_4e(y_)
        y_ = self.pool_4(y_)
        # -----------------------------
        y_ = self.ince_5a(y_)
        y_ = self.ince_5b(y_)
        # -----------------------------
        y_ = self.pool_5(y_)
        y_ = torch.flatten(y_, 1)
        y_ = self.drop(y_)
        y_ = self.full(y_)
        return y_, a1, a2
- The GoogLeNet network structure (note that the summary below uses a 244x244 input; thanks to AdaptiveAvgPool2d the network also accepts sizes other than 224x224)
from torchsummary import summary
net = YQGoogLeNet(1000)
# print the network structure
print(summary(net,input_size=(3, 244, 244), device='cpu'))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 122, 122] 9,408
BatchNorm2d-2 [-1, 64, 122, 122] 128
ReLU-3 [-1, 64, 122, 122] 0
YQConv2d-4 [-1, 64, 122, 122] 0
MaxPool2d-5 [-1, 64, 61, 61] 0
Conv2d-6 [-1, 64, 61, 61] 4,096
BatchNorm2d-7 [-1, 64, 61, 61] 128
ReLU-8 [-1, 64, 61, 61] 0
YQConv2d-9 [-1, 64, 61, 61] 0
Conv2d-10 [-1, 192, 61, 61] 110,592
BatchNorm2d-11 [-1, 192, 61, 61] 384
ReLU-12 [-1, 192, 61, 61] 0
YQConv2d-13 [-1, 192, 61, 61] 0
MaxPool2d-14 [-1, 192, 30, 30] 0
Conv2d-15 [-1, 64, 30, 30] 12,288
BatchNorm2d-16 [-1, 64, 30, 30] 128
ReLU-17 [-1, 64, 30, 30] 0
YQConv2d-18 [-1, 64, 30, 30] 0
Conv2d-19 [-1, 96, 30, 30] 18,432
BatchNorm2d-20 [-1, 96, 30, 30] 192
ReLU-21 [-1, 96, 30, 30] 0
YQConv2d-22 [-1, 96, 30, 30] 0
Conv2d-23 [-1, 128, 30, 30] 110,592
BatchNorm2d-24 [-1, 128, 30, 30] 256
ReLU-25 [-1, 128, 30, 30] 0
YQConv2d-26 [-1, 128, 30, 30] 0
Conv2d-27 [-1, 16, 30, 30] 3,072
BatchNorm2d-28 [-1, 16, 30, 30] 32
ReLU-29 [-1, 16, 30, 30] 0
YQConv2d-30 [-1, 16, 30, 30] 0
Conv2d-31 [-1, 32, 30, 30] 4,608
BatchNorm2d-32 [-1, 32, 30, 30] 64
ReLU-33 [-1, 32, 30, 30] 0
YQConv2d-34 [-1, 32, 30, 30] 0
MaxPool2d-35 [-1, 192, 30, 30] 0
Conv2d-36 [-1, 32, 30, 30] 6,144
BatchNorm2d-37 [-1, 32, 30, 30] 64
ReLU-38 [-1, 32, 30, 30] 0
YQConv2d-39 [-1, 32, 30, 30] 0
YQInception-40 [-1, 256, 30, 30] 0
Conv2d-41 [-1, 128, 30, 30] 32,768
BatchNorm2d-42 [-1, 128, 30, 30] 256
ReLU-43 [-1, 128, 30, 30] 0
YQConv2d-44 [-1, 128, 30, 30] 0
Conv2d-45 [-1, 128, 30, 30] 32,768
BatchNorm2d-46 [-1, 128, 30, 30] 256
ReLU-47 [-1, 128, 30, 30] 0
YQConv2d-48 [-1, 128, 30, 30] 0
Conv2d-49 [-1, 192, 30, 30] 221,184
BatchNorm2d-50 [-1, 192, 30, 30] 384
ReLU-51 [-1, 192, 30, 30] 0
YQConv2d-52 [-1, 192, 30, 30] 0
Conv2d-53 [-1, 32, 30, 30] 8,192
BatchNorm2d-54 [-1, 32, 30, 30] 64
ReLU-55 [-1, 32, 30, 30] 0
YQConv2d-56 [-1, 32, 30, 30] 0
Conv2d-57 [-1, 96, 30, 30] 27,648
BatchNorm2d-58 [-1, 96, 30, 30] 192
ReLU-59 [-1, 96, 30, 30] 0
YQConv2d-60 [-1, 96, 30, 30] 0
MaxPool2d-61 [-1, 256, 30, 30] 0
Conv2d-62 [-1, 64, 30, 30] 16,384
BatchNorm2d-63 [-1, 64, 30, 30] 128
ReLU-64 [-1, 64, 30, 30] 0
YQConv2d-65 [-1, 64, 30, 30] 0
YQInception-66 [-1, 480, 30, 30] 0
MaxPool2d-67 [-1, 480, 15, 15] 0
Conv2d-68 [-1, 192, 15, 15] 92,160
BatchNorm2d-69 [-1, 192, 15, 15] 384
ReLU-70 [-1, 192, 15, 15] 0
YQConv2d-71 [-1, 192, 15, 15] 0
Conv2d-72 [-1, 96, 15, 15] 46,080
BatchNorm2d-73 [-1, 96, 15, 15] 192
ReLU-74 [-1, 96, 15, 15] 0
YQConv2d-75 [-1, 96, 15, 15] 0
Conv2d-76 [-1, 208, 15, 15] 179,712
BatchNorm2d-77 [-1, 208, 15, 15] 416
ReLU-78 [-1, 208, 15, 15] 0
YQConv2d-79 [-1, 208, 15, 15] 0
Conv2d-80 [-1, 16, 15, 15] 7,680
BatchNorm2d-81 [-1, 16, 15, 15] 32
ReLU-82 [-1, 16, 15, 15] 0
YQConv2d-83 [-1, 16, 15, 15] 0
Conv2d-84 [-1, 48, 15, 15] 6,912
BatchNorm2d-85 [-1, 48, 15, 15] 96
ReLU-86 [-1, 48, 15, 15] 0
YQConv2d-87 [-1, 48, 15, 15] 0
MaxPool2d-88 [-1, 480, 15, 15] 0
Conv2d-89 [-1, 64, 15, 15] 30,720
BatchNorm2d-90 [-1, 64, 15, 15] 128
ReLU-91 [-1, 64, 15, 15] 0
YQConv2d-92 [-1, 64, 15, 15] 0
YQInception-93 [-1, 512, 15, 15] 0
AdaptiveAvgPool2d-94 [-1, 512, 4, 4] 0
Conv2d-95 [-1, 128, 4, 4] 65,536
BatchNorm2d-96 [-1, 128, 4, 4] 256
ReLU-97 [-1, 128, 4, 4] 0
YQConv2d-98 [-1, 128, 4, 4] 0
Linear-99 [-1, 1024] 2,098,176
ReLU-100 [-1, 1024] 0
Dropout-101 [-1, 1024] 0
Linear-102 [-1, 1000] 1,025,000
YQAuxClassifier-103 [-1, 1000] 0
Conv2d-104 [-1, 160, 15, 15] 81,920
BatchNorm2d-105 [-1, 160, 15, 15] 320
ReLU-106 [-1, 160, 15, 15] 0
YQConv2d-107 [-1, 160, 15, 15] 0
Conv2d-108 [-1, 112, 15, 15] 57,344
BatchNorm2d-109 [-1, 112, 15, 15] 224
ReLU-110 [-1, 112, 15, 15] 0
YQConv2d-111 [-1, 112, 15, 15] 0
Conv2d-112 [-1, 224, 15, 15] 225,792
BatchNorm2d-113 [-1, 224, 15, 15] 448
ReLU-114 [-1, 224, 15, 15] 0
YQConv2d-115 [-1, 224, 15, 15] 0
Conv2d-116 [-1, 24, 15, 15] 12,288
BatchNorm2d-117 [-1, 24, 15, 15] 48
ReLU-118 [-1, 24, 15, 15] 0
YQConv2d-119 [-1, 24, 15, 15] 0
Conv2d-120 [-1, 64, 15, 15] 13,824
BatchNorm2d-121 [-1, 64, 15, 15] 128
ReLU-122 [-1, 64, 15, 15] 0
YQConv2d-123 [-1, 64, 15, 15] 0
MaxPool2d-124 [-1, 512, 15, 15] 0
Conv2d-125 [-1, 64, 15, 15] 32,768
BatchNorm2d-126 [-1, 64, 15, 15] 128
ReLU-127 [-1, 64, 15, 15] 0
YQConv2d-128 [-1, 64, 15, 15] 0
YQInception-129 [-1, 512, 15, 15] 0
Conv2d-130 [-1, 128, 15, 15] 65,536
BatchNorm2d-131 [-1, 128, 15, 15] 256
ReLU-132 [-1, 128, 15, 15] 0
YQConv2d-133 [-1, 128, 15, 15] 0
Conv2d-134 [-1, 128, 15, 15] 65,536
BatchNorm2d-135 [-1, 128, 15, 15] 256
ReLU-136 [-1, 128, 15, 15] 0
YQConv2d-137 [-1, 128, 15, 15] 0
Conv2d-138 [-1, 256, 15, 15] 294,912
BatchNorm2d-139 [-1, 256, 15, 15] 512
ReLU-140 [-1, 256, 15, 15] 0
YQConv2d-141 [-1, 256, 15, 15] 0
Conv2d-142 [-1, 24, 15, 15] 12,288
BatchNorm2d-143 [-1, 24, 15, 15] 48
ReLU-144 [-1, 24, 15, 15] 0
YQConv2d-145 [-1, 24, 15, 15] 0
Conv2d-146 [-1, 64, 15, 15] 13,824
BatchNorm2d-147 [-1, 64, 15, 15] 128
ReLU-148 [-1, 64, 15, 15] 0
YQConv2d-149 [-1, 64, 15, 15] 0
MaxPool2d-150 [-1, 512, 15, 15] 0
Conv2d-151 [-1, 64, 15, 15] 32,768
BatchNorm2d-152 [-1, 64, 15, 15] 128
ReLU-153 [-1, 64, 15, 15] 0
YQConv2d-154 [-1, 64, 15, 15] 0
YQInception-155 [-1, 512, 15, 15] 0
Conv2d-156 [-1, 112, 15, 15] 57,344
BatchNorm2d-157 [-1, 112, 15, 15] 224
ReLU-158 [-1, 112, 15, 15] 0
YQConv2d-159 [-1, 112, 15, 15] 0
Conv2d-160 [-1, 144, 15, 15] 73,728
BatchNorm2d-161 [-1, 144, 15, 15] 288
ReLU-162 [-1, 144, 15, 15] 0
YQConv2d-163 [-1, 144, 15, 15] 0
Conv2d-164 [-1, 288, 15, 15] 373,248
BatchNorm2d-165 [-1, 288, 15, 15] 576
ReLU-166 [-1, 288, 15, 15] 0
YQConv2d-167 [-1, 288, 15, 15] 0
Conv2d-168 [-1, 32, 15, 15] 16,384
BatchNorm2d-169 [-1, 32, 15, 15] 64
ReLU-170 [-1, 32, 15, 15] 0
YQConv2d-171 [-1, 32, 15, 15] 0
Conv2d-172 [-1, 64, 15, 15] 18,432
BatchNorm2d-173 [-1, 64, 15, 15] 128
ReLU-174 [-1, 64, 15, 15] 0
YQConv2d-175 [-1, 64, 15, 15] 0
MaxPool2d-176 [-1, 512, 15, 15] 0
Conv2d-177 [-1, 64, 15, 15] 32,768
BatchNorm2d-178 [-1, 64, 15, 15] 128
ReLU-179 [-1, 64, 15, 15] 0
YQConv2d-180 [-1, 64, 15, 15] 0
YQInception-181 [-1, 528, 15, 15] 0
AdaptiveAvgPool2d-182 [-1, 528, 4, 4] 0
Conv2d-183 [-1, 128, 4, 4] 67,584
BatchNorm2d-184 [-1, 128, 4, 4] 256
ReLU-185 [-1, 128, 4, 4] 0
YQConv2d-186 [-1, 128, 4, 4] 0
Linear-187 [-1, 1024] 2,098,176
ReLU-188 [-1, 1024] 0
Dropout-189 [-1, 1024] 0
Linear-190 [-1, 1000] 1,025,000
YQAuxClassifier-191 [-1, 1000] 0
Conv2d-192 [-1, 256, 15, 15] 135,168
BatchNorm2d-193 [-1, 256, 15, 15] 512
ReLU-194 [-1, 256, 15, 15] 0
YQConv2d-195 [-1, 256, 15, 15] 0
Conv2d-196 [-1, 160, 15, 15] 84,480
BatchNorm2d-197 [-1, 160, 15, 15] 320
ReLU-198 [-1, 160, 15, 15] 0
YQConv2d-199 [-1, 160, 15, 15] 0
Conv2d-200 [-1, 320, 15, 15] 460,800
BatchNorm2d-201 [-1, 320, 15, 15] 640
ReLU-202 [-1, 320, 15, 15] 0
YQConv2d-203 [-1, 320, 15, 15] 0
Conv2d-204 [-1, 32, 15, 15] 16,896
BatchNorm2d-205 [-1, 32, 15, 15] 64
ReLU-206 [-1, 32, 15, 15] 0
YQConv2d-207 [-1, 32, 15, 15] 0
Conv2d-208 [-1, 128, 15, 15] 36,864
BatchNorm2d-209 [-1, 128, 15, 15] 256
ReLU-210 [-1, 128, 15, 15] 0
YQConv2d-211 [-1, 128, 15, 15] 0
MaxPool2d-212 [-1, 528, 15, 15] 0
Conv2d-213 [-1, 128, 15, 15] 67,584
BatchNorm2d-214 [-1, 128, 15, 15] 256
ReLU-215 [-1, 128, 15, 15] 0
YQConv2d-216 [-1, 128, 15, 15] 0
YQInception-217 [-1, 832, 15, 15] 0
MaxPool2d-218 [-1, 832, 8, 8] 0
Conv2d-219 [-1, 256, 8, 8] 212,992
BatchNorm2d-220 [-1, 256, 8, 8] 512
ReLU-221 [-1, 256, 8, 8] 0
YQConv2d-222 [-1, 256, 8, 8] 0
Conv2d-223 [-1, 160, 8, 8] 133,120
BatchNorm2d-224 [-1, 160, 8, 8] 320
ReLU-225 [-1, 160, 8, 8] 0
YQConv2d-226 [-1, 160, 8, 8] 0
Conv2d-227 [-1, 320, 8, 8] 460,800
BatchNorm2d-228 [-1, 320, 8, 8] 640
ReLU-229 [-1, 320, 8, 8] 0
YQConv2d-230 [-1, 320, 8, 8] 0
Conv2d-231 [-1, 32, 8, 8] 26,624
BatchNorm2d-232 [-1, 32, 8, 8] 64
ReLU-233 [-1, 32, 8, 8] 0
YQConv2d-234 [-1, 32, 8, 8] 0
Conv2d-235 [-1, 128, 8, 8] 36,864
BatchNorm2d-236 [-1, 128, 8, 8] 256
ReLU-237 [-1, 128, 8, 8] 0
YQConv2d-238 [-1, 128, 8, 8] 0
MaxPool2d-239 [-1, 832, 8, 8] 0
Conv2d-240 [-1, 128, 8, 8] 106,496
BatchNorm2d-241 [-1, 128, 8, 8] 256
ReLU-242 [-1, 128, 8, 8] 0
YQConv2d-243 [-1, 128, 8, 8] 0
YQInception-244 [-1, 832, 8, 8] 0
Conv2d-245 [-1, 384, 8, 8] 319,488
BatchNorm2d-246 [-1, 384, 8, 8] 768
ReLU-247 [-1, 384, 8, 8] 0
YQConv2d-248 [-1, 384, 8, 8] 0
Conv2d-249 [-1, 192, 8, 8] 159,744
BatchNorm2d-250 [-1, 192, 8, 8] 384
ReLU-251 [-1, 192, 8, 8] 0
YQConv2d-252 [-1, 192, 8, 8] 0
Conv2d-253 [-1, 384, 8, 8] 663,552
BatchNorm2d-254 [-1, 384, 8, 8] 768
ReLU-255 [-1, 384, 8, 8] 0
YQConv2d-256 [-1, 384, 8, 8] 0
Conv2d-257 [-1, 48, 8, 8] 39,936
BatchNorm2d-258 [-1, 48, 8, 8] 96
ReLU-259 [-1, 48, 8, 8] 0
YQConv2d-260 [-1, 48, 8, 8] 0
Conv2d-261 [-1, 128, 8, 8] 55,296
BatchNorm2d-262 [-1, 128, 8, 8] 256
ReLU-263 [-1, 128, 8, 8] 0
YQConv2d-264 [-1, 128, 8, 8] 0
MaxPool2d-265 [-1, 832, 8, 8] 0
Conv2d-266 [-1, 128, 8, 8] 106,496
BatchNorm2d-267 [-1, 128, 8, 8] 256
ReLU-268 [-1, 128, 8, 8] 0
YQConv2d-269 [-1, 128, 8, 8] 0
YQInception-270 [-1, 1024, 8, 8] 0
AdaptiveAvgPool2d-271 [-1, 1024, 1, 1] 0
Dropout-272 [-1, 1024] 0
Linear-273 [-1, 1000] 1,025,000
================================================================
Total params: 13,004,888
Trainable params: 13,004,888
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.68
Forward/backward pass size (MB): 139.36
Params size (MB): 49.61
Estimated Total Size (MB): 189.65
----------------------------------------------------------------
None
With 1000 classes, the printed parameter count matches the official implementation (13,004,888 in both summaries).
-
One shortcoming of the model above:
- The auxiliary classifiers are only useful during training; they serve no purpose at prediction time. A boolean flag can be used to switch the auxiliary computation on/off, as the sketch below shows.
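A minimal sketch of such a switch, assuming the YQGoogLeNet class above (torchvision's aux_logits flag works in the same spirit; the subclass name is mine):

class YQGoogLeNetAux(YQGoogLeNet):
    def __init__(self, num_classes=1000, aux_logits=True):
        super(YQGoogLeNetAux, self).__init__(num_classes)
        self.aux_logits = aux_logits

    def forward(self, x):
        y_ = self.pool_1(self.conv_1(x))
        y_ = self.pool_2(self.conv_3(self.conv_2(y_)))
        y_ = self.pool_3(self.ince_3b(self.ince_3a(y_)))
        y_ = self.ince_4a(y_)
        # compute the auxiliary heads only while training
        a1 = self.auxi_1(y_) if self.training and self.aux_logits else None
        y_ = self.ince_4d(self.ince_4c(self.ince_4b(y_)))
        a2 = self.auxi_2(y_) if self.training and self.aux_logits else None
        y_ = self.pool_4(self.ince_4e(y_))
        y_ = self.ince_5b(self.ince_5a(y_))
        y_ = torch.flatten(self.pool_5(y_), 1)
        y_ = self.full(self.drop(y_))
        return y_, a1, a2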
Dataset loading and training
-
Because GoogLeNet has two auxiliary classifiers that encourage classification in the shallower layers, training differs from plain single-output classification training.
- During training, the auxiliary classifiers' losses are added to the total loss with a discounted weight (discount weight 0.3).
For clarity, all the code below is gathered in one place.
import torch
from torch.nn import Conv2d, BatchNorm2d, Module, ReLU, Sequential, MaxPool2d, AdaptiveAvgPool2d, Linear, Dropout
from torchvision.datasets import ImageFolder
from torchvision.transforms import *
from torchvision.transforms.functional import *
from torch.utils.data import random_split
from torch.utils.data import DataLoader
import torch
import torchvision
import numpy as np
import cv2
# Convolution + BatchNorm + ReLU wrapper; a Sequential would give an even more concise implementation.
# --------------------------------------------------------------
class YQConv2d(Module):
    # constructor: set up Conv2d, BatchNorm2d and ReLU
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        super(YQConv2d, self).__init__()
        # convolution layer
        self.conv = Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
        # BatchNorm2d
        self.bn = BatchNorm2d(out_channels, eps=0.001)
        # activation
        self.relu = ReLU(inplace=True)

    def forward(self, x):
        y_ = self.conv(x)
        y_ = self.bn(y_)
        y_ = self.relu(y_)
        return y_
# Inception implementation (V1 enhanced with BatchNorm2d)
# --------------------------------------------------------------
class YQInception(Module):
    # constructor: parameters (output channel counts) for the 4 branches
    def __init__(self, in_channels, ch1x1, ch3x3_1, ch3x3_2, ch5x5_1, ch5x5_2, ch_pool):
        super(YQInception, self).__init__()
        # YQConv2d(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        # branch 1
        self.branch_1 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch1x1, kernel_size=1, stride=1, padding=0)
        )
        # branch 2
        self.branch_2 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch3x3_1, kernel_size=1, stride=1, padding=0),
            YQConv2d(in_channels=ch3x3_1, out_channels=ch3x3_2, kernel_size=3, stride=1, padding=1)
        )
        # branch 3 (the "5x5" branch, implemented with a 3x3 kernel as in torchvision)
        self.branch_3 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch5x5_1, kernel_size=1, stride=1, padding=0),
            YQConv2d(in_channels=ch5x5_1, out_channels=ch5x5_2, kernel_size=3, stride=1, padding=1)
        )
        # branch 4
        self.branch_4 = Sequential(
            MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
            YQConv2d(in_channels=in_channels, out_channels=ch_pool, kernel_size=1, stride=1, padding=0)
        )

    def forward(self, x):
        b_y1 = self.branch_1(x)
        b_y2 = self.branch_2(x)
        b_y3 = self.branch_3(x)
        b_y4 = self.branch_4(x)
        y_ = torch.cat([b_y1, b_y2, b_y3, b_y4], 1)  # dim=1: concatenate along the channel dimension
        return y_
# auxiliary classifier implementation
# --------------------------------------------------------------
class YQAuxClassifier(Module):
    def __init__(self, in_channels, num_classes):
        super(YQAuxClassifier, self).__init__()
        # pooling / convolution / fully connected / fully connected
        self.pool = AdaptiveAvgPool2d((4, 4))
        self.conv = YQConv2d(in_channels=in_channels, out_channels=128, kernel_size=1, stride=1, padding=0)
        self.fc_1 = Linear(2048, 1024)
        self.relu = ReLU(inplace=True)
        self.drop = Dropout(p=0.7, inplace=False)
        self.fc_2 = Linear(1024, num_classes)

    def forward(self, x):
        y_ = self.pool(x)
        y_ = self.conv(y_)
        y_ = torch.flatten(y_, 1)  # reshape from the convolution output to the fully connected input
        y_ = self.fc_1(y_)
        y_ = self.relu(y_)
        y_ = self.drop(y_)
        y_ = self.fc_2(y_)
        return y_
# GoogLeNet network implementation
# --------------------------------------------------------------
class YQGoogLeNet(Module):
    def __init__(self, num_classes=1000):
        super(YQGoogLeNet, self).__init__()
        # define the layers
        self.conv_1 = YQConv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
        self.pool_1 = MaxPool2d(3, stride=2, ceil_mode=True)
        self.conv_2 = YQConv2d(in_channels=64, out_channels=64, kernel_size=1, stride=1, padding=0)
        self.conv_3 = YQConv2d(in_channels=64, out_channels=192, kernel_size=3, stride=1, padding=1)
        self.pool_2 = MaxPool2d(3, stride=2, ceil_mode=True)
        self.ince_3a = YQInception(192, 64, 96, 128, 16, 32, 32)
        self.ince_3b = YQInception(256, 128, 128, 192, 32, 96, 64)
        self.pool_3 = MaxPool2d(3, stride=2, ceil_mode=True)
        self.ince_4a = YQInception(480, 192, 96, 208, 16, 48, 64)
        self.ince_4b = YQInception(512, 160, 112, 224, 24, 64, 64)
        self.ince_4c = YQInception(512, 128, 128, 256, 24, 64, 64)
        self.ince_4d = YQInception(512, 112, 144, 288, 32, 64, 64)
        self.ince_4e = YQInception(528, 256, 160, 320, 32, 128, 128)
        self.pool_4 = MaxPool2d(2, stride=2, ceil_mode=True)
        self.ince_5a = YQInception(832, 256, 160, 320, 32, 128, 128)
        self.ince_5b = YQInception(832, 384, 192, 384, 48, 128, 128)
        # the two auxiliary classifiers
        self.auxi_1 = YQAuxClassifier(512, num_classes)
        self.auxi_2 = YQAuxClassifier(528, num_classes)
        # final classification layers
        self.pool_5 = AdaptiveAvgPool2d((1, 1))
        self.drop = Dropout(0.2)
        self.full = Linear(1024, num_classes)

    def forward(self, x):
        # -----------------------------
        y_ = self.conv_1(x)
        y_ = self.pool_1(y_)
        # -----------------------------
        y_ = self.conv_2(y_)
        y_ = self.conv_3(y_)
        y_ = self.pool_2(y_)
        # -----------------------------
        y_ = self.ince_3a(y_)
        y_ = self.ince_3b(y_)
        y_ = self.pool_3(y_)
        # -----------------------------
        y_ = self.ince_4a(y_)
        # -----------------------------
        a1 = self.auxi_1(y_)
        # -----------------------------
        y_ = self.ince_4b(y_)
        y_ = self.ince_4c(y_)
        y_ = self.ince_4d(y_)
        # -----------------------------
        a2 = self.auxi_2(y_)
        # -----------------------------
        y_ = self.ince_4e(y_)
        y_ = self.pool_4(y_)
        # -----------------------------
        y_ = self.ince_5a(y_)
        y_ = self.ince_5b(y_)
        # -----------------------------
        y_ = self.pool_5(y_)
        y_ = torch.flatten(y_, 1)
        y_ = self.drop(y_)
        y_ = self.full(y_)
        return y_, a1, a2
# data loading
# --------------------------------------------------------------
def load_data(img_dir, rate=0.8):
    transform = Compose(
        [
            Resize((224, 224)),  # RandomResizedCrop(224),
            # RandomHorizontalFlip(),
            ToTensor(),
            # mean and std (this transform takes a Tensor image, so it must come after ToTensor)
            Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0]),
        ]
    )
    ds = ImageFolder(img_dir, transform=transform)
    l = len(ds)
    l_train = int(l * rate)
    train, test = random_split(ds, [l_train, l - l_train])
    train_loader = torch.utils.data.DataLoader(dataset=train, shuffle=True, batch_size=50)  # each class has about 1300 images
    test_loader = torch.utils.data.DataLoader(dataset=test, shuffle=True, batch_size=50)    # predict one batch at a time
    return train_loader, test_loader
# training
# ==============================================================
# 1. load the dataset
print("1. Loading the dataset")
train_loader, test_loader = load_data("./imagenet2012", 0.8)
CUDA = torch.cuda.is_available()
# 2. build the network
print("2. Building the network")
net = YQGoogLeNet(4)
if CUDA:
    net.cuda()
# 3. train
print("3. Training")
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss_F = torch.nn.CrossEntropyLoss()
epoch = 10
for n in range(epoch):
    for step, input_data in enumerate(train_loader):
        x_, y_ = input_data
        if CUDA:
            # GPU computation ----------------------------------------
            x_ = x_.cuda()
            y_ = y_.cuda()
        pred, pred_aux_1, pred_aux_2 = net(x_.view(-1, 3, 224, 224))
        loss = loss_F(pred, y_)  # main loss
        loss_aux_1 = loss_F(pred_aux_1, y_)
        loss_aux_2 = loss_F(pred_aux_2, y_)
        last_loss = loss + 0.3 * (loss_aux_1 + loss_aux_2)  # the auxiliary losses are weighted at 30%
        optimizer.zero_grad()
        last_loss.backward()
        optimizer.step()
        with torch.no_grad():
            all_num = 0.0
            acc = 0.0
            for t_x, t_y in test_loader:
                all_num += len(t_y)
                if CUDA:
                    t_x = t_x.cuda()
                    t_y = t_y.cuda()
                test_pred, _, _ = net(t_x.view(-1, 3, 224, 224))  # the auxiliary outputs are not needed here
                prob = torch.nn.functional.softmax(test_pred, dim=1)
                pred_cls = torch.argmax(prob, dim=1)
                acc += (pred_cls == t_y).float().sum()
            print(f"Epoch/batch: {n:02d}/{step:02d}: \taccuracy: {acc/all_num * 100:6.4f}, loss: {last_loss:6.4f}")
            # print(f"Epoch: {n+1:02d}: \taccuracy: {acc/all_num * 100:6.4f}, \tloss: {loss:6.4f}")
# save the model
torch.save(net.state_dict(), "./googlenet.models")
1. Loading the dataset
2. Building the network
3. Training
Epoch/batch: 00/00: 	accuracy: 34.5566, loss: 2.2265
Epoch/batch: 00/01: 	accuracy: 33.2314, loss: 2.1856
Epoch/batch: 00/02: 	accuracy: 44.8522, loss: 3.0660
Epoch/batch: 00/03: 	accuracy: 40.4689, loss: 2.3589
......
Epoch/batch: 09/75: 	accuracy: 80.8359, loss: 0.8001
Epoch/batch: 09/76: 	accuracy: 79.6126, loss: 0.3436
Epoch/batch: 09/77: 	accuracy: 81.0397, loss: 0.7516
Epoch/batch: 09/78: 	accuracy: 81.2436, loss: 0.3100
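A minimal inference sketch for the weights saved above (the random tensor stands in for a real, preprocessed 3x224x224 image; the auxiliary outputs are simply discarded):

net = YQGoogLeNet(4)
net.load_state_dict(torch.load("./googlenet.models", map_location="cpu"))
net.eval()  # note: in this implementation the auxiliary heads still run; we just ignore them
with torch.no_grad():
    img = torch.randn(1, 3, 224, 224)  # stand-in for a real preprocessed image
    logits, _, _ = net(img)
    print(torch.argmax(logits, dim=1))  # predicted class index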
Inception V2
- Input image: 3x224x224.
Convolution factorization
Convolution factorization design
-
Inception V2 replaces large-kernel convolutions with small-kernel ones to reduce computation:
- a 5x5 convolution is replaced by two 3x3 convolutions (see the sketch below);
- the early V2 differed from V1 only by the added BatchNorm2d; officially that was described as an enhanced V1, and the name V2 only became established after convolution factorization was published.
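A sketch of what the factorized branch looks like in code (channel numbers are borrowed from Inception(3a) for illustration; this branch name is mine, not part of the implementation above):

from torch.nn import Sequential

# one 5x5 convolution replaced by two stacked 3x3 convolutions:
# the same receptive field, but 2 * (3*3) = 18 instead of 5*5 = 25
# weights per input/output channel pair
branch_5x5_factorized = Sequential(
    YQConv2d(in_channels=192, out_channels=16, kernel_size=1),
    YQConv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1),
    YQConv2d(in_channels=32, out_channels=32, kernel_size=3, padding=1),
)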
Convolution factorization diagram
Depth factorization
Depth factorization design
-
The motivation for depth factorization is that the traditional serial conv/pool arrangements are problematic (the output channel count is kept the same in both cases):
- convolve first, then pool:
- increases the cost of the convolution;
- pool first, then convolve:
- loses features (a representational bottleneck);
- the depth-factorization design instead runs a stride-2 convolution branch and a pooling branch in parallel and concatenates them (see the sketch below):
- it preserves the features while reducing computation.
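A sketch of that parallel reduction (the class name and channel numbers are illustrative assumptions): a stride-2 convolution branch and a stride-2 pooling branch run side by side and are concatenated, halving the spatial size without pooling away features first:

import torch
from torch.nn import Module, MaxPool2d

class ReductionBlock(Module):
    def __init__(self, in_channels, conv_channels):
        super(ReductionBlock, self).__init__()
        self.conv = YQConv2d(in_channels, conv_channels, kernel_size=3, stride=2, padding=1)
        self.pool = MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        # both branches halve the spatial size; channels are concatenated
        return torch.cat([self.conv(x), self.pool(x)], 1)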
The GoogLeNet network structure with Inception V2
- The table below can in fact also serve as the V3 version without BatchNorm2d; it is effectively V3 (it includes the asymmetric convolution factorization).
- Notes:
- Figure 5 in the table refers to the original, unmodified Inception module;
- Figure 6 refers to the small-convolution Inception (3x3 kernels replacing the 5x5 kernel);
- Figure 7 refers to the asymmetric Inception (1xn and nx1 kernels replacing an nxn kernel).
Appendix
- Figure-5
- Figure-6
- Figure-7
Inception V3
Asymmetric convolution factorization design
Introduces asymmetric convolution factorization: an nxn convolution is decomposed into a 1xn convolution followed by an nx1 convolution (for example, 7x7 into 1x7 and 7x1); see the V2 section above for the concrete network structure, and the sketch below.
The input image size is raised to 299x299x3.
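A sketch of an asymmetric branch (channel numbers are illustrative assumptions; tuple kernel sizes and paddings pass straight through YQConv2d to Conv2d):

from torch.nn import Sequential

# a 7x7 convolution factorized into a 1x7 followed by a 7x1 convolution
branch_7x7_factorized = Sequential(
    YQConv2d(in_channels=768, out_channels=128, kernel_size=1),
    YQConv2d(in_channels=128, out_channels=128, kernel_size=(1, 7), padding=(0, 3)),
    YQConv2d(in_channels=128, out_channels=192, kernel_size=(7, 1), padding=(3, 0)),
)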
Inception V4
Inception residual design
- Uses Inception as the residual function to improve Inception-v3, producing the following residual variants:
- Inception-ResNet-v1,
- Inception-ResNet-v2,
- Inception-v4
- Inception residual design (a simplified sketch follows)
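A much-simplified sketch of the residual idea (the class name is mine; real Inception-ResNet blocks use a linear 1x1 projection without BN/ReLU before the sum, so this only illustrates the shortcut structure):

import torch
from torch.nn import Module

class InceptionResidual(Module):
    def __init__(self, in_channels):
        super(InceptionResidual, self).__init__()
        self.inception = YQInception(in_channels, 64, 96, 128, 16, 32, 32)
        # project the 64 + 128 + 32 + 32 = 256 output channels back to in_channels
        self.project = YQConv2d(64 + 128 + 32 + 32, in_channels, kernel_size=1)

    def forward(self, x):
        # shortcut connection: the input is added to the transformed output
        return x + self.project(self.inception(x))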
The GoogLeNet network structure with Inception V4
Appendix
-
Google published a series of papers on GoogLeNet that describe the ideas and techniques behind Inception v1, Inception v2, Inception v3 and Inception v4 in detail:
- "Going deeper with convolutions"
- "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
- "Rethinking the Inception Architecture for Computer Vision"
- "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning"
-
The official source code handles many more details, for example:
- better-organized code structure;
- weight initialization;
- transform conversion for inputs of different structures.
-
With a bit of time, the V2, V3 and V4 code can be written out as well.
- V4 can borrow from our ResNet implementation.