TORCH03-09 The GoogLeNet Network


  GoogLeNet's interesting contributions are the auxiliary classifiers and convolution factorization (the latter was in fact already introduced with VGG). This topic walks through GoogLeNet's classification techniques and implements training with multiple auxiliary classifiers along the way; the results on the ImageNet 2012 dataset are decent (I trained on only 4 classes of unshuffled images).
  To keep up with the state of the art, books are no longer enough; you have to read the latest papers.


About the GoogLeNet network

  • In 2014, GoogLeNet and VGG were the twin stars of that year's ImageNet challenge (ILSVRC14): GoogLeNet took first place and VGG second. What the two architectures have in common is that they are much deeper than their predecessors.

    • VGG inherits much of its framework from LeNet and AlexNet, while GoogLeNet experiments with a much bolder network structure
    • When memory or compute is limited, GoogLeNet is the better choice; and judged by results, GoogLeNet's performance is also superior
      • Although GoogLeNet is 22 layers deep, it is much smaller than AlexNet and VGG;
      • GoogLeNet has about 5 million parameters; AlexNet has roughly 12 times as many, and VGG in turn about 3 times as many as AlexNet.
  • GoogLeNet is a deep network developed by Google. Why "GoogLeNet" rather than "GoogleNet"? Reportedly as a tribute to "LeNet", hence the spelling "GoogLeNet"

  • References:

    • https://my.oschina.net/u/876354/blog/1637819

The design ideas behind GoogLeNet

  • The core of GoogLeNet is the Inception module, a concept and design that was refined step by step through 4 versions:
    1. Inception V1
    2. Inception V2
    3. Inception V3
    4. Inception V4

The core idea of Inception

  • The core idea of Inception is stacking
    • design a sparse network structure that can nevertheless produce dense data;
    • improve the network's representational power while keeping the use of compute resources efficient.
Schematic of the Inception design
  • An Inception block is built by stacking 3 convolutions + 1 pooling operation

    1. Convolutions: 1 \times 1, 3 \times 3, 5 \times 5
    2. Pooling: 3 \times 3 max pooling;
    3. Each convolution is followed by a ReLU activation;
    4. The outputs are concatenated, which increases the number of output channels (depth)
  • Analysis of the Inception design

    • It widens the network:
      • the sparse structure formed by the 3 parallel convolutions can extract every detail of the input, widening the network;
    • It reduces overfitting:
      • the pooling operation shrinks the spatial size, reducing overfitting (somewhat like Dropout, except it is not random)
    • It makes the network more scale-adaptive:
      • three convolutions of different sizes improve the network's adaptability to scale;

Inception V1

The Inception V1 design

  • V1 optimizes the computation of the basic Inception block above by adding a 1 \times 1 convolution.
Inception V1
  • Understanding the concept of stacking
Schematic of stacking in GoogLeNet
  • Analysis of the 1 \times 1 convolution

    1. It provides an extra ReLU, adding non-linearity.
    2. Because outputs are concatenated layer after layer, the depth (channel count) keeps growing; inserting a 1 \times 1 convolution reduces the depth, which lowers the cost of the subsequent 3 \times 3 and 5 \times 5 convolutions.
  • Note:

    • Why does adding a 1 \times 1 convolution reduce the amount of computation?
      1. Assume the input is 100 \times 100 \times 128 and the convolution kernel is 5 \times 5 \times 256, with stride 1.
      2. Without the 1 \times 1 convolution:
        • weight parameters: 128 \times 5 \times 5 \times 256 = 819200
      3. With a 1 \times 1 convolution (assume 32 channels):
        • weight parameters: 128 \times 1 \times 1 \times 32 + 32 \times 5 \times 5 \times 256 = 208896
      4. Conclusion:
        • with the 1 \times 1 convolution the computation is roughly \dfrac{1}{4} of the original, as the quick check below confirms
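  • A quick check of the arithmetic above in Python (weights only, bias ignored); a minimal sketch, not part of the network code:
direct  = 128 * 5 * 5 * 256                       # 5x5 conv straight from 128 to 256 channels
reduced = 128 * 1 * 1 * 32 + 32 * 5 * 5 * 256     # 1x1 bottleneck to 32 channels, then 5x5 to 256
print(direct)            # 819200
print(reduced)           # 208896
print(reduced / direct)  # ~0.255, i.e. roughly 1/4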

The 22-layer GoogLeNet architecture

The 22-layer GoogLeNet architecture

The GoogLeNet implementation in Torch

  • Torch provides a GoogLeNet implementation
    • Judging from the source code and the printed structure, the official implementation is Inception V1 with BatchNorm2d added, i.e. an enhanced V1
from torchvision.models import GoogLeNet
GoogLeNet?
Init signature:
GoogLeNet(
    num_classes=1000,
    aux_logits=True,
    transform_input=False,
    init_weights=True,
    blocks=None,
)
Docstring:
Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in
a tree structure. You can assign the submodules as regular attributes::

    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self):
            super(Model, self).__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their
parameters converted too when you call :meth:`to`, etc.
Init docstring: Initializes internal Module state, shared by both nn.Module and ScriptModule.
File:           c:\program files\python36\lib\site-packages\torchvision\models\googlenet.py
Type:           type
Subclasses:     QuantizableGoogLeNet
  • aux_logits controls the two intermediate classification outputs; a short demonstration follows after this list

  • The input to GoogLeNet is still

    • 224 \times 224 \times 3
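  • A minimal demonstration of the three outputs (assuming a recent torchvision, whose GoogLeNet returns a named tuple of main and auxiliary logits in training mode and a plain tensor in eval mode):
import torch
from torchvision.models import GoogLeNet

net = GoogLeNet(num_classes=1000, aux_logits=True)
net.train()                          # the auxiliary branches are only computed in train mode
x = torch.randn(1, 3, 224, 224)      # NCHW, one 224 x 224 RGB image
logits, aux2, aux1 = net(x)          # main logits plus the two auxiliary logits
print(logits.shape)                  # torch.Size([1, 1000])

net.eval()
print(net(x).shape)                  # eval mode returns only the main logits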
from torchvision.models import GoogLeNet
from torchsummary import summary
net = GoogLeNet()
# print(net)
# print("=========================================================")
# print the network structure
print(summary(net,input_size=(3, 224, 224), device='cpu'))
# print("=========================================================")
# # print the network structure
# print(summary(net.cuda(),input_size=(3, 224, 224)))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
       BasicConv2d-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]           4,096
       BatchNorm2d-6           [-1, 64, 56, 56]             128
       BasicConv2d-7           [-1, 64, 56, 56]               0
            Conv2d-8          [-1, 192, 56, 56]         110,592
       BatchNorm2d-9          [-1, 192, 56, 56]             384
      BasicConv2d-10          [-1, 192, 56, 56]               0
        MaxPool2d-11          [-1, 192, 28, 28]               0
           Conv2d-12           [-1, 64, 28, 28]          12,288
      BatchNorm2d-13           [-1, 64, 28, 28]             128
      BasicConv2d-14           [-1, 64, 28, 28]               0
           Conv2d-15           [-1, 96, 28, 28]          18,432
      BatchNorm2d-16           [-1, 96, 28, 28]             192
      BasicConv2d-17           [-1, 96, 28, 28]               0
           Conv2d-18          [-1, 128, 28, 28]         110,592
      BatchNorm2d-19          [-1, 128, 28, 28]             256
      BasicConv2d-20          [-1, 128, 28, 28]               0
           Conv2d-21           [-1, 16, 28, 28]           3,072
      BatchNorm2d-22           [-1, 16, 28, 28]              32
      BasicConv2d-23           [-1, 16, 28, 28]               0
           Conv2d-24           [-1, 32, 28, 28]           4,608
      BatchNorm2d-25           [-1, 32, 28, 28]              64
      BasicConv2d-26           [-1, 32, 28, 28]               0
        MaxPool2d-27          [-1, 192, 28, 28]               0
           Conv2d-28           [-1, 32, 28, 28]           6,144
      BatchNorm2d-29           [-1, 32, 28, 28]              64
      BasicConv2d-30           [-1, 32, 28, 28]               0
        Inception-31          [-1, 256, 28, 28]               0
           Conv2d-32          [-1, 128, 28, 28]          32,768
      BatchNorm2d-33          [-1, 128, 28, 28]             256
      BasicConv2d-34          [-1, 128, 28, 28]               0
           Conv2d-35          [-1, 128, 28, 28]          32,768
      BatchNorm2d-36          [-1, 128, 28, 28]             256
      BasicConv2d-37          [-1, 128, 28, 28]               0
           Conv2d-38          [-1, 192, 28, 28]         221,184
      BatchNorm2d-39          [-1, 192, 28, 28]             384
      BasicConv2d-40          [-1, 192, 28, 28]               0
           Conv2d-41           [-1, 32, 28, 28]           8,192
      BatchNorm2d-42           [-1, 32, 28, 28]              64
      BasicConv2d-43           [-1, 32, 28, 28]               0
           Conv2d-44           [-1, 96, 28, 28]          27,648
      BatchNorm2d-45           [-1, 96, 28, 28]             192
      BasicConv2d-46           [-1, 96, 28, 28]               0
        MaxPool2d-47          [-1, 256, 28, 28]               0
           Conv2d-48           [-1, 64, 28, 28]          16,384
      BatchNorm2d-49           [-1, 64, 28, 28]             128
      BasicConv2d-50           [-1, 64, 28, 28]               0
        Inception-51          [-1, 480, 28, 28]               0
        MaxPool2d-52          [-1, 480, 14, 14]               0
           Conv2d-53          [-1, 192, 14, 14]          92,160
      BatchNorm2d-54          [-1, 192, 14, 14]             384
      BasicConv2d-55          [-1, 192, 14, 14]               0
           Conv2d-56           [-1, 96, 14, 14]          46,080
      BatchNorm2d-57           [-1, 96, 14, 14]             192
      BasicConv2d-58           [-1, 96, 14, 14]               0
           Conv2d-59          [-1, 208, 14, 14]         179,712
      BatchNorm2d-60          [-1, 208, 14, 14]             416
      BasicConv2d-61          [-1, 208, 14, 14]               0
           Conv2d-62           [-1, 16, 14, 14]           7,680
      BatchNorm2d-63           [-1, 16, 14, 14]              32
      BasicConv2d-64           [-1, 16, 14, 14]               0
           Conv2d-65           [-1, 48, 14, 14]           6,912
      BatchNorm2d-66           [-1, 48, 14, 14]              96
      BasicConv2d-67           [-1, 48, 14, 14]               0
        MaxPool2d-68          [-1, 480, 14, 14]               0
           Conv2d-69           [-1, 64, 14, 14]          30,720
      BatchNorm2d-70           [-1, 64, 14, 14]             128
      BasicConv2d-71           [-1, 64, 14, 14]               0
        Inception-72          [-1, 512, 14, 14]               0
           Conv2d-73            [-1, 128, 4, 4]          65,536
      BatchNorm2d-74            [-1, 128, 4, 4]             256
      BasicConv2d-75            [-1, 128, 4, 4]               0
           Linear-76                 [-1, 1024]       2,098,176
           Linear-77                 [-1, 1000]       1,025,000
     InceptionAux-78                 [-1, 1000]               0
           Conv2d-79          [-1, 160, 14, 14]          81,920
      BatchNorm2d-80          [-1, 160, 14, 14]             320
      BasicConv2d-81          [-1, 160, 14, 14]               0
           Conv2d-82          [-1, 112, 14, 14]          57,344
      BatchNorm2d-83          [-1, 112, 14, 14]             224
      BasicConv2d-84          [-1, 112, 14, 14]               0
           Conv2d-85          [-1, 224, 14, 14]         225,792
      BatchNorm2d-86          [-1, 224, 14, 14]             448
      BasicConv2d-87          [-1, 224, 14, 14]               0
           Conv2d-88           [-1, 24, 14, 14]          12,288
      BatchNorm2d-89           [-1, 24, 14, 14]              48
      BasicConv2d-90           [-1, 24, 14, 14]               0
           Conv2d-91           [-1, 64, 14, 14]          13,824
      BatchNorm2d-92           [-1, 64, 14, 14]             128
      BasicConv2d-93           [-1, 64, 14, 14]               0
        MaxPool2d-94          [-1, 512, 14, 14]               0
           Conv2d-95           [-1, 64, 14, 14]          32,768
      BatchNorm2d-96           [-1, 64, 14, 14]             128
      BasicConv2d-97           [-1, 64, 14, 14]               0
        Inception-98          [-1, 512, 14, 14]               0
           Conv2d-99          [-1, 128, 14, 14]          65,536
     BatchNorm2d-100          [-1, 128, 14, 14]             256
     BasicConv2d-101          [-1, 128, 14, 14]               0
          Conv2d-102          [-1, 128, 14, 14]          65,536
     BatchNorm2d-103          [-1, 128, 14, 14]             256
     BasicConv2d-104          [-1, 128, 14, 14]               0
          Conv2d-105          [-1, 256, 14, 14]         294,912
     BatchNorm2d-106          [-1, 256, 14, 14]             512
     BasicConv2d-107          [-1, 256, 14, 14]               0
          Conv2d-108           [-1, 24, 14, 14]          12,288
     BatchNorm2d-109           [-1, 24, 14, 14]              48
     BasicConv2d-110           [-1, 24, 14, 14]               0
          Conv2d-111           [-1, 64, 14, 14]          13,824
     BatchNorm2d-112           [-1, 64, 14, 14]             128
     BasicConv2d-113           [-1, 64, 14, 14]               0
       MaxPool2d-114          [-1, 512, 14, 14]               0
          Conv2d-115           [-1, 64, 14, 14]          32,768
     BatchNorm2d-116           [-1, 64, 14, 14]             128
     BasicConv2d-117           [-1, 64, 14, 14]               0
       Inception-118          [-1, 512, 14, 14]               0
          Conv2d-119          [-1, 112, 14, 14]          57,344
     BatchNorm2d-120          [-1, 112, 14, 14]             224
     BasicConv2d-121          [-1, 112, 14, 14]               0
          Conv2d-122          [-1, 144, 14, 14]          73,728
     BatchNorm2d-123          [-1, 144, 14, 14]             288
     BasicConv2d-124          [-1, 144, 14, 14]               0
          Conv2d-125          [-1, 288, 14, 14]         373,248
     BatchNorm2d-126          [-1, 288, 14, 14]             576
     BasicConv2d-127          [-1, 288, 14, 14]               0
          Conv2d-128           [-1, 32, 14, 14]          16,384
     BatchNorm2d-129           [-1, 32, 14, 14]              64
     BasicConv2d-130           [-1, 32, 14, 14]               0
          Conv2d-131           [-1, 64, 14, 14]          18,432
     BatchNorm2d-132           [-1, 64, 14, 14]             128
     BasicConv2d-133           [-1, 64, 14, 14]               0
       MaxPool2d-134          [-1, 512, 14, 14]               0
          Conv2d-135           [-1, 64, 14, 14]          32,768
     BatchNorm2d-136           [-1, 64, 14, 14]             128
     BasicConv2d-137           [-1, 64, 14, 14]               0
       Inception-138          [-1, 528, 14, 14]               0
          Conv2d-139            [-1, 128, 4, 4]          67,584
     BatchNorm2d-140            [-1, 128, 4, 4]             256
     BasicConv2d-141            [-1, 128, 4, 4]               0
          Linear-142                 [-1, 1024]       2,098,176
          Linear-143                 [-1, 1000]       1,025,000
    InceptionAux-144                 [-1, 1000]               0
          Conv2d-145          [-1, 256, 14, 14]         135,168
     BatchNorm2d-146          [-1, 256, 14, 14]             512
     BasicConv2d-147          [-1, 256, 14, 14]               0
          Conv2d-148          [-1, 160, 14, 14]          84,480
     BatchNorm2d-149          [-1, 160, 14, 14]             320
     BasicConv2d-150          [-1, 160, 14, 14]               0
          Conv2d-151          [-1, 320, 14, 14]         460,800
     BatchNorm2d-152          [-1, 320, 14, 14]             640
     BasicConv2d-153          [-1, 320, 14, 14]               0
          Conv2d-154           [-1, 32, 14, 14]          16,896
     BatchNorm2d-155           [-1, 32, 14, 14]              64
     BasicConv2d-156           [-1, 32, 14, 14]               0
          Conv2d-157          [-1, 128, 14, 14]          36,864
     BatchNorm2d-158          [-1, 128, 14, 14]             256
     BasicConv2d-159          [-1, 128, 14, 14]               0
       MaxPool2d-160          [-1, 528, 14, 14]               0
          Conv2d-161          [-1, 128, 14, 14]          67,584
     BatchNorm2d-162          [-1, 128, 14, 14]             256
     BasicConv2d-163          [-1, 128, 14, 14]               0
       Inception-164          [-1, 832, 14, 14]               0
       MaxPool2d-165            [-1, 832, 7, 7]               0
          Conv2d-166            [-1, 256, 7, 7]         212,992
     BatchNorm2d-167            [-1, 256, 7, 7]             512
     BasicConv2d-168            [-1, 256, 7, 7]               0
          Conv2d-169            [-1, 160, 7, 7]         133,120
     BatchNorm2d-170            [-1, 160, 7, 7]             320
     BasicConv2d-171            [-1, 160, 7, 7]               0
          Conv2d-172            [-1, 320, 7, 7]         460,800
     BatchNorm2d-173            [-1, 320, 7, 7]             640
     BasicConv2d-174            [-1, 320, 7, 7]               0
          Conv2d-175             [-1, 32, 7, 7]          26,624
     BatchNorm2d-176             [-1, 32, 7, 7]              64
     BasicConv2d-177             [-1, 32, 7, 7]               0
          Conv2d-178            [-1, 128, 7, 7]          36,864
     BatchNorm2d-179            [-1, 128, 7, 7]             256
     BasicConv2d-180            [-1, 128, 7, 7]               0
       MaxPool2d-181            [-1, 832, 7, 7]               0
          Conv2d-182            [-1, 128, 7, 7]         106,496
     BatchNorm2d-183            [-1, 128, 7, 7]             256
     BasicConv2d-184            [-1, 128, 7, 7]               0
       Inception-185            [-1, 832, 7, 7]               0
          Conv2d-186            [-1, 384, 7, 7]         319,488
     BatchNorm2d-187            [-1, 384, 7, 7]             768
     BasicConv2d-188            [-1, 384, 7, 7]               0
          Conv2d-189            [-1, 192, 7, 7]         159,744
     BatchNorm2d-190            [-1, 192, 7, 7]             384
     BasicConv2d-191            [-1, 192, 7, 7]               0
          Conv2d-192            [-1, 384, 7, 7]         663,552
     BatchNorm2d-193            [-1, 384, 7, 7]             768
     BasicConv2d-194            [-1, 384, 7, 7]               0
          Conv2d-195             [-1, 48, 7, 7]          39,936
     BatchNorm2d-196             [-1, 48, 7, 7]              96
     BasicConv2d-197             [-1, 48, 7, 7]               0
          Conv2d-198            [-1, 128, 7, 7]          55,296
     BatchNorm2d-199            [-1, 128, 7, 7]             256
     BasicConv2d-200            [-1, 128, 7, 7]               0
       MaxPool2d-201            [-1, 832, 7, 7]               0
          Conv2d-202            [-1, 128, 7, 7]         106,496
     BatchNorm2d-203            [-1, 128, 7, 7]             256
     BasicConv2d-204            [-1, 128, 7, 7]               0
       Inception-205           [-1, 1024, 7, 7]               0
AdaptiveAvgPool2d-206           [-1, 1024, 1, 1]               0
         Dropout-207                 [-1, 1024]               0
          Linear-208                 [-1, 1000]       1,025,000
================================================================
Total params: 13,004,888
Trainable params: 13,004,888
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 94.25
Params size (MB): 49.61
Estimated Total Size (MB): 144.43
----------------------------------------------------------------
None

Implementing the GoogLeNet architecture by hand

Layer-by-layer parameters of GoogLeNet

  • ReLU and BatchNorm2d are omitted below; neither operation changes the size or layout of the data.
  1. Input image

    • \color{red}{3 \times 224 \times 224}
    • Images are expected to be mean-centered (zero mean)
    • Layout: NCHW
  2. Convolution layer 1

    • Input: \color{red}{3 \times 224 \times 224}
    • Convolution: kernel=(7 \times 7), stride = 2, channels = 64, padding=3
    • Output: \color{red}{64 \times 112 \times 112}
    • Max pooling: kernel=(3 \times 3), stride = 2
    • Output: \color{red}{64 \times 56 \times 56}
  3. Convolution layer 2

    • Input: \color{red}{64 \times 56 \times 56}
    • Convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
    • Output: \color{red}{64 \times 56 \times 56}
    • Convolution: kernel=(3 \times 3), stride = 1, channels = 192, padding=1
    • Output: \color{red}{192 \times 56 \times 56}
    • Max pooling: kernel=(3 \times 3), stride = 2
    • Output: \color{red}{192 \times 28 \times 28}
  4. Inception(3a)

    • Input: \color{red}{192 \times 28 \times 28}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
      • Output: \color{red}{64 \times 28 \times 28}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 96, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 128, padding=1
      • Output: \color{red}{128 \times 28 \times 28}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 16, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 32, padding=1
      • Output: \color{red}{32 \times 28 \times 28}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
      • Output: \color{red}{32 \times 28 \times 28}
    • Output: \color{red}{256 \times 28 \times 28}
      • 256 = 64 + 128 + 32 + 32
  5. Inception(3b)

    • Input: \color{red}{256 \times 28 \times 28}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • Output: \color{red}{128 \times 28 \times 28}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 192, padding=1
      • Output: \color{red}{192 \times 28 \times 28}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 96, padding=1
      • Output: \color{red}{96 \times 28 \times 28}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
      • Output: \color{red}{64 \times 28 \times 28}
    • Output: \color{red}{480 \times 28 \times 28}
      • 480 = 128 + 192 + 96 + 64
    • Max pooling: kernel=(3 \times 3), stride = 2, padding=0
    • Output: \color{red}{480 \times 14 \times 14}
  6. Inception(4a)

    • Input: \color{red}{480 \times 14 \times 14}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 192, padding=0
      • Output: \color{red}{192 \times 14 \times 14}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 96, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 208, padding=1
      • Output: \color{red}{208 \times 14 \times 14}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 16, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 48, padding=1
      • Output: \color{red}{48 \times 14 \times 14}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
      • Output: \color{red}{64 \times 14 \times 14}
    • Output: \color{red}{512 \times 14 \times 14}
      • 512 = 192 + 208 + 48 + 64
  7. Auxiliary classifier 1

    • Input: \color{red}{512 \times 14 \times 14}
    • AdaptiveAvgPool2d: output_size=(4 \times 4)
    • Output: \color{red}{512 \times 4 \times 4}
    • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
    • Output: \color{red}{128 \times 4 \times 4 = 2048}
    • Fully connected: 2048 \to 1024
    • Fully connected: 1024 \to n, where n is the number of classes;
    • Output: \color{red}{n}
  8. Inception(4b)

    • Input: \color{red}{512 \times 14 \times 14}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 160, padding=0
      • Output: \color{red}{160 \times 14 \times 14}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 112, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 224, padding=1
      • Output: \color{red}{224 \times 14 \times 14}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 24, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 64, padding=1
      • Output: \color{red}{64 \times 14 \times 14}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
      • Output: \color{red}{64 \times 14 \times 14}
    • Output: \color{red}{512 \times 14 \times 14}
      • 512 = 160 + 224 + 64 + 64
  9. Inception(4c)

    • Input: \color{red}{512 \times 14 \times 14}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • Output: \color{red}{128 \times 14 \times 14}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 256, padding=1
      • Output: \color{red}{256 \times 14 \times 14}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 24, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 64, padding=1
      • Output: \color{red}{64 \times 14 \times 14}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
      • Output: \color{red}{64 \times 14 \times 14}
    • Output: \color{red}{512 \times 14 \times 14}
      • 512 = 128 + 256 + 64 + 64
  10. Inception(4d)

    • Input: \color{red}{512 \times 14 \times 14}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 112, padding=0
      • Output: \color{red}{112 \times 14 \times 14}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 144, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 288, padding=1
      • Output: \color{red}{288 \times 14 \times 14}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 64, padding=1
      • Output: \color{red}{64 \times 14 \times 14}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
      • Output: \color{red}{64 \times 14 \times 14}
    • Output: \color{red}{528 \times 14 \times 14}
      • 528 = 112 + 288 + 64 + 64
  11. Auxiliary classifier 2

    • Input: \color{red}{528 \times 14 \times 14}
    • AdaptiveAvgPool2d: output_size=(4 \times 4)
    • Output: \color{red}{528 \times 4 \times 4}
    • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
    • Output: \color{red}{128 \times 4 \times 4 = 2048}
    • Fully connected: 2048 \to 1024
    • Fully connected: 1024 \to n, where n is the number of classes;
    • Output: \color{red}{n}
  12. Inception(4e)

    • Input: \color{red}{528 \times 14 \times 14}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 256, padding=0
      • Output: \color{red}{256 \times 14 \times 14}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 160, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 320, padding=1
      • Output: \color{red}{320 \times 14 \times 14}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 128, padding=1
      • Output: \color{red}{128 \times 14 \times 14}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • Output: \color{red}{128 \times 14 \times 14}
    • Output: \color{red}{832 \times 14 \times 14}
      • 832 = 256 + 320 + 128 + 128
    • Max pooling: kernel=(2 \times 2), stride = 2, padding=0
    • Output: \color{red}{832 \times 7 \times 7}
  13. Inception(5a)

    • Input: \color{red}{832 \times 7 \times 7}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 256, padding=0
      • Output: \color{red}{256 \times 7 \times 7}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 160, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 320, padding=1
      • Output: \color{red}{320 \times 7 \times 7}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 128, padding=1
      • Output: \color{red}{128 \times 7 \times 7}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • Output: \color{red}{128 \times 7 \times 7}
    • Output: \color{red}{832 \times 7 \times 7}
      • 832 = 256 + 320 + 128 + 128
  14. Inception(5b)

    • Input: \color{red}{832 \times 7 \times 7}
    • Branch 1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 384, padding=0
      • Output: \color{red}{384 \times 7 \times 7}
    • Branch 2
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 192, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 384, padding=1
      • Output: \color{red}{384 \times 7 \times 7}
    • Branch 3
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 48, padding=0
      • Convolution: kernel=(3 \times 3), stride = 1, channels = 128, padding=1
      • Output: \color{red}{128 \times 7 \times 7}
    • Branch 4
      • Pooling: kernel=(3 \times 3), stride = 1, padding=1
      • Convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • Output: \color{red}{128 \times 7 \times 7}
    • Output: \color{red}{1024 \times 7 \times 7}
      • 1024 = 384 + 384 + 128 + 128
    • AdaptiveAvgPool2d: output_size=(1 \times 1)
    • Output: \color{red}{1024 \times 1 \times 1}
  15. Fully connected layer

    • 1024 \to n
      • n is the number of classes

A basic convolution wrapper

  • The core enhancement over Inception V1 here is BatchNorm2d; every convolution is followed by the two operations below, so we wrap them together
    1. BatchNorm2d
    2. ReLU
import torch
from torch.nn import Conv2d, BatchNorm2d, Module, ReLU

class YQConv2d(Module):
    
    # Constructor: initialize Conv2d, BatchNorm2d and ReLU
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        super(YQConv2d, self).__init__()
        # Convolution layer
        self.conv = Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
        # BatchNorm2d
        self.bn = BatchNorm2d(out_channels, eps=0.001)
        # Activation
        self.relu = ReLU(inplace=True)

    def forward(self, x):
        y_ = self.conv(x)
        y_ = self.bn(y_)
        y_ = self.relu(y_)
        return y_
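  • As the consolidated code later notes, the same wrapper can be written more concisely with Sequential; a minimal equivalent sketch (yq_conv2d is a hypothetical helper name, not part of the network code):
from torch.nn import Sequential, Conv2d, BatchNorm2d, ReLU

def yq_conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0):
    # Functionally equivalent to YQConv2d: Conv2d (no bias) -> BatchNorm2d -> ReLU
    return Sequential(
        Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False),
        BatchNorm2d(out_channels, eps=0.001),
        ReLU(inplace=True),
    )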

  • The structure of the convolution wrapper
from torchsummary import summary
conv = YQConv2d(3, 64)
# print the network structure
print(summary(conv,input_size=(3, 224, 224), device='cpu'))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]             192
       BatchNorm2d-2         [-1, 64, 224, 224]             128
              ReLU-3         [-1, 64, 224, 224]               0
================================================================
Total params: 320
Trainable params: 320
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 73.50
Params size (MB): 0.00
Estimated Total Size (MB): 74.08
----------------------------------------------------------------
None

The Inception wrapper

  • Using the convolution wrapper above as the core unit, wrap the Inception block
Inception structure
import torch
from torch.nn import Module, Sequential, MaxPool2d
class YQInception(Module):
    # Constructor: set the parameters (output channel counts) of the 4 branches
    def __init__(self, in_channels, ch1x1, ch3x3_1, ch3x3_2, ch5x5_1, ch5x5_2, ch_pool):
        super(YQInception, self).__init__()
        # YQConv2d(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        # Branch 1
        self.branch_1 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch1x1, kernel_size=1, stride=1, padding=0)
        )
        # Branch 2
        self.branch_2 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch3x3_1, kernel_size=1, stride=1, padding=0),
            YQConv2d(in_channels=ch3x3_1,     out_channels=ch3x3_2, kernel_size=3, stride=1, padding=1)
        )
        # Branch 3
        self.branch_3 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch5x5_1, kernel_size=1, stride=1, padding=0),
            YQConv2d(in_channels=ch5x5_1,     out_channels=ch5x5_2, kernel_size=3, stride=1, padding=1)
        )
        # Branch 4
        self.branch_4 = Sequential(
            MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
            YQConv2d(in_channels=in_channels, out_channels=ch_pool, kernel_size=1, stride=1, padding=0)
        )
    
    def forward(self, x):
        b_y1 = self.branch_1(x)
        b_y2 = self.branch_2(x)
        b_y3 = self.branch_3(x)
        b_y4 = self.branch_4(x)
        
        y_ = torch.cat([b_y1, b_y2, b_y3, b_y4],  1)   # dim 1 is the channel dimension: the branch outputs are stacked channel-wise
        return y_
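  • The torch.cat in forward joins the four branch outputs along dim 1, the channel axis; a small shape check using Inception(3a)'s branch channel counts (64, 128, 32, 32):
import torch

b1 = torch.randn(1, 64, 28, 28)
b2 = torch.randn(1, 128, 28, 28)
b3 = torch.randn(1, 32, 28, 28)
b4 = torch.randn(1, 32, 28, 28)
y = torch.cat([b1, b2, b3, b4], 1)   # concatenate along the channel dimension
print(y.shape)                       # torch.Size([1, 256, 28, 28])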
  • Print the network structure
from torchsummary import summary
conv = YQInception(3, 64, 96, 128, 16, 32, 32)   # final output depth is 64 + 128 + 32 + 32 = 256
# print the network structure
print(summary(conv,input_size=(3, 224, 224), device='cpu'))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]             192
       BatchNorm2d-2         [-1, 64, 224, 224]             128
              ReLU-3         [-1, 64, 224, 224]               0
          YQConv2d-4         [-1, 64, 224, 224]               0
            Conv2d-5         [-1, 96, 224, 224]             288
       BatchNorm2d-6         [-1, 96, 224, 224]             192
              ReLU-7         [-1, 96, 224, 224]               0
          YQConv2d-8         [-1, 96, 224, 224]               0
            Conv2d-9        [-1, 128, 224, 224]         110,592
      BatchNorm2d-10        [-1, 128, 224, 224]             256
             ReLU-11        [-1, 128, 224, 224]               0
         YQConv2d-12        [-1, 128, 224, 224]               0
           Conv2d-13         [-1, 16, 224, 224]              48
      BatchNorm2d-14         [-1, 16, 224, 224]              32
             ReLU-15         [-1, 16, 224, 224]               0
         YQConv2d-16         [-1, 16, 224, 224]               0
           Conv2d-17         [-1, 32, 224, 224]           4,608
      BatchNorm2d-18         [-1, 32, 224, 224]              64
             ReLU-19         [-1, 32, 224, 224]               0
         YQConv2d-20         [-1, 32, 224, 224]               0
        MaxPool2d-21          [-1, 3, 224, 224]               0
           Conv2d-22         [-1, 32, 224, 224]              96
      BatchNorm2d-23         [-1, 32, 224, 224]              64
             ReLU-24         [-1, 32, 224, 224]               0
         YQConv2d-25         [-1, 32, 224, 224]               0
================================================================
Total params: 116,560
Trainable params: 116,560
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 564.65
Params size (MB): 0.44
Estimated Total Size (MB): 565.67
----------------------------------------------------------------
None

The auxiliary classifier wrapper

  • The auxiliary classifier has 3 trainable layers in total.
Auxiliary classifier
import torch
from torch.nn import Module, AdaptiveAvgPool2d, Linear, ReLU, Dropout

class YQAuxClassifier(Module):
    def __init__(self, in_channels, num_classes):
        super(YQAuxClassifier, self).__init__()
        # pooling / convolution / fully connected / fully connected
        self.pool = AdaptiveAvgPool2d((4, 4))
        self.conv = YQConv2d(in_channels=in_channels, out_channels=128, kernel_size=1, stride=1, padding=0)
        self.fc_1 = Linear(2048, 1024)
        self.relu = ReLU(inplace=True)
        self.drop = Dropout(p=0.7, inplace=False)
        self.fc_2 = Linear(1024, num_classes)
    
    def forward(self, x):
        y_ = self.pool(x)
        y_ = self.conv(y_)
        y_ = torch.flatten(y_, 1)    # flatten the conv feature map for the fully connected layer
        y_ = self.fc_1(y_)
        y_ = self.relu(y_)
        y_ = self.drop(y_)
        y_ = self.fc_2(y_)
        return y_
  • Visualizing the auxiliary classifier
from torchsummary import summary
classifier = YQAuxClassifier(512, 10)   # 512 input channels, 10 output classes
# print the network structure
print(summary(classifier, input_size=(512, 14, 14), device='cpu'))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
 AdaptiveAvgPool2d-1            [-1, 512, 4, 4]               0
            Conv2d-2            [-1, 128, 4, 4]          65,536
       BatchNorm2d-3            [-1, 128, 4, 4]             256
              ReLU-4            [-1, 128, 4, 4]               0
          YQConv2d-5            [-1, 128, 4, 4]               0
            Linear-6                 [-1, 1024]       2,098,176
              ReLU-7                 [-1, 1024]               0
           Dropout-8                 [-1, 1024]               0
            Linear-9                   [-1, 10]          10,250
================================================================
Total params: 2,174,218
Trainable params: 2,174,218
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.38
Forward/backward pass size (MB): 0.15
Params size (MB): 8.29
Estimated Total Size (MB): 8.83
----------------------------------------------------------------
None

The GoogLeNet implementation

  • The implementation proceeds in the following stages:
    1. The initial convolutions
    2. The Inception blocks (including the two auxiliary classifiers)
    3. The classifier
import torch
from torch.nn import Module, AdaptiveAvgPool2d, Linear, Dropout, MaxPool2d
class YQGoogLeNet(Module):
    def __init__(self, num_classes=1000):
        super(YQGoogLeNet, self).__init__()
        # define the layers
        self.conv_1  = YQConv2d(in_channels=3,  out_channels=64,  kernel_size=7, stride=2, padding=3)
        self.pool_1  = MaxPool2d(3, stride=2, ceil_mode=True)
        
        self.conv_2  = YQConv2d(in_channels=64, out_channels=64,  kernel_size=1, stride=1, padding=0)
        
        self.conv_3  = YQConv2d(in_channels=64, out_channels=192, kernel_size=3, stride=1, padding=1)
        self.pool_2  = MaxPool2d(3, stride=2, ceil_mode=True)
        
        self.ince_3a = YQInception(192, 64, 96, 128, 16, 32, 32)
        self.ince_3b = YQInception(256, 128, 128, 192, 32, 96, 64)
        self.pool_3  = MaxPool2d(3, stride=2, ceil_mode=True)
        
        self.ince_4a = YQInception(480, 192, 96, 208, 16, 48, 64)
        
        self.ince_4b = YQInception(512, 160, 112, 224, 24, 64, 64)
        self.ince_4c = YQInception(512, 128, 128, 256, 24, 64, 64)
        self.ince_4d = YQInception(512, 112, 144, 288, 32, 64, 64)
        self.ince_4e = YQInception(528, 256, 160, 320, 32, 128, 128)
        self.pool_4  = MaxPool2d(2, stride=2, ceil_mode=True)
        
        self.ince_5a = YQInception(832, 256, 160, 320, 32, 128, 128)
        self.ince_5b = YQInception(832, 384, 192, 384, 48, 128, 128)
        
        # the two auxiliary classifiers
        self.auxi_1  = YQAuxClassifier(512, num_classes)
        self.auxi_2  = YQAuxClassifier(528, num_classes)
        
        # final classification layers
        self.pool_5  = AdaptiveAvgPool2d((1, 1))
        self.drop = Dropout(0.2)
        self.full = Linear(1024, num_classes)
        
    def forward(self, x):
        # -----------------------------
        y_ = self.conv_1(x)
        y_ = self.pool_1(y_)
        # -----------------------------
        y_ = self.conv_2(y_)
        y_ = self.conv_3(y_)
        y_ = self.pool_2(y_)
        # -----------------------------
        y_ = self.ince_3a(y_)
        y_ = self.ince_3b(y_)
        y_ = self.pool_3(y_)
        # -----------------------------
        y_ = self.ince_4a(y_)
        # -----------------------------
        a1 = self.auxi_1(y_)
        # -----------------------------
        y_ = self.ince_4b(y_)
        y_ = self.ince_4c(y_)
        y_ = self.ince_4d(y_)
        # -----------------------------
        a2 = self.auxi_2(y_)
        # -----------------------------
        y_ = self.ince_4e(y_)
        y_ = self.pool_4(y_)
        # -----------------------------
        y_ = self.ince_5a(y_)
        y_ = self.ince_5b(y_)
        # -----------------------------
        y_ = self.pool_5(y_)
        y_ = torch.flatten(y_, 1)
        y_ = self.drop(y_)
        y_ = self.full(y_)
        
        return y_, a1, a2
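  • A minimal smoke test of the three outputs (a sketch with random input, not a training run):
import torch

net = YQGoogLeNet(num_classes=10)
x = torch.randn(2, 3, 224, 224)      # dummy batch of two 224 x 224 RGB images
y, a1, a2 = net(x)                   # main output plus the two auxiliary outputs
print(y.shape, a1.shape, a2.shape)   # each: torch.Size([2, 10])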
  • The GoogLeNet network structure
from torchsummary import summary
net = YQGoogLeNet(1000) 
# print the network structure
print(summary(net,input_size=(3, 244, 244), device='cpu'))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 122, 122]           9,408
       BatchNorm2d-2         [-1, 64, 122, 122]             128
              ReLU-3         [-1, 64, 122, 122]               0
          YQConv2d-4         [-1, 64, 122, 122]               0
         MaxPool2d-5           [-1, 64, 61, 61]               0
            Conv2d-6           [-1, 64, 61, 61]           4,096
       BatchNorm2d-7           [-1, 64, 61, 61]             128
              ReLU-8           [-1, 64, 61, 61]               0
          YQConv2d-9           [-1, 64, 61, 61]               0
           Conv2d-10          [-1, 192, 61, 61]         110,592
      BatchNorm2d-11          [-1, 192, 61, 61]             384
             ReLU-12          [-1, 192, 61, 61]               0
         YQConv2d-13          [-1, 192, 61, 61]               0
        MaxPool2d-14          [-1, 192, 30, 30]               0
           Conv2d-15           [-1, 64, 30, 30]          12,288
      BatchNorm2d-16           [-1, 64, 30, 30]             128
             ReLU-17           [-1, 64, 30, 30]               0
         YQConv2d-18           [-1, 64, 30, 30]               0
           Conv2d-19           [-1, 96, 30, 30]          18,432
      BatchNorm2d-20           [-1, 96, 30, 30]             192
             ReLU-21           [-1, 96, 30, 30]               0
         YQConv2d-22           [-1, 96, 30, 30]               0
           Conv2d-23          [-1, 128, 30, 30]         110,592
      BatchNorm2d-24          [-1, 128, 30, 30]             256
             ReLU-25          [-1, 128, 30, 30]               0
         YQConv2d-26          [-1, 128, 30, 30]               0
           Conv2d-27           [-1, 16, 30, 30]           3,072
      BatchNorm2d-28           [-1, 16, 30, 30]              32
             ReLU-29           [-1, 16, 30, 30]               0
         YQConv2d-30           [-1, 16, 30, 30]               0
           Conv2d-31           [-1, 32, 30, 30]           4,608
      BatchNorm2d-32           [-1, 32, 30, 30]              64
             ReLU-33           [-1, 32, 30, 30]               0
         YQConv2d-34           [-1, 32, 30, 30]               0
        MaxPool2d-35          [-1, 192, 30, 30]               0
           Conv2d-36           [-1, 32, 30, 30]           6,144
      BatchNorm2d-37           [-1, 32, 30, 30]              64
             ReLU-38           [-1, 32, 30, 30]               0
         YQConv2d-39           [-1, 32, 30, 30]               0
      YQInception-40          [-1, 256, 30, 30]               0
           Conv2d-41          [-1, 128, 30, 30]          32,768
      BatchNorm2d-42          [-1, 128, 30, 30]             256
             ReLU-43          [-1, 128, 30, 30]               0
         YQConv2d-44          [-1, 128, 30, 30]               0
           Conv2d-45          [-1, 128, 30, 30]          32,768
      BatchNorm2d-46          [-1, 128, 30, 30]             256
             ReLU-47          [-1, 128, 30, 30]               0
         YQConv2d-48          [-1, 128, 30, 30]               0
           Conv2d-49          [-1, 192, 30, 30]         221,184
      BatchNorm2d-50          [-1, 192, 30, 30]             384
             ReLU-51          [-1, 192, 30, 30]               0
         YQConv2d-52          [-1, 192, 30, 30]               0
           Conv2d-53           [-1, 32, 30, 30]           8,192
      BatchNorm2d-54           [-1, 32, 30, 30]              64
             ReLU-55           [-1, 32, 30, 30]               0
         YQConv2d-56           [-1, 32, 30, 30]               0
           Conv2d-57           [-1, 96, 30, 30]          27,648
      BatchNorm2d-58           [-1, 96, 30, 30]             192
             ReLU-59           [-1, 96, 30, 30]               0
         YQConv2d-60           [-1, 96, 30, 30]               0
        MaxPool2d-61          [-1, 256, 30, 30]               0
           Conv2d-62           [-1, 64, 30, 30]          16,384
      BatchNorm2d-63           [-1, 64, 30, 30]             128
             ReLU-64           [-1, 64, 30, 30]               0
         YQConv2d-65           [-1, 64, 30, 30]               0
      YQInception-66          [-1, 480, 30, 30]               0
        MaxPool2d-67          [-1, 480, 15, 15]               0
           Conv2d-68          [-1, 192, 15, 15]          92,160
      BatchNorm2d-69          [-1, 192, 15, 15]             384
             ReLU-70          [-1, 192, 15, 15]               0
         YQConv2d-71          [-1, 192, 15, 15]               0
           Conv2d-72           [-1, 96, 15, 15]          46,080
      BatchNorm2d-73           [-1, 96, 15, 15]             192
             ReLU-74           [-1, 96, 15, 15]               0
         YQConv2d-75           [-1, 96, 15, 15]               0
           Conv2d-76          [-1, 208, 15, 15]         179,712
      BatchNorm2d-77          [-1, 208, 15, 15]             416
             ReLU-78          [-1, 208, 15, 15]               0
         YQConv2d-79          [-1, 208, 15, 15]               0
           Conv2d-80           [-1, 16, 15, 15]           7,680
      BatchNorm2d-81           [-1, 16, 15, 15]              32
             ReLU-82           [-1, 16, 15, 15]               0
         YQConv2d-83           [-1, 16, 15, 15]               0
           Conv2d-84           [-1, 48, 15, 15]           6,912
      BatchNorm2d-85           [-1, 48, 15, 15]              96
             ReLU-86           [-1, 48, 15, 15]               0
         YQConv2d-87           [-1, 48, 15, 15]               0
        MaxPool2d-88          [-1, 480, 15, 15]               0
           Conv2d-89           [-1, 64, 15, 15]          30,720
      BatchNorm2d-90           [-1, 64, 15, 15]             128
             ReLU-91           [-1, 64, 15, 15]               0
         YQConv2d-92           [-1, 64, 15, 15]               0
      YQInception-93          [-1, 512, 15, 15]               0
AdaptiveAvgPool2d-94            [-1, 512, 4, 4]               0
           Conv2d-95            [-1, 128, 4, 4]          65,536
      BatchNorm2d-96            [-1, 128, 4, 4]             256
             ReLU-97            [-1, 128, 4, 4]               0
         YQConv2d-98            [-1, 128, 4, 4]               0
           Linear-99                 [-1, 1024]       2,098,176
            ReLU-100                 [-1, 1024]               0
         Dropout-101                 [-1, 1024]               0
          Linear-102                 [-1, 1000]       1,025,000
 YQAuxClassifier-103                 [-1, 1000]               0
          Conv2d-104          [-1, 160, 15, 15]          81,920
     BatchNorm2d-105          [-1, 160, 15, 15]             320
            ReLU-106          [-1, 160, 15, 15]               0
        YQConv2d-107          [-1, 160, 15, 15]               0
          Conv2d-108          [-1, 112, 15, 15]          57,344
     BatchNorm2d-109          [-1, 112, 15, 15]             224
            ReLU-110          [-1, 112, 15, 15]               0
        YQConv2d-111          [-1, 112, 15, 15]               0
          Conv2d-112          [-1, 224, 15, 15]         225,792
     BatchNorm2d-113          [-1, 224, 15, 15]             448
            ReLU-114          [-1, 224, 15, 15]               0
        YQConv2d-115          [-1, 224, 15, 15]               0
          Conv2d-116           [-1, 24, 15, 15]          12,288
     BatchNorm2d-117           [-1, 24, 15, 15]              48
            ReLU-118           [-1, 24, 15, 15]               0
        YQConv2d-119           [-1, 24, 15, 15]               0
          Conv2d-120           [-1, 64, 15, 15]          13,824
     BatchNorm2d-121           [-1, 64, 15, 15]             128
            ReLU-122           [-1, 64, 15, 15]               0
        YQConv2d-123           [-1, 64, 15, 15]               0
       MaxPool2d-124          [-1, 512, 15, 15]               0
          Conv2d-125           [-1, 64, 15, 15]          32,768
     BatchNorm2d-126           [-1, 64, 15, 15]             128
            ReLU-127           [-1, 64, 15, 15]               0
        YQConv2d-128           [-1, 64, 15, 15]               0
     YQInception-129          [-1, 512, 15, 15]               0
          Conv2d-130          [-1, 128, 15, 15]          65,536
     BatchNorm2d-131          [-1, 128, 15, 15]             256
            ReLU-132          [-1, 128, 15, 15]               0
        YQConv2d-133          [-1, 128, 15, 15]               0
          Conv2d-134          [-1, 128, 15, 15]          65,536
     BatchNorm2d-135          [-1, 128, 15, 15]             256
            ReLU-136          [-1, 128, 15, 15]               0
        YQConv2d-137          [-1, 128, 15, 15]               0
          Conv2d-138          [-1, 256, 15, 15]         294,912
     BatchNorm2d-139          [-1, 256, 15, 15]             512
            ReLU-140          [-1, 256, 15, 15]               0
        YQConv2d-141          [-1, 256, 15, 15]               0
          Conv2d-142           [-1, 24, 15, 15]          12,288
     BatchNorm2d-143           [-1, 24, 15, 15]              48
            ReLU-144           [-1, 24, 15, 15]               0
        YQConv2d-145           [-1, 24, 15, 15]               0
          Conv2d-146           [-1, 64, 15, 15]          13,824
     BatchNorm2d-147           [-1, 64, 15, 15]             128
            ReLU-148           [-1, 64, 15, 15]               0
        YQConv2d-149           [-1, 64, 15, 15]               0
       MaxPool2d-150          [-1, 512, 15, 15]               0
          Conv2d-151           [-1, 64, 15, 15]          32,768
     BatchNorm2d-152           [-1, 64, 15, 15]             128
            ReLU-153           [-1, 64, 15, 15]               0
        YQConv2d-154           [-1, 64, 15, 15]               0
     YQInception-155          [-1, 512, 15, 15]               0
          Conv2d-156          [-1, 112, 15, 15]          57,344
     BatchNorm2d-157          [-1, 112, 15, 15]             224
            ReLU-158          [-1, 112, 15, 15]               0
        YQConv2d-159          [-1, 112, 15, 15]               0
          Conv2d-160          [-1, 144, 15, 15]          73,728
     BatchNorm2d-161          [-1, 144, 15, 15]             288
            ReLU-162          [-1, 144, 15, 15]               0
        YQConv2d-163          [-1, 144, 15, 15]               0
          Conv2d-164          [-1, 288, 15, 15]         373,248
     BatchNorm2d-165          [-1, 288, 15, 15]             576
            ReLU-166          [-1, 288, 15, 15]               0
        YQConv2d-167          [-1, 288, 15, 15]               0
          Conv2d-168           [-1, 32, 15, 15]          16,384
     BatchNorm2d-169           [-1, 32, 15, 15]              64
            ReLU-170           [-1, 32, 15, 15]               0
        YQConv2d-171           [-1, 32, 15, 15]               0
          Conv2d-172           [-1, 64, 15, 15]          18,432
     BatchNorm2d-173           [-1, 64, 15, 15]             128
            ReLU-174           [-1, 64, 15, 15]               0
        YQConv2d-175           [-1, 64, 15, 15]               0
       MaxPool2d-176          [-1, 512, 15, 15]               0
          Conv2d-177           [-1, 64, 15, 15]          32,768
     BatchNorm2d-178           [-1, 64, 15, 15]             128
            ReLU-179           [-1, 64, 15, 15]               0
        YQConv2d-180           [-1, 64, 15, 15]               0
     YQInception-181          [-1, 528, 15, 15]               0
AdaptiveAvgPool2d-182            [-1, 528, 4, 4]               0
          Conv2d-183            [-1, 128, 4, 4]          67,584
     BatchNorm2d-184            [-1, 128, 4, 4]             256
            ReLU-185            [-1, 128, 4, 4]               0
        YQConv2d-186            [-1, 128, 4, 4]               0
          Linear-187                 [-1, 1024]       2,098,176
            ReLU-188                 [-1, 1024]               0
         Dropout-189                 [-1, 1024]               0
          Linear-190                 [-1, 1000]       1,025,000
 YQAuxClassifier-191                 [-1, 1000]               0
          Conv2d-192          [-1, 256, 15, 15]         135,168
     BatchNorm2d-193          [-1, 256, 15, 15]             512
            ReLU-194          [-1, 256, 15, 15]               0
        YQConv2d-195          [-1, 256, 15, 15]               0
          Conv2d-196          [-1, 160, 15, 15]          84,480
     BatchNorm2d-197          [-1, 160, 15, 15]             320
            ReLU-198          [-1, 160, 15, 15]               0
        YQConv2d-199          [-1, 160, 15, 15]               0
          Conv2d-200          [-1, 320, 15, 15]         460,800
     BatchNorm2d-201          [-1, 320, 15, 15]             640
            ReLU-202          [-1, 320, 15, 15]               0
        YQConv2d-203          [-1, 320, 15, 15]               0
          Conv2d-204           [-1, 32, 15, 15]          16,896
     BatchNorm2d-205           [-1, 32, 15, 15]              64
            ReLU-206           [-1, 32, 15, 15]               0
        YQConv2d-207           [-1, 32, 15, 15]               0
          Conv2d-208          [-1, 128, 15, 15]          36,864
     BatchNorm2d-209          [-1, 128, 15, 15]             256
            ReLU-210          [-1, 128, 15, 15]               0
        YQConv2d-211          [-1, 128, 15, 15]               0
       MaxPool2d-212          [-1, 528, 15, 15]               0
          Conv2d-213          [-1, 128, 15, 15]          67,584
     BatchNorm2d-214          [-1, 128, 15, 15]             256
            ReLU-215          [-1, 128, 15, 15]               0
        YQConv2d-216          [-1, 128, 15, 15]               0
     YQInception-217          [-1, 832, 15, 15]               0
       MaxPool2d-218            [-1, 832, 8, 8]               0
          Conv2d-219            [-1, 256, 8, 8]         212,992
     BatchNorm2d-220            [-1, 256, 8, 8]             512
            ReLU-221            [-1, 256, 8, 8]               0
        YQConv2d-222            [-1, 256, 8, 8]               0
          Conv2d-223            [-1, 160, 8, 8]         133,120
     BatchNorm2d-224            [-1, 160, 8, 8]             320
            ReLU-225            [-1, 160, 8, 8]               0
        YQConv2d-226            [-1, 160, 8, 8]               0
          Conv2d-227            [-1, 320, 8, 8]         460,800
     BatchNorm2d-228            [-1, 320, 8, 8]             640
            ReLU-229            [-1, 320, 8, 8]               0
        YQConv2d-230            [-1, 320, 8, 8]               0
          Conv2d-231             [-1, 32, 8, 8]          26,624
     BatchNorm2d-232             [-1, 32, 8, 8]              64
            ReLU-233             [-1, 32, 8, 8]               0
        YQConv2d-234             [-1, 32, 8, 8]               0
          Conv2d-235            [-1, 128, 8, 8]          36,864
     BatchNorm2d-236            [-1, 128, 8, 8]             256
            ReLU-237            [-1, 128, 8, 8]               0
        YQConv2d-238            [-1, 128, 8, 8]               0
       MaxPool2d-239            [-1, 832, 8, 8]               0
          Conv2d-240            [-1, 128, 8, 8]         106,496
     BatchNorm2d-241            [-1, 128, 8, 8]             256
            ReLU-242            [-1, 128, 8, 8]               0
        YQConv2d-243            [-1, 128, 8, 8]               0
     YQInception-244            [-1, 832, 8, 8]               0
          Conv2d-245            [-1, 384, 8, 8]         319,488
     BatchNorm2d-246            [-1, 384, 8, 8]             768
            ReLU-247            [-1, 384, 8, 8]               0
        YQConv2d-248            [-1, 384, 8, 8]               0
          Conv2d-249            [-1, 192, 8, 8]         159,744
     BatchNorm2d-250            [-1, 192, 8, 8]             384
            ReLU-251            [-1, 192, 8, 8]               0
        YQConv2d-252            [-1, 192, 8, 8]               0
          Conv2d-253            [-1, 384, 8, 8]         663,552
     BatchNorm2d-254            [-1, 384, 8, 8]             768
            ReLU-255            [-1, 384, 8, 8]               0
        YQConv2d-256            [-1, 384, 8, 8]               0
          Conv2d-257             [-1, 48, 8, 8]          39,936
     BatchNorm2d-258             [-1, 48, 8, 8]              96
            ReLU-259             [-1, 48, 8, 8]               0
        YQConv2d-260             [-1, 48, 8, 8]               0
          Conv2d-261            [-1, 128, 8, 8]          55,296
     BatchNorm2d-262            [-1, 128, 8, 8]             256
            ReLU-263            [-1, 128, 8, 8]               0
        YQConv2d-264            [-1, 128, 8, 8]               0
       MaxPool2d-265            [-1, 832, 8, 8]               0
          Conv2d-266            [-1, 128, 8, 8]         106,496
     BatchNorm2d-267            [-1, 128, 8, 8]             256
            ReLU-268            [-1, 128, 8, 8]               0
        YQConv2d-269            [-1, 128, 8, 8]               0
     YQInception-270           [-1, 1024, 8, 8]               0
AdaptiveAvgPool2d-271           [-1, 1024, 1, 1]               0
         Dropout-272                 [-1, 1024]               0
          Linear-273                 [-1, 1000]       1,025,000
================================================================
Total params: 13,004,888
Trainable params: 13,004,888
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.68
Forward/backward pass size (MB): 139.36
Params size (MB): 49.61
Estimated Total Size (MB): 189.65
----------------------------------------------------------------
None
  • With 1000 classes the total parameter count matches the official implementation (note the summary above was run with a 244 \times 244 input, so the spatial sizes differ slightly from the 224 \times 224 walkthrough).

  • One shortcoming of the model above:

    • the auxiliary classifiers are only useful during training and serve no purpose at prediction time; a boolean flag could be used to switch their computation on and off, as sketched below.
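  • A minimal sketch of that idea on a toy module (GatedAuxDemo and its layers are hypothetical; the point is gating the auxiliary head on Module.training):
import torch
from torch.nn import Module, Linear

class GatedAuxDemo(Module):
    def __init__(self):
        super(GatedAuxDemo, self).__init__()
        self.backbone  = Linear(8, 8)
        self.aux_head  = Linear(8, 4)
        self.main_head = Linear(8, 4)

    def forward(self, x):
        y = self.backbone(x)
        if self.training:                    # auxiliary output only in train mode
            return self.main_head(y), self.aux_head(y)
        return self.main_head(y)

demo = GatedAuxDemo()
x = torch.randn(2, 8)
demo.train()
print(len(demo(x)))      # 2 outputs in train mode
demo.eval()
print(demo(x).shape)     # a single tensor in eval mode: torch.Size([2, 4])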

Dataset loading and training

  • Because GoogLeNet has two auxiliary classifiers that encourage classification from shallow features, training differs from plain single-output classification training.

    • During training, the auxiliary classifiers' losses are added to the total loss with a discount weight of 0.3; see the sketch after this list.
  • For clarity, all the code below is collected in one place.
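  • First, a minimal sketch of how the three outputs combine into one loss (dummy logits and labels; CrossEntropyLoss is an assumption here, any classification loss combines the same way):
import torch
from torch.nn import CrossEntropyLoss

criterion = CrossEntropyLoss()

# Dummy logits from the three heads and dummy labels (batch of 4, 10 classes)
y  = torch.randn(4, 10, requires_grad=True)
a1 = torch.randn(4, 10, requires_grad=True)
a2 = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))

# The auxiliary losses enter the total with the 0.3 discount described above
loss = criterion(y, labels) + 0.3 * (criterion(a1, labels) + criterion(a2, labels))
loss.backward()
print(loss.item())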

import torch
from torch.nn import Conv2d, BatchNorm2d, Module, ReLU, Sequential, MaxPool2d, AdaptiveAvgPool2d, Linear, Dropout
from torchvision.datasets import ImageFolder
from torchvision.transforms import *
from torchvision.transforms.functional import *
from torch.utils.data import random_split
from torch.utils.data import DataLoader
import torchvision
import numpy as np
import cv2

# Conv2d bundled with BatchNorm2d and ReLU; a Sequential would express the
# same thing even more concisely.
# --------------------------------------------------------------
class YQConv2d(Module):
    
    # Constructor: initialize the Conv2d, BatchNorm2d and ReLU layers
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        super(YQConv2d, self).__init__()
        # Convolution layer (bias omitted because BatchNorm adds its own shift)
        self.conv = Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
        # Batch normalization
        self.bn = BatchNorm2d(out_channels, eps=0.001)
        # Activation
        self.relu = ReLU(inplace=True)

    def forward(self, x):
        y_ = self.conv(x)
        y_ = self.bn(y_)
        y_ = self.relu(y_)
        return y_

    
# Inception block (the official torchvision version is effectively V1 plus BatchNorm2d)
# --------------------------------------------------------------
class YQInception(Module):
    # Constructor: set the output channel counts for the four branches
    def __init__(self, in_channels, ch1x1, ch3x3_1, ch3x3_2, ch5x5_1, ch5x5_2, ch_pool):
        super(YQInception, self).__init__()
        # YQConv2d(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        # Branch 1: 1x1 convolution
        self.branch_1 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch1x1, kernel_size=1, stride=1, padding=0)
        )
        # Branch 2: 1x1 reduction followed by 3x3 convolution
        self.branch_2 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch3x3_1, kernel_size=1, stride=1, padding=0),
            YQConv2d(in_channels=ch3x3_1,     out_channels=ch3x3_2, kernel_size=3, stride=1, padding=1)
        )
        # Branch 3: 1x1 reduction followed by the "5x5" convolution
        # (kernel_size=3 here, matching torchvision, whose implementation uses
        # a 3x3 kernel in this branch)
        self.branch_3 = Sequential(
            YQConv2d(in_channels=in_channels, out_channels=ch5x5_1, kernel_size=1, stride=1, padding=0),
            YQConv2d(in_channels=ch5x5_1,     out_channels=ch5x5_2, kernel_size=3, stride=1, padding=1)
        )
        # Branch 4: 3x3 max pooling followed by 1x1 convolution
        self.branch_4 = Sequential(
            MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
            YQConv2d(in_channels=in_channels, out_channels=ch_pool, kernel_size=1, stride=1, padding=0)
        )
    
    def forward(self, x):
        b_y1 = self.branch_1(x)
        b_y2 = self.branch_2(x)
        b_y3 = self.branch_3(x)
        b_y4 = self.branch_4(x)
        
        y_ = torch.cat([b_y1, b_y2, b_y3, b_y4], 1)   # concatenate along dim=1, the channel axis
        return y_
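

# Sanity check (our own addition, not part of the original listing):
# an Inception block's output depth is the sum of its four branch outputs.
# For the first block of GoogLeNet below (ince_3a), 192 input channels map to
# 64 + 128 + 32 + 32 = 256 output channels while the spatial size is preserved:
#   blk = YQInception(192, 64, 96, 128, 16, 32, 32)
#   print(blk(torch.randn(1, 192, 28, 28)).shape)   # torch.Size([1, 256, 28, 28])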
    

# Auxiliary classifier
# --------------------------------------------------------------
class YQAuxClassifier(Module):
    def __init__(self, in_channels, num_classes):
        super(YQAuxClassifier, self).__init__()
        # pooling / convolution / fully connected / fully connected
        self.pool = AdaptiveAvgPool2d((4, 4))
        self.conv = YQConv2d(in_channels=in_channels, out_channels=128, kernel_size=1, stride=1, padding=0)
        self.fc_1 = Linear(2048, 1024)    # 2048 = 128 channels x 4 x 4 spatial
        self.relu = ReLU(inplace=True)
        self.drop = Dropout(p=0.7, inplace=False)
        self.fc_2 = Linear(1024, num_classes)
    
    def forward(self, x):
        y_ = self.pool(x)
        y_ = self.conv(y_)
        y_ = torch.flatten(y_, 1)    # flatten the convolutional maps into a vector
        y_ = self.fc_1(y_)
        y_ = self.relu(y_)
        y_ = self.drop(y_)
        y_ = self.fc_2(y_)
        return y_



# GoogLeNet network
# --------------------------------------------------------------
class YQGoogLeNet(Module):
    def __init__(self, num_classes=1000):
        super(YQGoogLeNet, self).__init__()
        # Layer definitions
        self.conv_1  = YQConv2d(in_channels=3,  out_channels=64,  kernel_size=7, stride=2, padding=3)
        self.pool_1  = MaxPool2d(3, stride=2, ceil_mode=True)
        
        self.conv_2  = YQConv2d(in_channels=64, out_channels=64,  kernel_size=1, stride=1, padding=0)
        
        self.conv_3  = YQConv2d(in_channels=64, out_channels=192, kernel_size=3, stride=1, padding=1)
        self.pool_2  = MaxPool2d(3, stride=2, ceil_mode=True)
        
        self.ince_3a = YQInception(192, 64, 96, 128, 16, 32, 32)
        self.ince_3b = YQInception(256, 128, 128, 192, 32, 96, 64)
        self.pool_3  = MaxPool2d(3, stride=2, ceil_mode=True)
        
        self.ince_4a = YQInception(480, 192, 96, 208, 16, 48, 64)
        
        self.ince_4b = YQInception(512, 160, 112, 224, 24, 64, 64)
        self.ince_4c = YQInception(512, 128, 128, 256, 24, 64, 64)
        self.ince_4d = YQInception(512, 112, 144, 288, 32, 64, 64)
        self.ince_4e = YQInception(528, 256, 160, 320, 32, 128, 128)
        self.pool_4  = MaxPool2d(2, stride=2, ceil_mode=True)
        
        self.ince_5a = YQInception(832, 256, 160, 320, 32, 128, 128)
        self.ince_5b = YQInception(832, 384, 192, 384, 48, 128, 128)
        
        # Two auxiliary classifiers (attached after ince_4a and ince_4d)
        self.auxi_1  = YQAuxClassifier(512, num_classes)
        self.auxi_2  = YQAuxClassifier(528, num_classes)
        
        # Final classification layers
        self.pool_5  = AdaptiveAvgPool2d((1, 1))
        self.drop = Dropout(0.2)
        self.full = Linear(1024, num_classes)
        
    def forward(self, x):
        # -----------------------------
        y_ = self.conv_1(x)
        y_ = self.pool_1(y_)
        # -----------------------------
        y_ = self.conv_2(y_)
        y_ = self.conv_3(y_)
        y_ = self.pool_2(y_)
        # -----------------------------
        y_ = self.ince_3a(y_)
        y_ = self.ince_3b(y_)
        y_ = self.pool_3(y_)
        # -----------------------------
        y_ = self.ince_4a(y_)
        # -----------------------------
        a1 = self.auxi_1(y_)
        # -----------------------------
        y_ = self.ince_4b(y_)
        y_ = self.ince_4c(y_)
        y_ = self.ince_4d(y_)
        # -----------------------------
        a2 = self.auxi_2(y_)
        # -----------------------------
        y_ = self.ince_4e(y_)
        y_ = self.pool_4(y_)
        # -----------------------------
        y_ = self.ince_5a(y_)
        y_ = self.ince_5b(y_)
        # -----------------------------
        y_ = self.pool_5(y_)
        y_ = torch.flatten(y_, 1)
        y_ = self.drop(y_)
        y_ = self.full(y_)
        
        return y_, a1, a2
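

# Variant (our own sketch, not part of the original code): gate the auxiliary
# classifiers on the built-in `self.training` flag so that eval-mode forward
# passes skip them entirely. The `aux_logits` flag name is borrowed from
# torchvision's GoogLeNet constructor.
class YQGoogLeNetAux(YQGoogLeNet):
    def __init__(self, num_classes=1000, aux_logits=True):
        super(YQGoogLeNetAux, self).__init__(num_classes)
        self.aux_logits = aux_logits

    def forward(self, x):
        y_ = self.pool_1(self.conv_1(x))
        y_ = self.pool_2(self.conv_3(self.conv_2(y_)))
        y_ = self.pool_3(self.ince_3b(self.ince_3a(y_)))
        y_ = self.ince_4a(y_)
        a1 = self.auxi_1(y_) if self.aux_logits and self.training else None
        y_ = self.ince_4d(self.ince_4c(self.ince_4b(y_)))
        a2 = self.auxi_2(y_) if self.aux_logits and self.training else None
        y_ = self.pool_4(self.ince_4e(y_))
        y_ = self.ince_5b(self.ince_5a(y_))
        y_ = self.full(self.drop(torch.flatten(self.pool_5(y_), 1)))
        # Only return the auxiliary logits while training
        if self.aux_logits and self.training:
            return y_, a1, a2
        return y_

# Quick shape check (assumes 224x224 inputs, matching the training code below):
#   net_aux = YQGoogLeNetAux(num_classes=4)
#   net_aux.train()
#   y, a1, a2 = net_aux(torch.randn(2, 3, 224, 224))   # each: torch.Size([2, 4])
#   net_aux.eval()
#   net_aux(torch.randn(2, 3, 224, 224)).shape         # torch.Size([2, 4])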


# Data loading
# --------------------------------------------------------------
def load_data(img_dir, rate=0.8):
    transform = Compose(
        [
            Resize((224, 224)),          # RandomResizedCrop(224),
    #         RandomHorizontalFlip(),
            ToTensor(),
            Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0]),   # mean/std normalization; must come after ToTensor since it expects a Tensor image. With mean 0 and std 1 this is effectively a no-op placeholder.
        ]
    )
    ds = ImageFolder(img_dir, transform=transform)

    l = len(ds)
    l_train = int(l * rate)
    train, test = random_split(ds, [l_train, l - l_train])
    
    train_loader = torch.utils.data.DataLoader(dataset=train, shuffle=True, batch_size=50)   # batch size 50; each class has about 1300 images
    test_loader = torch.utils.data.DataLoader(dataset=test, shuffle=True, batch_size=50)     # evaluated batch by batch below

    return train_loader, test_loader


# Training
# ==============================================================
# 1. Load the dataset
print("1. Loading the dataset")
train_loader, test_loader = load_data("./imagenet2012", 0.8)

CUDA = torch.cuda.is_available()
# 2. Build the network
print("2. Building the network")
net = YQGoogLeNet(4)
if CUDA:
    net.cuda()

# 3. Train
print("3. Training")
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss_F = torch.nn.CrossEntropyLoss()

epoch = 10


for n in range(epoch):    # run `epoch` passes over the dataset
    for step, input_data in enumerate(train_loader):
        x_, y_ = input_data
        if CUDA:
            # move the batch to the GPU ----------------------------
            x_ = x_.cuda()
            y_ = y_.cuda()
        pred, pred_aux_1, pred_aux_2 = net(x_.view(-1, 3, 224, 224))
        loss = loss_F(pred, y_)    # main classifier loss
        loss_aux_1 = loss_F(pred_aux_1, y_)
        loss_aux_2 = loss_F(pred_aux_2, y_)
        
        last_loss = loss + 0.3 * (loss_aux_1 + loss_aux_2)   # auxiliary losses discounted to 30%
        optimizer.zero_grad()
        last_loss.backward()
        optimizer.step()
        
        # Evaluate on the test split (doing this every step is costly,
        # but it matches the original log output below).
        net.eval()    # bug fix: disable Dropout / use running BatchNorm statistics
        with torch.no_grad():
            all_num = 0.0 
            acc = 0.0
            for t_x, t_y in test_loader:
                all_num += len(t_y)
                if CUDA:
                    t_x = t_x.cuda()
                    t_y = t_y.cuda()

                test_pred, _, _ = net(t_x.view(-1, 3, 224, 224))    # auxiliary outputs are not needed here
                prob = torch.nn.functional.softmax(test_pred, dim=1)
                pred_cls = torch.argmax(prob, dim=1)
                acc += (pred_cls == t_y).float().sum()
            print(f"epoch/step: {n:02d}/{step:02d}\taccuracy: {acc/all_num*100:6.4f}, loss: {last_loss:6.4f}")
        net.train()    # back to training mode

# Save the trained weights (works for GPU-trained models too)
torch.save(net.state_dict(), "./googlenet.models")
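
# Note (our own addition): to reload the weights later, e.g. for CPU inference:
#   net = YQGoogLeNet(4)
#   net.load_state_dict(torch.load("./googlenet.models", map_location="cpu"))
#   net.eval()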

1. Loading the dataset
2. Building the network
3. Training
epoch/step: 00/00    accuracy: 34.5566, loss: 2.2265
epoch/step: 00/01    accuracy: 33.2314, loss: 2.1856
epoch/step: 00/02    accuracy: 44.8522, loss: 3.0660
epoch/step: 00/03    accuracy: 40.4689, loss: 2.3589
......
epoch/step: 09/75    accuracy: 80.8359, loss: 0.8001
epoch/step: 09/76    accuracy: 79.6126, loss: 0.3436
epoch/step: 09/77    accuracy: 81.0397, loss: 0.7516
epoch/step: 09/78    accuracy: 81.2436, loss: 0.3100
Benchmark comparison of GoogLeNet and several related networks

Inception V2

  • Input image size:
    • 3 \times 299 \times 299

Convolution Factorization

  • Convolution factorization design

  • Inception V2 replaces large-kernel convolutions with stacks of small-kernel convolutions, reducing computation:

    • a 5 \times 5 convolution is replaced by two 3 \times 3 convolutions (see the parameter comparison below).
    • The early V2 differed from V1 only in the added BatchNorm2d; officially this was described as an enhanced V1, and only after convolution factorization was published was it formally called V2.
  • Convolution factorization diagram

Convolution factorization diagram
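
  • A quick parameter count (our own worked example, assuming C input channels and C output channels for every layer, as the diagram implies):

    • one 5 \times 5 convolution: 5 \times 5 \times C \times C = 25C^2 weights;
    • two stacked 3 \times 3 convolutions: 2 \times (3 \times 3 \times C \times C) = 18C^2 weights;
    • ratio: 18/25 = 72\%, i.e. roughly a 28\% reduction, while gaining an extra ReLU non-linearity in between.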

Depth Factorization

  • Depth factorization design

  • The motivation is that the two traditional serial orderings of convolution and pooling (with the output channel count held fixed) are both problematic:

    1. Convolve first, then pool:
      • the convolution runs at full resolution, increasing its computation.
    2. Pool first, then convolve:
      • features are lost before the convolution sees them.
The two serial orderings of convolution and pooling
  • The depth-factorization design shown below addresses both:
    • it preserves features while reducing computation; a code sketch follows the figure.
Depth-factorized design for convolution and pooling
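
  • A minimal sketch of this design (our own reconstruction of the figure, following the efficient grid-size reduction from the Inception V3 paper: a stride-2 convolution branch and a stride-2 pooling branch run in parallel and are concatenated; the channel counts are illustrative). It reuses the YQConv2d module defined earlier:

import torch
from torch.nn import Module, MaxPool2d

class ReductionBlock(Module):
    def __init__(self, in_channels, conv_channels):
        super(ReductionBlock, self).__init__()
        # Stride-2 convolution branch: reduces the grid while extracting features
        self.conv_branch = YQConv2d(in_channels, conv_channels, kernel_size=3, stride=2, padding=1)
        # Stride-2 pooling branch: reduces the grid cheaply
        self.pool_branch = MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        # Concatenate along the channel axis: depth grows, resolution shrinks
        return torch.cat([self.conv_branch(x), self.pool_branch(x)], 1)

# blk = ReductionBlock(320, 320)
# print(blk(torch.randn(1, 320, 17, 17)).shape)   # torch.Size([1, 640, 9, 9])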

GoogLeNet Network Structure with Inception V2

  • The table below could also be called a V3 without BatchNorm2d; in fact it is essentially V3 (it includes the asymmetric convolution factorization).
Inception V2 GoogLeNet network structure
  • Notes:
    • "Figure 5" in the table refers to the original, unfactorized Inception block;
    • "Figure 6" refers to the small-convolution Inception block (3x3 kernels replacing 5x5);
    • "Figure 7" refers to the asymmetric Inception block (1xn and nx1 kernels replacing nxn).

Appendix

  1. Figure-5
Figure-5
  2. Figure-6
Figure-6
  3. Figure-7
Figure-7

Inception V3

  • Asymmetric convolution factorization design

  • V3 introduces asymmetric factorization: an n \times n convolution is decomposed into an n \times 1 convolution followed by a 1 \times n convolution (n = 3, 5); the full network structure is shown in the V2 section above, and a parameter comparison follows the figure below.

  • The input image size is raised to 3 \times 299 \times 299.

Asymmetric convolution factorization
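
  • A quick parameter count (our own, under the same assumption of C input and C output channels per layer):

    • an n \times n convolution: n^2 C^2 weights;
    • an n \times 1 plus a 1 \times n convolution: 2nC^2 weights;
    • for n = 3 this is 6C^2 vs. 9C^2 (about 33\% fewer weights); for n = 5, 10C^2 vs. 25C^2 (60\% fewer).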

Inception V4

Inception Residual Design

  • Residual connections are combined with Inception blocks to improve on Inception-v3, yielding the following variants:
    1. Inception-ResNet-v1,
    2. Inception-ResNet-v2,
    3. Inception-v4
  • Inception residual design (a code sketch follows the figure)
Inception residual block
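
  • A minimal sketch of the residual pattern (our own illustration, reusing the YQInception and YQConv2d modules from above; the real Inception-ResNet blocks use different branch layouts, but the principle of adding the block's output back to its input is the same):

import torch
from torch.nn import Module, ReLU

class InceptionResidual(Module):
    def __init__(self, in_channels):
        super(InceptionResidual, self).__init__()
        # The Inception block outputs 64 + 128 + 32 + 32 = 256 channels
        self.inception = YQInception(in_channels, 64, 96, 128, 16, 32, 32)
        # 1x1 projection back to in_channels so the element-wise addition is valid
        self.project = YQConv2d(256, in_channels, kernel_size=1)
        self.relu = ReLU(inplace=True)

    def forward(self, x):
        # Residual connection: input plus the projected Inception output
        return self.relu(x + self.project(self.inception(x)))

# blk = InceptionResidual(192)
# print(blk(torch.randn(1, 192, 17, 17)).shape)   # torch.Size([1, 192, 17, 17])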

GoogLeNet Network Structure with Inception V4

Inception V4 GoogLeNet network structure

Appendix

  1. Google published a series of papers on GoogLeNet that describe the ideas and techniques behind Inception v1, v2, v3 and v4 in detail:

    • "Going deeper with convolutions"
    • "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
    • "Rethinking the Inception Architecture for Computer Vision"
    • "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning"
  2. The official source code also handles many details, for example:

    • a cleaner code organization;
    • weight initialization;
    • transform-based conversion for inputs in different formats.
  3. With a little more time, the V2, V3 and V4 variants can be implemented as well.

    • V4 in particular can draw on our ResNet implementation.
