- With a batch of 16, 3 color channels, height 224 and width 224, the tensor has shape [16, 3, 224, 224]. After applying 32 convolution kernels of size 3×3, the shape becomes [16, 32, 222, 222]. The color-channel dimension seems to have disappeared in the computation — how is it handled? If the image is grayscale, with shape [16, 1, 224, 224], the same computation also produces [16, 32, 222, 222]. Would an output shape of [16, 3, 32, 222, 222] be more reasonable?
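A quick way to see what happens to the channel dimension is to inspect the layer's weight: in PyTorch each of the 32 kernels is itself [3, 3, 3] (one slice per input channel), and the per-channel products are summed away rather than kept. A minimal sketch:

```python
import torch
from torch import nn

# 32 kernels over a 3-channel input: the weight tensor is [32, 3, 3, 3],
# so every kernel spans all 3 color channels and sums over them.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
x = torch.rand(16, 3, 224, 224)
y = conv(x)
print(conv.weight.shape)  # torch.Size([32, 3, 3, 3])
print(y.shape)            # torch.Size([16, 32, 222, 222])

# A grayscale input just needs kernels with one channel each:
conv_gray = nn.Conv2d(1, 32, kernel_size=3)
print(conv_gray.weight.shape)  # torch.Size([32, 1, 3, 3])
```

So the input-channel count is folded into the kernel, which is why both the RGB and the grayscale case end up at [16, 32, 222, 222].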
How the computation works (reference: https://www.cnblogs.com/huangtao36/p/7818072.html)
(Animated figure of the convolution process omitted.)
The weights of a filter are determined by the feature it is meant to detect; in deep learning, they are obtained through training. Different target features correspond to different weights.
A single-layer image like the one above is generally a grayscale image, i.e. one without color. A color RGB image contains three layers of the same size, so the corresponding filter must also have three layers: the number of layers (channels) of the filter must always match the number of channels of the image.
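The matching rule can be seen in practice: a filter built for a 3-channel input rejects a single-channel (grayscale) image. A small sketch:

```python
import torch
from torch import nn

# A filter with 3 input channels cannot be applied to a 1-channel image;
# PyTorch raises a RuntimeError describing the channel mismatch.
conv = nn.Conv2d(3, 6, kernel_size=3)
gray = torch.rand(1, 1, 32, 32)
try:
    conv(gray)
    rejected = False
except RuntimeError as e:
    rejected = True
    print("channel mismatch rejected:", type(e).__name__)
```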
Note that the output is a single-layer (single-channel) map.
This is because when a three-channel filter is convolved with the image, all 27 numbers are weighted and summed directly into a single result, rather than being computed layer by layer.
Question: suppose 16 kernels of size [3, 3] are applied to a 3-channel input. Each actual kernel is [3, 3, 3], and at each position it produces a single scalar. In PyTorch, a Conv2d kernel is actually 3D (the full weight tensor is 4D), and a Conv3d kernel is 4D.
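The "27 numbers summed into one scalar" claim can be verified directly: a conv with a [1, 3, 3, 3] kernel gives exactly the same result as summing three independent single-channel convolutions. A sketch using small hypothetical shapes:

```python
import torch
from torch.nn import functional as F

# One "2D" kernel over a 3-channel image is really 3D: [out=1, in=3, 3, 3].
# At each position it weights and sums all 27 input values, which equals
# the sum of three independent single-channel convolutions.
x = torch.rand(1, 3, 8, 8)
w = torch.rand(1, 3, 3, 3)

full = F.conv2d(x, w)  # shape [1, 1, 6, 6]
per_channel = sum(F.conv2d(x[:, c:c+1], w[:, c:c+1]) for c in range(3))
print(torch.allclose(full, per_channel, atol=1e-6))  # True
```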
import torch
from torch import nn
from torch.nn import functional as F

class Lenet5(nn.Module):
    def __init__(self):
        super(Lenet5, self).__init__()
        self.conv_unit = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5, stride=1, padding=0, bias=False),
            # x: [b, 3, 224, 224] => [b, 6, 220, 220]
            nn.AvgPool2d(kernel_size=2, stride=2, padding=0),
            # [b, 6, 220, 220] => [b, 6, 110, 110]
            nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0),
            # [b, 6, 110, 110] => [b, 16, 106, 106]
            nn.AvgPool2d(kernel_size=2, stride=2, padding=0),
            # [b, 16, 106, 106] => [b, 16, 53, 53]
        )
        # flatten, then fully connected unit
        self.fc_unit = nn.Sequential(
            nn.Linear(16 * 53 * 53, 1200),
            nn.ReLU(),
            nn.Linear(1200, 84),
            nn.Linear(84, 7),
        )
        # use cross-entropy loss during training
        # self.criteon = nn.CrossEntropyLoss()

    def forward(self, x):
        batch_size = x.size(0)  # x.shape[0]
        # [b, 3, 224, 224] => [b, 16, 53, 53]
        x = self.conv_unit(x)
        # [b, 16, 53, 53] => [b, 16*53*53]
        x = x.view(batch_size, -1)
        # [b, 16*53*53] => [b, 7]
        logits = self.fc_unit(x)
        # loss = self.criteon(logits, y)
        return logits

    def initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_normal_(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight.data, 0, 0.1)
                m.bias.data.zero_()

def main():
    net = Lenet5()
    tmp = torch.rand(2, 3, 224, 224)  # [b, 3, 224, 224]
    out = net(tmp)
    print('lenet out:', out.shape)    # [b, 7]

if __name__ == '__main__':
    main()
import torch
import torch.nn as nn
from lenet5is import Lenet5
from torchsummary import summary

net = Lenet5()
net.initialize_weights()
for p in net.parameters():
    print(p.shape)  # torch.Size([6, 3, 5, 5]) -- the first Conv2d weight is 4D
    break
summary(net, input_size=(3, 224, 224), device='cpu')
torch.Size([6, 3, 5, 5])
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 6, 220, 220] 450
AvgPool2d-2 [-1, 6, 110, 110] 0
Conv2d-3 [-1, 16, 106, 106] 2,416
AvgPool2d-4 [-1, 16, 53, 53] 0
Linear-5 [-1, 1200] 53,934,000
ReLU-6 [-1, 1200] 0
Linear-7 [-1, 84] 100,884
Linear-8 [-1, 7] 595
================================================================
Total params: 54,038,345
Trainable params: 54,038,345
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 4.50
Params size (MB): 206.14
Estimated Total Size (MB): 211.22
----------------------------------------------------------------
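The Param # column of the summary above can be reproduced by hand: out_channels × in_channels × k × k for a convolution (plus out_channels if it has a bias; the first conv uses bias=False), and out × in + out for a linear layer:

```python
# Reproducing the Param # column of the torchsummary output above.
conv1 = 6 * 3 * 5 * 5               # 450   (bias=False)
conv3 = 16 * 6 * 5 * 5 + 16         # 2,416
fc5 = 1200 * (16 * 53 * 53) + 1200  # 53,934,000
fc7 = 84 * 1200 + 84                # 100,884
fc8 = 7 * 84 + 7                    # 595
total = conv1 + conv3 + fc5 + fc7 + fc8
print(total)  # 54038345
```

Note that the first fully connected layer alone accounts for almost all of the parameters, which is why the flatten size (16 × 53 × 53 here) matters so much.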