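To handle objects at different scales, traditional methods resize the image to several sizes, process each size separately, and then merge the results.
SSD instead takes feature maps from several different convolutional layers, which achieves the same effect.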
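How predictions are generated
As shown in the figure below:
The leftmost block is one feature map (an "image" layer) selected from the network.
Each selected layer is processed in three ways:
(1) Generate location (loc) predictions, with depth 4 x number of boxes
(2) Generate class predictions, with depth 21 (classes) x number of boxes
(3) Generate prior boxes, configured with a box size range, aspect ratios (2, 3), and so on: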
prior_box_param {
min_size: 276.0
max_size: 330.0
aspect_ratio: 2
aspect_ratio: 3
flip: true
clip: true
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
}
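To make these parameters concrete, here is a minimal Python sketch of how the box shapes at one feature map location could be derived from `min_size`, `max_size`, `aspect_ratio`, and `flip`. It assumes the usual SSD convention (one square of side `min_size`, one square of side `sqrt(min_size * max_size)`, then one box per aspect ratio and, with `flip`, one per reciprocal ratio). The function name `prior_box_shapes` is hypothetical, and clipping and variances are left out.

```python
import math

def prior_box_shapes(min_size, max_size=None, aspect_ratios=(2, 3), flip=True):
    """Return (width, height) pairs in pixels for the prior boxes at one location."""
    shapes = [(min_size, min_size)]  # square box with side min_size
    if max_size is not None:
        side = math.sqrt(min_size * max_size)  # assumed: geometric mean of min and max
        shapes.append((side, side))
    ratios = []
    for ar in aspect_ratios:
        ratios.append(ar)
        if flip:
            ratios.append(1.0 / ar)  # flip adds the reciprocal ratio
    for ar in ratios:
        shapes.append((min_size * math.sqrt(ar), min_size / math.sqrt(ar)))
    return shapes

shapes = prior_box_shapes(276.0, 330.0)
for w, h in shapes:
    print(f"{w:7.1f} x {h:7.1f}")
```

Under these assumptions the parameters above give 6 box shapes per location.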
An SSD example
(1) Base network
Layer name | Feature map size (channels x height x width) |
---|---|
input | 3x300x300 |
conv1_1 | 64x300x300 |
conv1_2 | 64x300x300 |
pool_1 | 64x150x150 |
conv2_1 | 128x150x150 |
conv2_2 | 128x150x150 |
pool_2 | 128x75x75 |
conv3_1 | 256x75x75 |
conv3_2 | 256x75x75 |
conv3_3 | 256x75x75 |
pool_3 | 256x38x38 |
conv4_1 | 512x38x38 |
conv4_2 | 512x38x38 |
conv4_3 | 512x38x38 |
pool_4 | 512x19x19 |
conv5_1 | 512x19x19 |
conv5_2 | 512x19x19 |
conv5_3 | 512x19x19 |
----------- | VGG dividing line |
fc6 (dilated convolution) | 1024x19x19 |
fc7 | 1024x19x19 |
conv6_1 | 256x19x19 |
conv6_2 | 512x10x10 |
conv7_1 | 128x10x10 |
conv7_2, floor((10 - 3 + 2x1)/2) + 1 = 5 | 256x5x5 |
conv8_1 | 128x5x5 |
conv8_2 | 256x3x3 |
pool6 | 256x1x1 |
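
The spatial sizes in this table can be checked with the standard output size formulas. The sketch below is only an illustration: the kernel, pad, and stride values are assumptions chosen to match the table (2x2 pooling with stride 2, and 3x3 convolutions with pad 1 and stride 2), and the helper names `conv_out` and `pool_out` are hypothetical. Pooling rounds up, as Caffe's pooling layer does, which is why 75 becomes 38.

```python
import math

def conv_out(size, kernel, pad, stride):
    """Convolution output size: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride, pad=0):
    """Pooling output size, rounding up instead of down."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

# Four 2x2, stride-2 poolings (assumed) in the VGG part of the network
sizes = [300]
for _ in range(4):
    sizes.append(pool_out(sizes[-1], kernel=2, stride=2))
print(sizes)

# Three 3x3, pad-1, stride-2 convolutions (assumed): conv6_2, conv7_2, conv8_2
extra = [19]
for _ in range(3):
    extra.append(conv_out(extra[-1], kernel=3, pad=1, stride=2))
print(extra)
```
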
Layers selected for feature extraction
| Layer name | Feature map size | Generated output | Output description |
|---|---|---|---|
| conv4_3 | 512x38x38 | mbox-loc conv | 38x38x12 (=3x4) |
| | | mbox-conf conv | 38x38x63 (=3x21) |
| | | prior-box | box min: 30 |
| fc7 | 1024x19x19 | mbox-loc conv | 19x19x24 (=6x4) |
| | | mbox-conf conv | 19x19x126 (=6x21) |
| | | prior-box | box min: 60, max: 114 |
| conv6_2 | 512x10x10 | mbox-loc conv | 10x10x24 (=6x4) |
| | | mbox-conf conv | 10x10x126 (=6x21) |
| | | prior-box | box min: 114, max: 168 |
| conv7_2 | 256x5x5 | mbox-loc conv | 5x5x24 (=6x4) |
| | | mbox-conf conv | 5x5x126 (=6x21) |
| | | prior-box | box min: 168, max: 222 |
| conv8_2 | 256x3x3 | mbox-loc conv | 3x3x24 (=6x4) |
| | | mbox-conf conv | 3x3x126 (=6x21) |
| | | prior-box | box min: 222, max: 276 |
| pool6 | 256x1x1 | mbox-loc conv | 1x1x24 (=6x4) |
| | | mbox-conf conv | 1x1x126 (=6x21) |
| | | prior-box | box min: 276, max: 330 |
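
The min and max box sizes in this table are evenly spaced, 54 pixels apart from fc7 onward. One way to reproduce them is sketched below. The spacing rule and the percentages (10%, 20%, 95% of a 300-pixel input) are assumptions inferred from the numbers in the table, not something the text states, and `prior_box_sizes` is a hypothetical helper.

```python
def prior_box_sizes(image_size=300, num_layers=6, min_ratio=20, max_ratio=95):
    """Return (min_sizes, max_sizes) in pixels; the first layer has no max_size."""
    # Step between consecutive layers, in percent of the image size (18 here).
    step = (max_ratio - min_ratio) // (num_layers - 2)
    min_sizes = [image_size * 10 / 100.0]  # assumed: first layer uses 10% of the image
    max_sizes = [None]
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(image_size * ratio / 100.0)
        max_sizes.append(image_size * (ratio + step) / 100.0)
    return min_sizes, max_sizes

mins, maxs = prior_box_sizes()
layer_names = ["conv4_3", "fc7", "conv6_2", "conv7_2", "conv8_2", "pool6"]
for name, lo, hi in zip(layer_names, mins, maxs):
    print(name, lo, hi)
```
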
In total, a single image yields:
38x38x3+(19x19+10x10+5x5+3x3+1x1)x6=7308 detections
Each detection consists of 4 values for the location and 21 values for the per-class probabilities.
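The count can be verified with a few lines of Python. This sketch only redoes the arithmetic using the feature map sizes and boxes per location listed above; the variable names are illustrative.

```python
# (layer name, feature map height/width, boxes per location)
LAYERS = [
    ("conv4_3", 38, 3),
    ("fc7", 19, 6),
    ("conv6_2", 10, 6),
    ("conv7_2", 5, 6),
    ("conv8_2", 3, 6),
    ("pool6", 1, 6),
]
NUM_CLASSES = 21  # number of classes used in the text

total = 0
for name, size, boxes in LAYERS:
    count = size * size * boxes          # boxes contributed by this layer
    total += count
    print(f"{name:8s} loc channels={boxes * 4:3d} "
          f"conf channels={boxes * NUM_CLASSES:3d} boxes={count}")

print("total boxes:", total)
print("values per image:", total * (4 + NUM_CLASSES))
```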
Stock Caffe is not enough to implement SSD.
New layers have to be defined:
Normalize
Permute
MultiBoxLoss, and others
One write-up on how to define a new layer:
http://blog.csdn.net/kuaitoukid/article/details/41865803
Designing the feature maps
Given a neural network, pick specific layers and append the following after each of them:
layer {
name: "conv6_2_mbox_loc"
type: "Convolution"
bottom: "conv6_2"
top: "conv6_2_mbox_loc"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 24  # n * 4, where n is the number of prior boxes per location (here 6)
pad: 1  # with kernel_size 3 and stride 1, the output keeps the height and width of the selected layer
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv6_2_mbox_loc_perm"
type: "Permute"
bottom: "conv6_2_mbox_loc"
top: "conv6_2_mbox_loc_perm"
permute_param {
order: 0
order: 2
order: 3
order: 1
}
}
layer {
name: "conv6_2_mbox_loc_flat"
type: "Flatten"
bottom: "conv6_2_mbox_loc_perm"
top: "conv6_2_mbox_loc_flat"
flatten_param {
axis: 1
}
}
layer {
name: "conv6_2_mbox_conf"
type: "Convolution"
bottom: "conv6_2"
top: "conv6_2_mbox_conf"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 126  # n * number of classes (here 6 * 21)
pad: 1  # with kernel_size 3 and stride 1, the output keeps the height and width of the selected layer
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv6_2_mbox_conf_perm"
type: "Permute"
bottom: "conv6_2_mbox_conf"
top: "conv6_2_mbox_conf_perm"
permute_param {
order: 0
order: 2
order: 3
order: 1
}
}
layer {
name: "conv6_2_mbox_conf_flat"
type: "Flatten"
bottom: "conv6_2_mbox_conf_perm"
top: "conv6_2_mbox_conf_flat"
flatten_param {
axis: 1
}
}
layer {
name: "conv6_2_mbox_priorbox"
type: "PriorBox"
bottom: "conv6_2"
bottom: "data"
top: "conv6_2_mbox_priorbox"
prior_box_param {
min_size: 114.0  # choose to suit the input image size
max_size: 168.0
aspect_ratio: 2
aspect_ratio: 3
flip: true
clip: true
variance: 0.1
variance: 0.1
variance: 0.2
variance: 0.2
}
}
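
The Permute and Flatten layers above move the channel axis last before flattening, so that the values belonging to one box end up next to each other. The pure-Python sketch below shows the resulting index mapping for `conv6_2_mbox_loc` (24 channels on a 10x10 map). It assumes the 24 channels are ordered box by box, with 4 coordinates each; the helper names are hypothetical.

```python
def flat_index_after_permute(c, h, w, channels, height, width):
    """Position of element (c, h, w) of one sample after Permute(0, 2, 3, 1) + Flatten."""
    return (h * width + w) * channels + c

BOXES, COORDS = 6, 4                      # num_output = 24 = 6 * 4
CHANNELS, HEIGHT, WIDTH = BOXES * COORDS, 10, 10

def box_slice(h, w, box):
    """Flattened indices of the 4 location values of prior box `box` at cell (h, w)."""
    return [
        flat_index_after_permute(box * COORDS + k, h, w, CHANNELS, HEIGHT, WIDTH)
        for k in range(COORDS)
    ]

print(box_slice(0, 0, 0))   # first box of the first cell
print(box_slice(0, 0, 1))   # second box of the same cell
print(box_slice(0, 1, 0))   # first box of the next cell
```

Without the Permute step, flattening the original channel-first layout would place the 4 values of one box 100 positions apart (one full 10x10 plane per channel).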