yolo_net.py:
The input image size is 448 and cell_size is 7.
self.boundary1 = self.cell_size * self.cell_size * self.num_classes # 7 x 7 x 20
self.boundary2 = self.boundary1 + self.cell_size * self.cell_size * self.boxes_per_cell
# 7 x 7 x 20 + 7 x 7 x 2
self.output_size = (self.cell_size * self.cell_size) * (self.num_classes + self.boxes_per_cell * 5)
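As a quick check of the arithmetic above, here is a minimal sketch that evaluates the three expressions for the values given in these notes (7 x 7 grid, 20 classes, 2 boxes per cell). The standalone variable names mirror the attributes and are otherwise illustrative.

```python
# Values taken from the notes: 448 x 448 input, 7 x 7 grid, 20 classes, 2 boxes per cell.
image_size = 448
cell_size = 7
num_classes = 20
boxes_per_cell = 2

# Each grid cell covers image_size / cell_size pixels along each axis.
pixels_per_cell = image_size // cell_size

# End of the class-score block in the flat output vector.
boundary1 = cell_size * cell_size * num_classes
# End of the per-box score block; the remaining entries are the box values.
boundary2 = boundary1 + cell_size * cell_size * boxes_per_cell
# Total length: per cell, 20 class scores plus 2 boxes x (1 score + 4 box values).
output_size = (cell_size * cell_size) * (num_classes + boxes_per_cell * 5)

print(pixels_per_cell, boundary1, boundary2, output_size)  # 64 980 1078 1470
```

The 392 entries after boundary2 are the 7 x 7 x 2 x 4 box values.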
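The network output predict comes from layer fc_32. Labels have shape [batch_size, 7, 7, 25] (1 response value + 4 box values + 20 class values).
Class prediction: # [batch_size, 7, 7, 20]
predict_classes = [self.batch_size, self.cell_size, self.cell_size, self.num_class]
Per-box confidence prediction: # [batch_size, 7, 7, 2]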
predict_scales = [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell]
Box prediction (4 values per box): # [batch_size, 7, 7, 2, 4]
predict_boxes = [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell, 4]
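The three shapes above come from cutting the flat output vector at boundary1 and boundary2. Below is a minimal sketch of that split, using NumPy instead of TensorFlow so that it runs standalone. The random array is a stand-in for the network output, and the slicing order (class scores, then per-box scores, then box values) is an assumption inferred from the boundary definitions.

```python
import numpy as np

batch_size, cell_size, num_classes, boxes_per_cell = 3, 7, 20, 2
boundary1 = cell_size * cell_size * num_classes
boundary2 = boundary1 + cell_size * cell_size * boxes_per_cell
output_size = cell_size * cell_size * (num_classes + boxes_per_cell * 5)

# Stand-in for the flat network output, shape [batch_size, 1470].
predict = np.random.rand(batch_size, output_size).astype(np.float32)

# Slice the flat vector at the two boundaries and reshape each block onto the grid.
predict_classes = predict[:, :boundary1].reshape(
    batch_size, cell_size, cell_size, num_classes)
predict_scales = predict[:, boundary1:boundary2].reshape(
    batch_size, cell_size, cell_size, boxes_per_cell)
predict_boxes = predict[:, boundary2:].reshape(
    batch_size, cell_size, cell_size, boxes_per_cell, 4)

print(predict_classes.shape, predict_scales.shape, predict_boxes.shape)
```

In TensorFlow the same slices would be passed to tf.reshape with the shape lists shown in the notes.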
Label:
Object response (whether a cell contains an object):
response = [self.batch_size, self.cell_size, self.cell_size, 1] # [batch_size, 7, 7, 1]
Box ground truth:
boxes = [self.batch_size, self.cell_size, self.cell_size, 1, 4] # [batch_size, 7, 7, 1, 4]
Box ground truth, tiled once per predicted box and normalized by the image size:
boxes=tf.tile(boxes, [1, 1, 1, self.boxes_per_cell, 1]) / self.image_size # [batch_size, 7, 7, 2, 4]
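A minimal sketch of the label-side operations, again with NumPy standing in for TensorFlow. It assumes the 25 label channels are ordered as 1 response value, 4 box values, then 20 class values. That ordering is inferred from the shapes above and is not stated in the notes. The sample box values are made up for illustration.

```python
import numpy as np

batch_size, cell_size, boxes_per_cell, image_size = 3, 7, 2, 448

# Stand-in labels, shape [batch_size, 7, 7, 25].
labels = np.zeros((batch_size, cell_size, cell_size, 25), dtype=np.float32)
labels[0, 3, 3, 0] = 1.0                             # response value for one cell
labels[0, 3, 3, 1:5] = [224.0, 224.0, 112.0, 56.0]   # hypothetical box values in pixels

# Channel 0 is the response, channels 1..4 are the box (assumed layout).
response = labels[..., 0].reshape(batch_size, cell_size, cell_size, 1)
boxes = labels[..., 1:5].reshape(batch_size, cell_size, cell_size, 1, 4)

# Repeat the single ground-truth box once per predicted box, then scale by the image size.
boxes = np.tile(boxes, [1, 1, 1, boxes_per_cell, 1]) / image_size

print(response.shape, boxes.shape)  # (3, 7, 7, 1) (3, 7, 7, 2, 4)
print(boxes[0, 3, 3])
```

Tiling gives the ground truth the same [batch_size, 7, 7, 2, 4] shape as predict_boxes, so the two can be compared element by element.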