Reading Note: Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

TITLE: Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

AUTHOR: Alexander Wong, Mohammad Javad Shafiee, Francis Li, Brendan Chwyl

ASSOCIATION: University of Waterloo, DarwinAI

FROM: arXiv:1802.06488

CONTRIBUTION

A single-shot detection deep convolutional neural network, Tiny SSD, is designed specifically for real-time embedded object detection.
A non-uniform Fire module is proposed based on SqueezeNet.
The network achieves 61.3% mAP on the VOC2007 dataset with a model size of 2.3MB.

METHOD

DESIGN STRATEGIES

The Tiny SSD network for real-time embedded object detection is composed of two main sub-network stacks:

A non-uniform Fire sub-network stack.
A non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers.

The first sub-network stack feeds into the second. Both sub-networks need careful design to run on an embedded device. The first sub-network works as the backbone, so it directly affects detection performance. The second sub-network has to balance detection performance against model size and inference speed.

Three key design strategies are:

Reduce the number of $3 \times 3$ filters as much as possible.
Reduce the number of input channels to $3 \times 3$ filters where possible.
Perform downsampling at a later stage in the network.
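To see why the first two strategies matter, here is a minimal sketch that counts convolution weights (biases ignored). It compares a plain $3 \times 3$ layer with a SqueezeNet-style Fire module (a $1 \times 1$ squeeze layer followed by parallel $1 \times 1$ and $3 \times 3$ expand layers). The channel counts are hypothetical values chosen for illustration, not the configuration used in the paper, and the function names are my own. The third strategy (late downsampling) concerns accuracy rather than parameter count, so it is not covered here.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weight count of a 2D convolution, ignoring biases."""
    return in_channels * out_channels * kernel_size * kernel_size


def fire_params(in_channels, squeeze, expand1x1, expand3x3):
    """Weight count of a SqueezeNet-style Fire module, ignoring biases."""
    # The 1x1 squeeze layer cuts the number of channels seen by the expand layers.
    squeeze_w = conv_params(in_channels, squeeze, 1)
    # The expand layers mix cheap 1x1 filters with a limited number of 3x3 filters.
    expand1_w = conv_params(squeeze, expand1x1, 1)
    expand3_w = conv_params(squeeze, expand3x3, 3)
    return squeeze_w + expand1_w + expand3_w


# Hypothetical layer sizes: 128 input channels, 128 output channels.
plain = conv_params(128, 128, 3)
fire = fire_params(128, squeeze=16, expand1x1=64, expand3x3=64)
print(f"plain 3x3 conv: {plain} weights")
print(f"fire module:    {fire} weights")
print(f"reduction:      {plain / fire:.1f}x")
```

With these illustrative numbers, most of the saving comes from the squeeze layer: the $3 \times 3$ filters see 16 input channels instead of 128, and only half of the output channels are produced by $3 \times 3$ filters at all.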

NETWORK STRUCTURE

Fire

Auxiliary Layers

PERFORMANCE

Performance


SOME THOUGHTS

The paper stores the model in half-precision floating point, which halves the model size. From my own experience, several methods can be tried when deploying a deep learning model to embedded devices, including:

Architecture design, as this work illustrates.
Model pruning, such as decomposition, filter pruning and connection pruning.
BLAS library optimization.
Algorithm optimization. Using SSD as an example, the Prior-Box layer only needs to be computed once as long as the input image size does not change, because its output depends on the feature map shapes and not on the image content.
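The Prior-Box point can be sketched as a simple cache. This is a simplified illustration, not the layer from any particular SSD implementation: `prior_boxes` is a hypothetical helper that handles one square feature map with a single scale, and the feature map size, scale, and aspect ratios are assumed values. Since the feature map size is fixed once the input image size is fixed, repeated calls use the same arguments and are served from the cache.

```python
from functools import lru_cache
from math import sqrt


@lru_cache(maxsize=8)
def prior_boxes(feature_map_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) priors in normalized coordinates.

    Covers one square feature map. `aspect_ratios` must be a tuple
    so that the arguments are hashable and can be used as a cache key.
    """
    boxes = []
    for row in range(feature_map_size):
        for col in range(feature_map_size):
            # Center of the current feature map cell.
            cx = (col + 0.5) / feature_map_size
            cy = (row + 0.5) / feature_map_size
            for ratio in aspect_ratios:
                boxes.append((cx, cy, scale * sqrt(ratio), scale / sqrt(ratio)))
    return tuple(boxes)


# Assumed values: a 19x19 feature map, scale 0.2, three aspect ratios.
first = prior_boxes(19, 0.2, (1.0, 2.0, 0.5))
second = prior_boxes(19, 0.2, (1.0, 2.0, 0.5))  # same arguments, served from the cache
print(len(first), first is second)
```

In a deployed detector the same idea usually means generating the priors once at start-up and keeping them in memory, so only the convolutional layers run per frame.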