Reading Note: Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

TITLE: Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

AUTHOR: Alexander Wong, Mohammad Javad Shafiee, Francis Li, Brendan Chwyl

ASSOCIATION: University of Waterloo, DarwinAI

FROM: arXiv:1802.06488

CONTRIBUTION

  1. A single-shot detection deep convolutional neural network, Tiny SSD, is designed specifically for real-time embedded object detection.
  2. A non-uniform Fire module is proposed based on SqueezeNet.
  3. The network achieves 61.3% mAP on the VOC 2007 dataset with a model size of 2.3MB.
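
As a rough sanity check on the 2.3MB figure, the sketch below converts between parameter count and storage size. It relies on the half-precision storage mentioned in the thoughts at the end of this note, and it assumes 1 MB = 10^6 bytes and that the file holds nothing but weights. The helper names `model_size_mb` and `implied_params` are my own, so the result is an estimate, not a number taken from the paper.

```python
import struct

# Bytes per weight, taken from the struct module's format sizes.
BYTES_FP32 = struct.calcsize("f")  # single precision
BYTES_FP16 = struct.calcsize("e")  # half precision


def model_size_mb(num_params, bytes_per_param):
    """Storage size in MB (assumed 10**6 bytes) for a weights-only file."""
    return num_params * bytes_per_param / 1e6


def implied_params(size_mb, bytes_per_param):
    """Parameter count implied by a weights-only file of the given size."""
    return size_mb * 1e6 / bytes_per_param


params = implied_params(2.3, BYTES_FP16)
print(f"implied parameters: {params / 1e6:.2f}M")
print(f"same weights in fp32: {model_size_mb(params, BYTES_FP32):.1f} MB")
```

Under these assumptions, 2.3MB in half precision corresponds to roughly 1.15M parameters, and the same weights would take about 4.6MB in single precision.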

METHOD

DESIGN STRATEGIES

The Tiny SSD network for real-time embedded object detection is composed of two main sub-network stacks:

  1. A non-uniform Fire sub-network stack.
  2. A non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers.

The first sub-network stack feeds into the second. Both sub-networks need careful design to run on an embedded device. The first sub-network works as the backbone, which directly affects detection performance. The second sub-network must balance detection performance against model size and inference speed.

Three key design strategies are:

  1. Reduce the number of $3 \times 3$ filters as much as possible.
  2. Reduce the number of input channels to $3 \times 3$ filters where possible.
  3. Perform downsampling at a later stage in the network.
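
The first two strategies can be made concrete by counting weights. The sketch below compares a SqueezeNet-style Fire module (a 1x1 squeeze layer followed by parallel 1x1 and 3x3 expand layers) with a plain 3x3 convolution that produces the same number of output channels. The channel sizes are illustrative values I picked, not the Tiny SSD configuration, and biases are ignored. The third strategy (late downsampling) concerns feature-map resolution rather than parameter count, so it is not covered here.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution, ignoring biases."""
    return c_in * c_out * k * k


def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Weight count of a Fire module: squeeze 1x1, then 1x1 and 3x3 expand."""
    return (conv_params(c_in, squeeze, 1)          # squeeze cuts input channels
            + conv_params(squeeze, expand1x1, 1)   # cheap 1x1 expand filters
            + conv_params(squeeze, expand3x3, 3))  # few 3x3 filters, few inputs


# Illustrative sizes (assumed, not from the paper).
c_in, squeeze, e1, e3 = 128, 16, 64, 64

fire = fire_params(c_in, squeeze, e1, e3)
plain = conv_params(c_in, e1 + e3, 3)
print(fire, plain, plain / fire)  # 12288 147456 12.0
```

With these sizes the Fire module needs 12x fewer weights than the plain convolution: half of the output channels come from 1x1 filters (strategy 1), and the 3x3 filters see only 16 input channels instead of 128 (strategy 2).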

NETWORK STRUCTURE

[Figure: Fire]
[Figure: Auxiliary Layers]

PERFORMANCE

[Figure: Performance]

SOME THOUGHTS

The paper stores the model in half-precision floating point, which halves the model size. From my own experience, several methods can be tried when deploying a deep learning model to embedded devices, including:

  1. Architecture design, as this work illustrates.
  2. Model compression, such as decomposition, filter pruning, and connection pruning.
  3. BLAS library optimization.
  4. Algorithm optimization. Using SSD as an example, the Prior-Box layer needs to be computed only once as long as the input image size does not change.
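
The Prior-Box point in item 4 can be sketched as a cached function: the default boxes depend only on the feature-map size and the box configuration, never on the image content, so they can be computed once and reused for every frame. This is a simplified illustration with a function name and parameters of my own choosing. It omits the extra box per location that the original SSD adds and is not tied to any particular framework's Prior-Box layer.

```python
from functools import lru_cache
from math import sqrt


@lru_cache(maxsize=None)
def prior_boxes(feature_size, scale, aspect_ratios):
    """Simplified SSD-style default boxes as (cx, cy, w, h) in [0, 1] units.

    aspect_ratios must be a tuple so the arguments are hashable for the cache.
    """
    boxes = []
    for row in range(feature_size):
        for col in range(feature_size):
            cx = (col + 0.5) / feature_size
            cy = (row + 0.5) / feature_size
            for ratio in aspect_ratios:
                boxes.append((cx, cy, scale * sqrt(ratio), scale / sqrt(ratio)))
    return tuple(boxes)


# The first call computes the boxes; the second call with the same
# configuration returns the cached tuple without recomputing anything.
first = prior_boxes(19, 0.2, (1.0, 2.0, 0.5))
second = prior_boxes(19, 0.2, (1.0, 2.0, 0.5))
print(len(first), first is second)
```

In a real deployment the same idea applies per feature map: as long as the input resolution is fixed, the boxes can be precomputed at load time and only the convolutional layers run per frame.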