Ternary Weight Networks

Weight Compression: Ternary Weight Networks

Background

Convolutional neural networks (CNNs) are particularly well suited to computer-vision tasks such as object recognition, classification, detection, and image segmentation. Typical models have millions of parameters and a heavy compute cost; AlexNet, for example, has 61 million parameters (the floating-point weights alone occupy about 249 MB of storage), and classifying a single image takes roughly 1.5 billion high-precision operations. To reduce model size and resource usage, two common approaches are model pruning (removing weights with small magnitudes) and weight compression (quantizing weights to just a few bits).
Weight compression: binary neural networks

Ternary Weight Networks (TWN)

Key idea: the weights used by the network are ternary, taking values in {+1, 0, -1}; the scaled ternary weights are chosen to minimize the Euclidean distance to the full-precision weights W.

Principle:
  1. Original problem: use ternary weights during propagation, approximating the full-precision weights W with a scaled ternary weight matrix W^t, i.e. the goal is W ≈ \alpha W^t:

\begin{cases} \alpha^*, W^{t*} = \mathop{\arg\min}\limits_{\alpha, W^t} J(\alpha, W^t) = \|W - \alpha W^t\|_2^2 \\ \text{s.t.}\ \ \alpha \geq 0,\ W^t_i \in \{-1, 0, +1\},\ i = 1, 2, \dots, n \end{cases} \tag{1}
Forward propagation:

\begin{cases} Z = X * W \approx X * (\alpha W^t) = (\alpha X) \oplus W^t \\ X^{next} = g(Z) \end{cases} \tag{2}

  2. Threshold-based approximation: set each weight W^t_i to a value in {+1, 0, -1} using a threshold, and search for a suitable pair (\Delta, \alpha) that fits the problem above. Determining W^t from W_i and a single threshold greatly reduces the computation; moreover, because the weights are +1/-1/0, the multiplications in Eq. (2) become plain additions and subtractions (zeros are simply skipped); see the sketch after Eq. (3). Both \Delta and \alpha are positive.

W^t_i = f_t(W_i \mid \Delta) = \begin{cases} +1, & \text{if } W_i > \Delta \\ \ \ 0, & \text{if } |W_i| \leq \Delta \\ -1, & \text{if } W_i < -\Delta \end{cases} \tag{3}
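To make the multiplication-to-addition point concrete, here is a minimal, illustrative C++ sketch (the names ternarize and ternary_dot are made up, not from the paper's code) of the threshold function of Eq. (3) and an inner product in the spirit of Eq. (2), computed with ternary weights using only additions and subtractions:

#include <cstddef>
#include <vector>

// Eq. (3): map a full-precision weight to {+1, 0, -1} using threshold delta.
inline int ternarize(float w, float delta) {
    if (w > delta)  return +1;
    if (w < -delta) return -1;
    return 0;
}

// Eq. (2): the inner product X * (alpha * W^t) needs no multiplications by the
// weights -- each term is an add, a subtract, or a skip; alpha scales the result once.
float ternary_dot(const std::vector<float>& x, const std::vector<int>& w_t, float alpha) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (w_t[i] == +1)      acc += x[i];
        else if (w_t[i] == -1) acc -= x[i];
        // w_t[i] == 0: contributes nothing
    }
    return alpha * acc;
}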
With this constraint, the optimization problem in Eq. (1) reduces to:
\alpha^*, \Delta^* = \mathop{\arg\min}\limits_{\alpha \geq 0,\ \Delta > 0} |I_\Delta|\,\alpha^2 - 2\Big(\sum_{i \in I_\Delta}|W_i|\Big)\alpha + c \tag{4}
where c = \sum_i W_i^2 is a constant independent of \alpha and \Delta, I_\Delta = \{\, i \mid |W_i| > \Delta \,\}, and |I_\Delta| denotes its cardinality.
Solving for \alpha and \Delta gives:
\alpha^*_\Delta = \frac{1}{|I_\Delta|}\sum_{i \in I_\Delta}|W_i| \tag{5}

\Delta^* = \mathop{\arg\min}\limits_{\Delta > 0} -\frac{1}{|I_\Delta|}\Big(\sum_{i \in I_\Delta}|W_i|\Big)^2 = \mathop{\arg\max}\limits_{\Delta > 0} \frac{1}{|I_\Delta|}\Big(\sum_{i \in I_\Delta}|W_i|\Big)^2 \tag{6}
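Eqs. (5)-(6) can in principle be solved by brute force: only the magnitudes |W_i| matter as candidate thresholds, so one can scan them and keep the \Delta that maximizes the objective. The sketch below is illustrative C++ (not from the paper's code), ignoring ties among equal magnitudes:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Scan candidate thresholds (the sorted magnitudes |W_i|) and keep the delta that
// maximizes the objective of Eq. (6); alpha then follows from Eq. (5).
std::pair<double, double> optimal_delta_alpha(const std::vector<double>& W) {
    std::vector<double> a(W.size());
    for (std::size_t i = 0; i < W.size(); ++i) a[i] = std::fabs(W[i]);
    std::sort(a.begin(), a.end());  // ascending magnitudes

    double total = 0.0;
    for (double v : a) total += v;

    double best_score = -1.0, best_delta = 0.0, best_alpha = 0.0;
    double removed = 0.0;
    for (std::size_t k = 0; k + 1 < a.size(); ++k) {
        removed += a[k];                                   // magnitudes <= candidate delta a[k]
        const double s = total - removed;                  // sum of |W_i| over I_delta
        const double n_above = double(a.size() - k - 1);   // |I_delta|
        const double score = s * s / n_above;              // objective of Eq. (6)
        if (score > best_score) {
            best_score = score;
            best_delta = a[k];
            best_alpha = s / n_above;                      // Eq. (5)
        }
    }
    return {best_delta, best_alpha};
}

Such an exhaustive scan costs O(n log n) per layer per iteration, which motivates the distribution-based approximations below.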

Finally, a threshold \Delta and the corresponding scaling factor \alpha are fixed to construct the ternary network; \Delta is chosen according to the distribution of W.

Uniform distribution: conditioned on |W_i| > \Delta, the sum \sum_{i \in I_\Delta}|W_i| is approximately the conditional mean (\frac{a+\Delta}{2}) times the count |I_\Delta|; a sketch of the resulting derivation follows.
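Assuming the W_i are drawn i.i.d. from a uniform distribution on [-a, a] (an assumption made only for this estimate), roughly a fraction (a-\Delta)/a of the weights exceed the threshold in magnitude, so
|I_\Delta| \approx n\,\frac{a-\Delta}{a}, \qquad \sum_{i \in I_\Delta}|W_i| \approx |I_\Delta|\cdot\frac{a+\Delta}{2}
Substituting into the objective of Eq. (6) gives
\frac{1}{|I_\Delta|}\Big(\sum_{i \in I_\Delta}|W_i|\Big)^2 \approx \frac{n}{4a}\,(a-\Delta)(a+\Delta)^2
whose derivative in \Delta is proportional to (a+\Delta)(a-3\Delta), so the maximum is attained at \Delta^* = a/3; with E(|W|) = a/2 this yields \Delta^*/E(|W|) = 2/3.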

Normal distribution:
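Assuming W_i \sim N(0, \sigma^2), the objective of Eq. (6) has no simple closed form; the optimum reported for this case is about 0.6\sigma, and the mean of the corresponding half-normal distribution supplies the denominator used below:
E(|W|) = \sqrt{\tfrac{2}{\pi}}\,\sigma = \frac{2\sigma}{\sqrt{2\pi}} \approx 0.8\sigma, \qquad \frac{\Delta^*}{E(|W|)} \approx \frac{0.6\sigma}{0.8\sigma} \approx 0.75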
Based on these two cases, the paper's authors adopt an empirical rule:

The threshold is tied to the expectation of |W|: \Delta^* \approx 0.7 \cdot E(|W|) \approx \frac{0.7}{n} \sum_{i=1}^{n}|W_i|
Uniform distribution: \Delta^*/E(|W|) = \frac{2}{3} \approx 0.66 (for W uniform on [-a, a], \Delta^* = a/3 and E(|W|) = a/2)
Normal distribution: \Delta^*/E(|W|) = \frac{0.6\sigma}{2\sigma/\sqrt{2\pi}} \approx 0.75 (note: E(|W|) = 2\sigma/\sqrt{2\pi} for W \sim N(0, \sigma^2))

Algorithm:

The ternarized weights W^t_l are used only in the forward and backward passes; the parameter update is still applied to the full-precision (continuous) weights.
Optimization: stochastic gradient descent (SGD with momentum), batch normalization, and learning-rate decay.
Momentum update (\eta is the learning rate):
\begin{align} &V_{dW} = \beta V_{dW} + (1-\beta)\,dW \\ &W = W - \eta\, V_{dW} \end{align}
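Putting the pieces together, here is a minimal sketch in plain C++ (illustrative only, not the paper's Caffe implementation) of one training step: ternarize the stored full-precision weights with the 0.7·E(|W|) rule, run the forward/backward pass with the scaled ternary weights, then apply the momentum-SGD update to the full-precision weights. forward_backward is a hypothetical stand-in for whatever computes the gradient dW.

#include <cmath>
#include <cstddef>
#include <vector>

struct TwnLayer {
    std::vector<double> W;   // full-precision weights (these are what get updated)
    std::vector<int>    Wt;  // ternary copy {-1, 0, +1} used in forward/backward
    std::vector<double> V;   // momentum buffer
    double alpha = 1.0;      // scaling factor from Eq. (5)
};

// Hypothetical stand-in: runs forward/backward with (alpha, Wt) and returns dL/dW.
std::vector<double> forward_backward(const TwnLayer& layer);

void twn_train_step(TwnLayer& layer, double lr, double beta) {
    // 1. Ternarize: delta = 0.7 * mean(|W_i|) (the paper's empirical rule), then Eq. (3).
    double mean_abs = 0.0;
    for (double w : layer.W) mean_abs += std::fabs(w);
    mean_abs /= layer.W.size();
    const double delta = 0.7 * mean_abs;

    layer.Wt.resize(layer.W.size());
    double sum_abs = 0.0;
    int count = 0;
    for (std::size_t i = 0; i < layer.W.size(); ++i) {
        layer.Wt[i] = (layer.W[i] > delta) - (layer.W[i] < -delta);
        if (layer.Wt[i] != 0) { sum_abs += std::fabs(layer.W[i]); ++count; }
    }
    layer.alpha = (count > 0) ? sum_abs / count : 1.0;  // Eq. (5)

    // 2. Forward/backward pass uses only the scaled ternary weights.
    std::vector<double> grad = forward_backward(layer);

    // 3. Momentum SGD update is applied to the full-precision weights.
    layer.V.resize(layer.W.size(), 0.0);
    for (std::size_t i = 0; i < layer.W.size(); ++i) {
        layer.V[i] = beta * layer.V[i] + (1.0 - beta) * grad[i];
        layer.W[i] -= lr * layer.V[i];
    }
}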

Experiments:

The experiments evaluate binary, ternary, and full-precision networks on the MNIST, CIFAR-10, and ImageNet datasets, using three architectures: LeNet-5, VGG-7, and ResNet-18(B).

Results:

On these benchmarks, TWNs perform only slightly worse than the full-precision baselines, while achieving a model compression rate of about 16x (2-bit ternary weights versus 32-bit floats; binary weights reach 32x).

Source code:
The forward/backward changes are in conv_layer.cpp, lines 30-100 (Forward_cpu / Backward_cpu); the layer picks the ternarized (or binarized) copy of the weights when TERNARY or BINARY mode is enabled:

  const Dtype* weight = (BINARY || TERNARY) ? this->blobs_[0]->cpu_binary() : this->blobs_[0]->cpu_data();
caffe_cpu_ternary:
// Ternarize the weights to {+1, 0, -1}
// math_functions.cpp, line 294
template<>
void caffe_cpu_ternary<double>(const int N, const double delta, const double* X, double* Y){
    for(int i=0; i<N; i++){
        double x = X[i];
        Y[i] = (x>delta) - (x<-delta);  // bool arithmetic: yields +1, 0, or -1
    }
}

delta:
// Set delta = 0.7 * mean(|W_i|) (scale_factor = TERNARY_DELTA / 10), clamped to [-100, 100]
template <typename Dtype>
void Blob<Dtype>::set_delta(){
  float scale_factor = TERNARY_DELTA * 1.0 / 10; 
  Dtype delta = (Dtype) scale_factor * this->asum_data() / this->count();
  delta = (delta <= 100) ? delta : 100;
  delta = (delta >= -100) ? delta : -100; 
  this->delta_ = delta;
}

ternarize_data:
// blob.cpp, line 146
// Quantize the weights to three values and rescale by alpha
// revised 2016-3-21
template <typename Dtype>
void Blob<Dtype>::ternarize_data(Phase phase){

  if (phase == RUN) {
    // if(DEBUG) print_head();
    // LOG(INFO) << "RUN phase...";
    // caffe_sleep(3);
    return;  // do nothing for the running phase
  } else if (phase == TRAIN) {
    // LOG(INFO) << "TRAIN phase ...";
    // caffe_sleep(3);
  } else {
    // LOG(INFO) << "TEST phase ...";
    // caffe_sleep(3);
  }

  // const Dtype delta = 0; // default value; 
  // const Dtype delta = (Dtype) 0.8 * this->asum_data() / this->count();
  this->set_delta();
  const Dtype delta = this->get_delta();
  Dtype alpha = 1;

  if (!data_) { return; }
  switch (data_->head()) {
  case SyncedMemory::HEAD_AT_CPU:
{
    // W^t = f_t(W | delta), stored in the "binary" buffer
    caffe_cpu_ternary<Dtype>(this->count(), delta, this->cpu_data(), this->mutable_cpu_binary());
    // alpha = <W^t, W> / <W^t, W^t>, which is exactly Eq. (5)
    alpha = caffe_cpu_dot(this->count(), this->cpu_binary(), this->cpu_data());
    alpha /= caffe_cpu_dot(this->count(), this->cpu_binary(), this->cpu_binary());
    // store alpha * W^t back into the binary buffer
    caffe_cpu_scale(this->count(), alpha, this->cpu_binary(), this->mutable_cpu_binary());
    // this->set_alpha(alpha);
}
    return;
  case SyncedMemory::HEAD_AT_GPU:
  case SyncedMemory::SYNCED:
#ifndef CPU_ONLY
{
    caffe_gpu_ternary<Dtype>(this->count(), delta, this->gpu_data(), this->mutable_gpu_binary());
    // alpha = <W^t, W> / <W^t, W^t> (Eq. (5)); results are written back to host scalars
    Dtype dot_wt_w = 0;
    caffe_gpu_dot(this->count(), this->gpu_binary(), this->gpu_data(), &dot_wt_w);
    Dtype dot_wt_wt = 0;
    caffe_gpu_dot(this->count(), this->gpu_binary(), this->gpu_binary(), &dot_wt_wt);

    alpha = dot_wt_w / (dot_wt_wt + 1e-6);
    this->set_alpha(alpha);

    caffe_gpu_scale(this->count(), alpha, this->gpu_binary(), this->mutable_gpu_binary());
    // this->set_alpha((Dtype)1);

    // LOG(INFO) << "alpha = " << alpha;
    // caffe_sleep(3);
}
    return;
#else
    NO_GPU;
#endif
  case SyncedMemory::UNINITIALIZED:
    return;
  default:
    LOG(FATAL) << "Unknown SyncedMemory head state: " << data_->head();
  }
}
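Note how alpha is recovered in both branches: since W^t_i ∈ {-1, 0, +1}, the dot product ⟨W^t, W^t⟩ simply counts |I_\Delta| and ⟨W^t, W⟩ sums |W_i| over I_\Delta, so alpha = ⟨W^t, W⟩ / ⟨W^t, W^t⟩ reproduces the closed-form solution of Eq. (5).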

// Implemented 2016-3-16
// Quantize the blob's data into the quantum buffer given the interval endpoints
// left/right (delegates to caffe_quantize / caffe_gpu_quantize).
template <typename Dtype>
void Blob<Dtype>::quantize_data(const Dtype left, const Dtype right){
  if (!data_) { return; }
  switch (data_->head()) {
  case SyncedMemory::HEAD_AT_CPU:
// cpu codes
if(DEBUG){
    LOG(INFO) << "CPU codes.";
    caffe_sleep(3);
}
    caffe_quantize(this->count(), left, right, this->cpu_data(), this->mutable_cpu_quantum());
    return;
  case SyncedMemory::HEAD_AT_GPU:
  case SyncedMemory::SYNCED:
#ifndef CPU_ONLY
// gpu codes
if(DEBUG){
    LOG(INFO) << "GPU codes.";
    caffe_sleep(3);
}
    caffe_gpu_quantize(this->count(), left, right, this->gpu_data(), this->mutable_gpu_quantum());

    // DEBUG caffe_gpu_quantize
if(DEBUG){
    std::cout << "--------------- data ----------------" << std::endl;
    for(int i=0; i<10; i++){
        std::cout << this->cpu_data()[i] << " ";
    }
    std::cout << std::endl;

    std::cout << "-------------- quantum ----------------" << std::endl;
    for(int i=0; i<10; i++){
        std::cout << this->cpu_quantum()[i] << " ";
    }
    std::cout << std::endl;

    caffe_sleep(10);
}

    return;
#else
    NO_GPU;
#endif
  case SyncedMemory::UNINITIALIZED:
    return;
  default:
    LOG(FATAL) << "Unknown SyncedMemory head state: " << data_->head();
  }
}

// Implemented @ 2016-3-18
// Compute, for each output channel (dimension shape(0)), the mean absolute value of
// that filter's weights and store it in the acmean buffer.
template <typename Dtype>
void Blob<Dtype>::acmean_data(){
  if (!data_) { return; }
  switch (data_->head()) {
  case SyncedMemory::HEAD_AT_CPU:
// CPU codes
    return;
  case SyncedMemory::HEAD_AT_GPU:
  case SyncedMemory::SYNCED:
#ifndef CPU_ONLY
// GPU codes
    Dtype* X;
    X = mutable_cpu_acmean();
/*
    LOG(INFO) << "shape.size() = " << shape().size();
    for(int i=0; i<shape().size(); i++){
        LOG(INFO) << "shape(" << i <<") = " << shape(i);
    }
    LOG(INFO) << "acmean_len = " << acmean_len;
*/
    for(int i=0; i<shape(0); i++){
        const int n = count() / shape(0);
        // Dtype x = 0;
        // caffe_gpu_asum(n, this->gpu_data() + offset(i), &X[i]);
        // NOTE: can't take this form of operation on the GPU!
        X[i] = caffe_cpu_asum(n, this->cpu_data() + offset(i)) / n;
        // caffe_gpu_asum(n, this->gpu_data() + offset(i), &x);
        // X[i] /= n;
    }
    return;
#else
    NO_GPU;
#endif
  case SyncedMemory::UNINITIALIZED:
    return;
  default:
    LOG(FATAL) << "Unknown SyncedMemory head state: " << data_->head();
  }
}

// Implemented @ 2016-3-18
// Scale each output channel of the binary buffer by its acmean value
// (the GPU path is currently commented out).
template <typename Dtype>
void Blob<Dtype>::scale_binary(){
  if (!data_) { return; }
  switch (data_->head()) {
  case SyncedMemory::HEAD_AT_CPU:
// CPU codes
    return;
  case SyncedMemory::HEAD_AT_GPU:
  case SyncedMemory::SYNCED:
#ifndef CPU_ONLY
// GPU codes
/*
    for(int i=0; i<shape(0); i++){
        const int n = count() / shape(0);
        caffe_gpu_scal<Dtype>(n, this->gpu_acmean()[i], this->mutable_gpu_binary() + offset(i));
    }
*/
    return;
#else
    NO_GPU;
#endif
  case SyncedMemory::UNINITIALIZED:
    return;
  default:
    LOG(FATAL) << "Unknown SyncedMemory head state: " << data_->head();
  }
}

