Approach
We present a compression technique for CNNs in which we prune filters identified as having a small effect on output accuracy. By removing whole feature-map planes from the network, together with their connecting convolution kernels, computational cost is reduced significantly.
- Determining which filters to prune
We measure the importance of a filter in each layer by its absolute weight sum, i.e., the sum of the absolute values of its kernel weights. The procedure for pruning m filters from the i-th convolutional layer is as follows:
1. For each filter, calculate the sum of its absolute kernel weights.
2. Sort the filters by this sum.
3. Prune the m filters with the smallest sums, together with their corresponding feature maps and the kernels in the (i+1)-th layer that operate on those feature maps.
4. Create new kernel matrices for the i-th and (i+1)-th layers and copy the remaining weights into them.
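This selection-and-pruning step can be sketched in a few lines of numpy (a minimal sketch; the function names and array layout `(n_filters, in_channels, k, k)` are our assumptions, not from the paper):

```python
import numpy as np

def rank_filters_by_l1(conv_weights):
    """Rank filters of one conv layer by the sum of absolute kernel weights.

    conv_weights: array of shape (n_filters, in_channels, k, k).
    Returns filter indices ordered from least to most important.
    """
    sums = np.abs(conv_weights).reshape(conv_weights.shape[0], -1).sum(axis=1)
    return np.argsort(sums)

def prune_layer(conv_weights, next_weights, m):
    """Remove the m least-important filters of this layer, along with the
    matching input kernels of the following conv layer."""
    order = rank_filters_by_l1(conv_weights)
    keep = np.sort(order[m:])            # indices of the filters we keep
    pruned = conv_weights[keep]          # drop whole output planes
    next_pruned = next_weights[:, keep]  # drop the corresponding input channels downstream
    return pruned, next_pruned
```

Note that pruning layer i shrinks the input dimension of layer i+1, which is why both weight tensors must be rebuilt together.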
In addition, to understand the sensitivity of each layer to pruning, we prune each layer independently and evaluate the resulting pruned network's accuracy on the validation set.
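The sensitivity analysis amounts to a simple loop: prune one layer at a time from the original model and record validation accuracy. A sketch with hypothetical `prune_fn` and `evaluate_fn` callbacks (our names, standing in for whatever framework is used):

```python
def layer_sensitivity(model, layers, prune_ratio, prune_fn, evaluate_fn):
    """Prune each layer in isolation and record validation accuracy.

    prune_fn(model, layer, ratio) must return a pruned copy, leaving the
    original model untouched, so every layer is tested from the same baseline.
    """
    results = {}
    for layer in layers:
        pruned = prune_fn(model, layer, prune_ratio)
        results[layer] = evaluate_fn(pruned)
    return results
```

Layers whose accuracy barely drops are good candidates for aggressive pruning; layers that degrade sharply should be pruned lightly or skipped.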
- Pruning filters across multiple layers
To prune filters across multiple layers, we use one of the following two criteria for filter selection:
- Independent pruning determines which filters should be pruned at each layer independently of the other layers.
- Greedy pruning accounts for the filters that have already been removed in previous layers: when scoring a filter, it does not count the kernels that act on the previously pruned feature maps.
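The difference between the two criteria comes down to which kernels enter the weight sum. A minimal numpy sketch (function name and argument are our assumptions):

```python
import numpy as np

def l1_scores(weights, pruned_inputs=None):
    """Sum of absolute kernel weights per filter.

    weights: array of shape (n_filters, in_channels, k, k).
    Independent criterion: pruned_inputs is None, all input channels count.
    Greedy criterion: kernels acting on feature maps already removed
    upstream (pruned_inputs) are excluded from the sum.
    """
    w = np.abs(weights)
    if pruned_inputs is not None:
        keep = [c for c in range(w.shape[1]) if c not in set(pruned_inputs)]
        w = w[:, keep]
    return w.reshape(w.shape[0], -1).sum(axis=1)
```

Under the greedy criterion a filter whose weight mass sits mostly on already-pruned channels scores low, since those kernels will be discarded anyway.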
- Retraining pruned networks
After pruning the filters, any performance degradation should be compensated for by retraining the network. There are two strategies for pruning and retraining across multiple layers:
- Prune once and retrain: prune the filters of multiple layers at once, then retrain the network until the original accuracy is restored.
- Prune and retrain iteratively: prune filters layer by layer or filter by filter, retraining after each step. The model is retrained before pruning the next layer so that the weights can adapt to the changes from the pruning process.
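The two retraining schedules differ only in where the retraining step sits relative to the pruning loop. A sketch with hypothetical `prune_fn` and `retrain_fn` callbacks (our names, framework-agnostic):

```python
def prune_once_and_retrain(model, plan, prune_fn, retrain_fn):
    """Prune filters across several layers in one shot, then retrain once.

    plan: list of (layer, n_filters_to_prune) pairs.
    """
    for layer, m in plan:
        model = prune_fn(model, layer, m)
    return retrain_fn(model)

def prune_and_retrain_iteratively(model, plan, prune_fn, retrain_fn):
    """Prune one layer at a time, retraining after each step so the
    remaining weights adapt before the next layer is touched."""
    for layer, m in plan:
        model = prune_fn(model, layer, m)
        model = retrain_fn(model)
    return model
```

The one-shot strategy is cheaper (a single retraining run) but risks losing accuracy it cannot recover; the iterative strategy costs one retraining run per step but typically tolerates deeper pruning.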
Experiment
References:
Hao Li et al., Pruning Filters for Efficient ConvNets, ICLR 2017.