Approach
In the first D (Dense) step, we train a dense network to learn the connection weights and their importance. In the S (Sparse) step, we regularize the network by pruning the unimportant connections with small weights and retraining the network under the sparsity constraint. In the final D (re-Dense) step, we increase the model capacity by removing the sparsity constraint, re-initialize the pruned parameters from zero, and retrain the whole dense network.
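A minimal sketch of the three steps in PyTorch is shown below. The toy model, random batches, 50% pruning ratio, and epoch/learning-rate values are illustrative assumptions, not the paper's settings; a real run would use an actual DataLoader and the paper's hyperparameters.

```python
# Sketch of the DSD (Dense-Sparse-Dense) training schedule.
# Model, data, sparsity ratio, and epoch counts are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
criterion = nn.CrossEntropyLoss()
linear_layers = [m for m in model if isinstance(m, nn.Linear)]

def train(model, epochs, lr, masks=None):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        x = torch.randn(64, 784)              # placeholder batch; use a real DataLoader
        y = torch.randint(0, 10, (64,))
        loss = criterion(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if masks is not None:                 # sparse step: keep pruned weights at zero
            with torch.no_grad():
                for mask, layer in zip(masks, linear_layers):
                    layer.weight.mul_(mask)

# D: dense training learns the weights and their importance (magnitude).
train(model, epochs=5, lr=0.1)

# S: prune the smallest-magnitude weights in each layer, then retrain under the mask.
sparsity = 0.5                                # assumed pruning ratio
masks = []
with torch.no_grad():
    for layer in linear_layers:
        w_abs = layer.weight.abs()
        threshold = w_abs.flatten().kthvalue(int(sparsity * w_abs.numel())).values
        mask = (w_abs > threshold).float()
        layer.weight.mul_(mask)
        masks.append(mask)
train(model, epochs=5, lr=0.1, masks=masks)

# re-D: drop the mask; the pruned weights restart from zero and the whole dense
# network is retrained (the paper uses a lower learning rate for this step).
train(model, epochs=5, lr=0.01)
```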
The consistent and significant performance gains in the DSD experiments show that current training methods are inadequate for finding the best local optimum, whereas DSD training effectively optimizes toward a better solution.
Experiment
References:
Song Han et al., "DSD: Dense-Sparse-Dense Training for Deep Neural Networks," ICLR 2017.