- Abstract: Deploying deep neural networks on mobile devices is a challenging task due to computation complexity and memory intensity. Existing works solve this problem by reducing model size using weight approximation methods based on dimension reduction (i.e., SVD, Tucker decomposition and Quantization). However, the execution speed of these compressed models are still far below the real-time processing requirement of mobile services. To address this limitation, we propose a novel acceleration framework: DeepRebirth by exploring the deep learning model parameter sparsity through merging the parameter-free layers with their neighbor convolution layers to a single dense layer. The design of DeepRebirth is motivated by the key observation: some layers (i.e., normalization and pooling) in deep learning models actually consume a large portion of computational time even few learned parameters are involved, and acceleration of these layers has the potential to improve the processing speed significantly. Essentially, the functionality of several merged layers is replaced by the new dense layer – rebirth layer in DeepRebirth. In order to preserve the same functionality, the rebirth layer model parameters are re-trained to be functionality equivalent to the original several merged layers. The extensive experiments performed on ImageNet using several popular mobile devices demonstrate that DeepRebirth is not only providing huge speed-up in model deployment and significant memory saving but also maintaining the model accuracy, i.e., 3x-5x speed-up and energy saving on GoogLeNet with only 0.4% accuracy drop on top-5 categorization in ImageNet. Further, by combining with other model compression techniques, DeepRebirth offers an average of 65ms model forwarding time on each image using Samsung Galaxy S6 with only 2.4% accuracy drop. In addition, 2.5x run-time memory saving is achieved with rebirth layers.
- Conflicts: lehigh.edu, samsung.com