Abstract: Convolutional neural networks (CNNs) have achieved tremendous success in solving many challenging computer vision tasks. However, CNNs are extremely demanding in computation capability, memory space, and power capacity. This limits their usage to the cloud and prevents them from being deployed on edge devices with constrained resources and power. To tackle this problem, we propose craft distillation, a novel model compression approach that leverages both depthwise separable convolutions and knowledge distillation to significantly reduce the size of a highly complex model. Craft distillation has three advantages over existing model compression techniques. First, it does not require prior experience in designing a good “student model” for effective knowledge distillation. Second, it does not require specialized hardware support (e.g., ASIC or FPGA). Third, it is compatible with existing model compression techniques and can be combined with pruning and quantization to further reduce weight storage and arithmetic operations. Our experimental results show that, with a proper layer-block replacement design and replacement strategy, craft distillation reduces the computational cost of VGG16 by 74.6% compared to the original dense model, with negligible influence on accuracy.
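The abstract does not spell out implementation details, so the sketch below is only a rough illustration, not the authors' exact design: a depthwise separable block that could stand in for a standard 3×3 convolutional block in a VGG-style network, plus a standard Hinton-style knowledge-distillation loss. All class names, hyperparameters (temperature `T`, mixing weight `alpha`), and the choice of batch normalization are assumptions for the example.

```python
# Illustrative sketch only; the actual craft distillation block design may differ.
import torch.nn as nn
import torch.nn.functional as F


class DepthwiseSeparableBlock(nn.Module):
    """Possible replacement for a dense Conv2d(in_ch, out_ch, 3) block:
    a depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))


def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Standard KD loss: temperature-softened KL term plus hard-label cross-entropy.
    T and alpha are illustrative values, not taken from the paper."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```

Replacing a dense 3×3 convolution with such a block cuts its multiply-accumulate count roughly by a factor of the output channel count divided by (output channels + 9), which is consistent in spirit with the large computational savings the abstract reports.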