Abstract: This paper presents a hardware prototype and a framework for a new communication-aware model compression scheme for distributed on-device inference. Our approach relies on Knowledge Distillation (KD) and achieves compression ratios of several orders of magnitude relative to a large pre-trained teacher model. The distributed hardware prototype consists of multiple student models deployed on Raspberry Pi 3 nodes, which run Wide ResNet and VGG models on the CIFAR-10 dataset for real-time image classification. Compared to the initial teacher model, we observe significant reductions in memory footprint (50x), energy consumption (14x), and latency (33x), along with a 12x increase in performance, without any significant loss of accuracy. This is an important step towards deploying deep learning models for IoT applications.
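The Knowledge Distillation objective underlying this kind of teacher–student compression can be sketched as follows. This is an illustrative implementation of the standard KD loss (temperature-softened teacher targets blended with the hard-label loss), not code from the paper; the function names and the `T`/`alpha` hyperparameter values are hypothetical choices for the sketch.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """Standard KD loss: a weighted sum of cross-entropy against the
    teacher's temperature-softened outputs and cross-entropy against
    the true hard label. The T**2 factor rescales soft-target gradients."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft_ce = -np.sum(p_teacher * np.log(p_student + 1e-12)) * (T ** 2)
    hard_ce = -np.log(softmax(student_logits)[label] + 1e-12)
    return alpha * soft_ce + (1 - alpha) * hard_ce
```

In practice, the student (e.g. a small Wide ResNet variant) is trained by minimizing this loss over the training set, so it mimics the teacher's output distribution while still fitting the ground-truth labels.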