Fast and Accurate Streaming CNN Inference via Communication Compression on the Edge

Published: 01 Jan 2020, Last Modified: 12 May 2023, IoTDI 2020
Abstract: Recently, compact CNN models have been developed to enable computer vision on the edge. While the small model size reduces storage overhead and the lightweight layer operations ease the burden on edge processors, it is still challenging to sustain high inference performance due to limited and varying inter-device bandwidth. We propose a streaming inference framework that simultaneously improves throughput and accuracy through communication compression. Specifically, we perform the following optimizations: 1) Partition: we split the CNN layers so that the devices achieve computation load balance; 2) Compression: we identify inter-device communication bottlenecks and insert Auto-Encoders into the original CNN to compress data traffic; 3) Scheduling: we adaptively select the compression ratio when the bandwidth variation is large. These optimizations improve inference throughput significantly due to better communication performance. More importantly, accuracy also increases because 1) fewer frames are dropped when input images stream in at a high rate, and 2) the frames that successfully enter the pipeline are processed accurately, since the AE-based compression incurs negligible information loss. We evaluate MobileNet-v2 on a pipeline of Raspberry Pi 3B+ devices. Our compression techniques yield up to a 32% accuracy improvement when the average Wi-Fi bandwidth varies from 3 to 9 Mbps.
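The abstract's core mechanism, inserting a small auto-encoder at an inter-device cut point and picking its compression ratio from the measured link bandwidth, can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the 1x1-convolution encoder/decoder design, the channel counts, and the bandwidth thresholds are all assumptions made for the example.

```python
# Minimal sketch (PyTorch) of AE-based communication compression with
# bandwidth-adaptive ratio selection. Module structure, channel counts,
# and thresholds are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class AECompressor(nn.Module):
    """Auto-encoder inserted at an inter-device communication bottleneck.

    The encoder runs on the sending device and shrinks the channel
    dimension by `ratio`; the decoder runs on the receiving device and
    restores it, so the original CNN layers on both sides are unchanged.
    """
    def __init__(self, channels: int, ratio: int):
        super().__init__()
        bottleneck = max(1, channels // ratio)
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(bottleneck, channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # In deployment, only the encoder's output crosses the network;
        # here both halves run locally to show end-to-end reconstruction.
        return self.decoder(self.encoder(x))

def select_ratio(bandwidth_mbps: float) -> int:
    """Pick a compression ratio from the measured Wi-Fi bandwidth.

    A slower link gets a higher ratio (less traffic, slightly higher
    reconstruction error). The thresholds are made-up placeholders.
    """
    if bandwidth_mbps < 4.0:
        return 8
    if bandwidth_mbps < 7.0:
        return 4
    return 2

# Example: compress a feature map at a hypothetical MobileNet-v2 cut point.
x = torch.randn(1, 96, 28, 28)            # activation leaving device k
ae = AECompressor(channels=96, ratio=select_ratio(bandwidth_mbps=5.2))
x_restored = ae(x)                        # what device k+1 would consume
assert x_restored.shape == x.shape
```

Under these assumptions, only the bottleneck tensor traverses the link, so a ratio of 4 cuts the transferred channel count from 96 to 24 while leaving every original CNN layer untouched; the scheduler simply re-evaluates `select_ratio` as the measured bandwidth drifts.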