Bandwidth-Efficient Inference for Neural Image Compression

Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

Published: 2024, Last Modified: 28 Feb 2026, ICASSP 2024, CC BY-SA 4.0
Abstract: With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable, bandwidth-efficient neural inference method in which activations are compressed by a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression, with symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. Optimized with existing model quantization methods, the low-level task of image compression can achieve up to 19× bandwidth reduction with 6.21× energy saving. The code implementation is available at https://github.com/xyzysz/Bandwidth_efficient_nic.
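To make the entropy-coding stage more concrete, the following is a minimal sketch of a signed (zero-centered) order-0 exponential Golomb code applied to quantized activation values. It is an illustration only: the abstract does not define its "symmetric" variant, so this sketch assumes the standard signed mapping used in video coding (0, 1, -1, 2, -2, ... mapped to 0, 1, 2, 3, 4, ...). The function names and the sample activations are hypothetical and are not taken from the authors' repository.

```python
def signed_to_unsigned(v: int) -> int:
    """Interleave signed integers so small magnitudes get small indices:
    0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ... (assumed mapping)."""
    return 2 * v - 1 if v > 0 else -2 * v


def unsigned_to_signed(k: int) -> int:
    """Inverse of signed_to_unsigned."""
    return (k + 1) // 2 if k % 2 else -(k // 2)


def exp_golomb_encode(v: int) -> str:
    """Order-0 exp-Golomb codeword for a signed integer, as a bit string."""
    bits = bin(signed_to_unsigned(v) + 1)[2:]  # binary of index + 1
    return "0" * (len(bits) - 1) + bits        # zero prefix, then the binary


def exp_golomb_decode(bitstring: str) -> list:
    """Decode a concatenation of codewords back into signed integers."""
    values, i = [], 0
    while i < len(bitstring):
        zeros = 0
        while bitstring[i] == "0":             # count the zero prefix
            zeros += 1
            i += 1
        k = int(bitstring[i:i + zeros + 1], 2)  # read zeros + 1 payload bits
        i += zeros + 1
        values.append(unsigned_to_signed(k - 1))
    return values


# Hypothetical quantized activations, mostly near zero.
activations = [0, 3, -1, 0, 0, 2, -4]
stream = "".join(exp_golomb_encode(v) for v in activations)
decoded = exp_golomb_decode(stream)
```

Because values near zero receive the shortest codewords, a stream of mostly small quantized activations takes fewer bits than a fixed-width representation; here `len(stream)` can be compared against `8 * len(activations)` for an 8-bit baseline.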