Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors

10 Jun 2023 · OpenReview Archive Direct Upload
Abstract: Applications of neural networks on edge systems have proliferated in recent years, but ever-increasing model sizes make it difficult to deploy neural networks efficiently on resource-constrained microcontrollers. We propose bit-serial weight pools, an end-to-end framework that combines network compression with acceleration at arbitrary sub-byte precision. The framework achieves up to 8x compression compared to 8-bit networks by sharing a single pool of weight values across the entire network. We further propose a bit-serial, lookup-based software implementation that allows a runtime-bitwidth tradeoff and achieves more than 2.8x speedup and 7.5x storage compression compared to 8-bit networks, with less than 1% accuracy drop.
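The compression idea in the abstract — replacing every weight with a sub-byte index into one shared pool of values — can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the pool here is built with a simple k-means-style clustering over all weights (function names and the quantile-based initialization are my own assumptions), and a 3-bit pool reproduces the sub-byte indexing that enables the 8-bit-to-sub-byte storage savings.

```python
import numpy as np

def build_weight_pool(weights, pool_bits=3, iters=20):
    """Cluster all network weights into a shared pool of 2**pool_bits values.

    A k-means-style sketch (hypothetical helper, not the paper's method):
    each weight is then stored as a pool_bits-wide index instead of an
    8-bit value, giving roughly 8/pool_bits compression of weight storage.
    """
    flat = weights.ravel()
    k = 2 ** pool_bits
    # Initialize pool entries at evenly spaced quantiles of the weights.
    pool = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign each weight to its nearest pool entry ...
        idx = np.abs(flat[:, None] - pool[None, :]).argmin(axis=1)
        # ... and move each entry to the mean of its assigned weights.
        for c in range(k):
            if np.any(idx == c):
                pool[c] = flat[idx == c].mean()
    idx = np.abs(flat[:, None] - pool[None, :]).argmin(axis=1)
    return pool, idx.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
pool, idx = build_weight_pool(w, pool_bits=3)
# Every weight is now a 3-bit index into an 8-entry pool, so the
# weight tensor needs 3 bits per entry instead of 8.
print(pool.shape, int(idx.max()))
```

At inference time, a bit-serial implementation would process these indices one bit-plane at a time via lookup tables, which is what lets the bitwidth (and hence speed/accuracy) be chosen at runtime.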