FlexInt: A New Number Format for Robust Sub-8-Bit Neural Network Inference

Published: 01 Jan 2024 · Last Modified: 07 Nov 2025 · ICCAD 2024 · CC BY-SA 4.0
Abstract: While previous work has demonstrated that even large DNNs can be quantized to very low precision (sub-8-bit integers), concerns over robustness across different types of networks and datasets have led the industry to more seriously consider floating-point (FP) formats. However, at 8 bits and below there is no universally accepted FP format, nor one that performs robustly on diverse data distributions. Thus, in this paper, based on our analysis of integer (INT) and FP formats, we propose a novel number format called FlexInt that combines a high dynamic range similar to FP with a low maximum rounding error, targeting efficient representation of DNNs for inference at 8 bits and below. We also propose a novel FlexInt MAC (Multiply-Accumulate) hardware architecture. Our experimental results using large networks on image classification and natural language processing demonstrate that FlexInt delivers more robust performance and far superior worst-case accuracy than both INT and FP across various data distributions; incurs a hardware overhead similar to that of FP; and consistently makes near-Pareto-optimal area-accuracy trade-offs across diverse networks.
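The INT-vs-FP trade-off the abstract appeals to can be made concrete with a small sketch. The snippet below (an illustration, not a reproduction of FlexInt, whose encoding is not given in this abstract) compares the dynamic range and worst-case relative rounding error of a signed 8-bit integer against a generic normalized FP format with `exp_bits` exponent and `man_bits` mantissa bits, assuming an IEEE-style bias and one reserved top exponent code; actual 8-bit FP proposals such as E4M3 differ in how they handle specials.

```python
# Illustrative INT-vs-FP comparison: high dynamic range (FP) vs.
# uniform step size (INT). FlexInt itself is not modeled here.

def int_stats(bits):
    # Symmetric signed integer: magnitudes 1 .. 2^(b-1)-1, uniform step 1.
    max_val = 2 ** (bits - 1) - 1
    dynamic_range = max_val / 1   # largest / smallest nonzero magnitude
    max_rel_err = 0.5 / 1         # half a step, relative to the smallest value
    return dynamic_range, max_rel_err

def fp_stats(exp_bits, man_bits):
    # Normalized FP with IEEE-style bias; the top exponent code is
    # reserved for specials (an assumption of this sketch).
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2 - bias
    min_exp = 1 - bias
    largest = (2 - 2 ** -man_bits) * 2.0 ** max_exp
    smallest_normal = 2.0 ** min_exp
    dynamic_range = largest / smallest_normal
    max_rel_err = 2.0 ** -(man_bits + 1)  # relative error bound of rounding
    return dynamic_range, max_rel_err

print("INT8:      ", int_stats(8))    # (127, 0.5)
print("FP8 (4,3): ", fp_stats(4, 3))  # (15360.0, 0.0625)
```

Under these assumptions, INT8 has a tiny worst-case relative error near full scale but a poor one near zero, while the FP format keeps relative error uniformly bounded at the cost of coarse absolute steps at large magnitudes; the abstract's claim is that FlexInt sits between the two.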