Bridging the Gap Between AI Quantization and Edge Deployment: INT4 and INT8 on the Edge

Published: 24 Nov 2025, Last Modified: 24 Nov 2025
Venue: 5th Muslims in ML Workshop, co-located with NeurIPS 2025
License: CC BY 4.0
Keywords: AI, Quantization, EdgeAI, Efficient Deployment
Abstract: Quantization is key to deploying neural networks on microcontroller-class edge devices. While INT4 and mixed-precision schemes promise strong compression–accuracy trade-offs in simulation, current toolchains support only INT8 in practice. We benchmark FP32, INT8, INT4, and mixed-precision quantization on Tiny YOLOv2 and deploy INT8 models on the STM32N6, exposing this research–deployment gap. To address it, we propose a heterogeneous sub-INT8 strategy that combines INT8 acceleration with selective INT4 fallback execution, enabling practical hybrid deployment on today's edge hardware.
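The ideas in the abstract can be sketched in a few lines. The snippet below is an illustrative toy example, not the paper's implementation: it shows symmetric per-tensor quantization to INT8 or INT4, plus a simple per-layer rule that prefers INT4 and falls back to a wider bit-width when the reconstruction error is too large (the opposite direction of the paper's INT8-with-INT4-fallback scheme, but the same selection idea). The function names and the error threshold are assumptions chosen for illustration.

```python
# Illustrative sketch only -- NOT the paper's method or toolchain.
# Symmetric per-tensor quantization and a toy per-layer bit-width choice.
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    """Quantize w to signed `bits`-bit integers with one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for INT8, 7 for INT4
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    # INT4 values still fit in an int8 container; real runtimes would pack them.
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def choose_bitwidth(w: np.ndarray, tol: float = 1e-2) -> int:
    """Toy mixed-precision rule: use INT4 unless its MSE exceeds `tol`."""
    q4, s4 = quantize_symmetric(w, 4)
    err4 = float(np.mean((w - dequantize(q4, s4)) ** 2))
    return 4 if err4 < tol else 8

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)
    q8, s8 = quantize_symmetric(w, 8)
    print("INT8 MSE:", float(np.mean((w - dequantize(q8, s8)) ** 2)))
    print("chosen bits:", choose_bitwidth(w))
```

In a real deployment the per-layer choice would additionally depend on which operators the target NPU accelerates, which is exactly the hardware constraint the abstract highlights for the STM32N6.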
Track: Track 2: ML by Muslim Authors
Submission Number: 61