BAQE: Backend-Adaptive DNN Deployment via Synchronous Bayesian Quantization and Hardware Configuration Exploration

Published: 2025 · Last Modified: 23 Jan 2026 · IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2025 · CC BY-SA 4.0
Abstract: Efficiently deploying deep learning (DL) algorithms on different hardware backends has become a time-consuming challenge. Achieving the highest inference efficiency on hardware requires both algorithm-level model compression techniques, such as model quantization, and hardware-level optimizations, such as operation reconfiguration and scheduling. In this article, we propose BAQE, a unified deployment framework that bridges the gap between algorithm-level and backend-level optimization. By constructing a global search space, BAQE synchronously optimizes both the model quantization settings and the backend configuration parameters. To accelerate this laborious and time-consuming process, we propose a search strategy based on multiobjective Bayesian optimization (BO) that uses a Gaussian process with deep kernel learning as the surrogate model. More importantly, BAQE adapts efficiently and effectively to backends with diverse hardware resources. Every inner step of the optimization is aware of the actual hardware: all accuracy and latency metrics, together with the historical knowledge fed back to the optimizer, are evaluated directly on the device in each iteration. Empirical results demonstrate that our approach achieves superior inference latency and accuracy with a faster optimization process.
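To make the joint search concrete, below is a minimal sketch of the loop the abstract describes: a single search space combining per-layer quantization bitwidths with backend configuration knobs, explored by BO with on-device feedback. Everything here is illustrative, not the paper's implementation: `sample_config`, `evaluate_on_device`, and the candidate values are hypothetical placeholders; a plain scikit-learn Gaussian process stands in for the paper's deep-kernel-learning surrogate; and random scalarization with a UCB acquisition stands in for the paper's multiobjective BO strategy.

```python
# Hypothetical sketch of a BAQE-style joint quantization + backend search.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Joint search space: per-layer bitwidths plus backend knobs (all assumed values).
N_LAYERS = 4
BITWIDTHS = [2, 4, 8]          # candidate quantization bitwidths per layer
TILE_OPTIONS = [16, 32, 64]    # hypothetical backend tiling choices
UNROLL_OPTIONS = [1, 2, 4]     # hypothetical loop-unroll factors

def sample_config():
    """Draw one point from the joint quantization + backend space (as indices)."""
    bits = rng.choice(len(BITWIDTHS), size=N_LAYERS)
    tile = rng.choice(len(TILE_OPTIONS))
    unroll = rng.choice(len(UNROLL_OPTIONS))
    return np.concatenate([bits, [tile, unroll]]).astype(float)

def evaluate_on_device(x):
    """Placeholder for real on-device measurement of (accuracy, latency).
    The synthetic trade-off below only makes the sketch run end to end."""
    bits = np.array([BITWIDTHS[int(i)] for i in x[:N_LAYERS]])
    acc = 1.0 - np.mean(1.0 / bits)              # more bits -> higher accuracy
    lat = bits.sum() / TILE_OPTIONS[int(x[-2])]  # coarser tiles -> lower latency
    return acc, lat

# Initial design: a handful of random configurations measured on the device.
X, acc_hist, lat_hist = [], [], []
for x in (sample_config() for _ in range(8)):
    a, l = evaluate_on_device(x)
    X.append(x); acc_hist.append(a); lat_hist.append(l)

# BO loop: each iteration fits the surrogate on a randomly weighted
# accuracy/latency score and picks the candidate maximizing a UCB acquisition.
for _ in range(20):
    w = rng.uniform()                            # random trade-off weight
    y = w * np.array(acc_hist) - (1 - w) * np.array(lat_hist)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(X), y)
    cands = np.array([sample_config() for _ in range(256)])
    mu, sigma = gp.predict(cands, return_std=True)
    x_next = cands[np.argmax(mu + 1.0 * sigma)]  # explore + exploit
    a, l = evaluate_on_device(x_next)            # on-device feedback
    X.append(x_next); acc_hist.append(a); lat_hist.append(l)

# Report the Pareto front over measured (accuracy, latency) pairs.
pts = list(zip(acc_hist, lat_hist))
pareto = [p for p in pts
          if not any(q[0] >= p[0] and q[1] <= p[1] and q != p for q in pts)]
print("Pareto-optimal (accuracy, latency):", sorted(pareto))
```

Measuring every candidate on the device, as in this loop, is what the abstract means by hardware-aware inner steps: the surrogate is trained only on genuine accuracy/latency feedback rather than on an analytical cost model.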