Abstract: This work presents an optimization procedure for the flexible reuse of approximate multipliers in deep learning accelerators. Unlike previous approaches, our method can simultaneously determine multiple assignments of approximate multipliers to neural network layers, allowing a system to adapt its Quality of Service (QoS) to changing environmental conditions by dynamically trading accuracy for resource consumption. The proposed search algorithm chooses a subset of approximate multipliers from a large search space and enables retraining to maximize task performance. To maximize the model's accuracy, we propose a fine-tuning scheme that shares the majority of parameters between all operating points, with only a small number of additional parameters required per operating point. In our evaluation on MobileNetV2, we achieve power savings of 17.6% to 43.5% with a Top-5 accuracy loss of between 0.47 and 2.52 percentage points, while increasing the model's parameter count by only 2.75%.
External IDs: dblp:conf/aicas/TrommerWK25