AMRE: Adaptive Multilevel Redundancy Elimination for Multimodal Mobile Inference

Published: 2025 · Last Modified: 12 Nov 2025 · IEEE Trans. Mob. Comput. 2025 · License: CC BY-SA 4.0
Abstract: Given privacy and network-load concerns, running multimodal neural networks (MNNs) on-device for IoT data is a growing trend. However, the high computational demands of MNNs clash with limited on-device resources. MNN inference involves both input and model redundancy: resources are wasted processing redundant input components and running excess model parameters. Model Redundancy Elimination (MRE) prunes redundant parameters but cannot bypass inference for unnecessary input components; Input Redundancy Elimination (IRE) skips inference for redundant input components but cannot reduce the computation spent on the remaining ones. Neither MRE nor IRE alone meets the diverse computational needs of multimodal inference. To address this, we combine the advantages of MRE and IRE to achieve more efficient inference. We propose AMRE, an adaptive multilevel redundancy elimination framework that supports both IRE and MRE. AMRE first establishes a collaborative inference mechanism for IRE and MRE. We then propose a multifunctional, lightweight policy model that adaptively controls the inference logic for each instance. Moreover, a three-stage training method is proposed to ensure the performance of collaborative inference in AMRE. We validate AMRE in three scenarios, achieving up to 52.91% lower latency, up to 56.79% lower energy cost, and a slight accuracy gain over state-of-the-art baselines.
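To make the collaboration between IRE and MRE concrete, the sketch below illustrates one plausible reading of the abstract: a lightweight policy network inspects cheap per-modality features and emits (a) a keep/skip gate per input modality (input redundancy elimination) and (b) a width level for the shared backbone (model redundancy elimination). This is a minimal illustration, not the authors' implementation; all module names, dimensions, the weight-slicing trick for narrower sub-networks, and the batch-level width decision are assumptions for the sake of a runnable example.

```python
# Hypothetical sketch of policy-gated IRE + MRE inference (PyTorch).
# Not the AMRE reference code; shapes and gating rules are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT = 32                  # per-modality feature size (hypothetical)
MODALITIES = 3             # e.g. image / audio / IMU streams
WIDTHS = [0.25, 0.5, 1.0]  # candidate backbone width levels (MRE choices)

class AMRESketch(nn.Module):
    def __init__(self, num_classes: int = 10, hidden: int = 128):
        super().__init__()
        # Cheap per-modality stems: feed the policy model and, when a
        # modality is kept, act as its encoder front-end.
        self.stems = nn.ModuleList([nn.Linear(FEAT, FEAT) for _ in range(MODALITIES)])
        # Lightweight policy model: modality gates (IRE) + width choice (MRE).
        self.gate_head = nn.Linear(FEAT * MODALITIES, MODALITIES)
        self.width_head = nn.Linear(FEAT * MODALITIES, len(WIDTHS))
        # Over-parameterised backbone; slicing fc1's rows emulates running
        # a narrower sub-network at reduced cost.
        self.fc1 = nn.Linear(FEAT * MODALITIES, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, inputs):  # inputs: list of (batch, FEAT) tensors
        stem_out = [F.relu(s(x)) for s, x in zip(self.stems, inputs)]
        policy_in = torch.cat(stem_out, dim=-1)
        # IRE: hard per-modality keep/skip decision at inference time.
        keep = torch.sigmoid(self.gate_head(policy_in)) > 0.5
        # MRE: pick one width level (batch-level here for simplicity;
        # the paper describes per-instance control).
        w = WIDTHS[int(self.width_head(policy_in).mean(0).argmax())]
        # Mask skipped modalities; a real system would skip their encoders
        # entirely to actually save the compute, not just zero features.
        mask = keep.float().repeat_interleave(FEAT, dim=-1)
        h = policy_in * mask
        k = max(1, int(self.fc1.out_features * w))
        h = F.relu(F.linear(h, self.fc1.weight[:k], self.fc1.bias[:k]))
        return F.linear(h, self.fc2.weight[:, :k], self.fc2.bias)

model = AMRESketch()
x = [torch.randn(4, FEAT) for _ in range(MODALITIES)]
print(model(x).shape)  # torch.Size([4, 10])
```

Under this reading, the policy heads are the only always-on cost, so they must stay far cheaper than the encoders and backbone they gate; how the paper's three-stage training keeps the gated sub-networks accurate is detailed in the full text.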