Towards Better Accuracy-Efficiency Trade-Offs: Dynamic Activity Inference via Collaborative Learning From Various Width-Resolution Configurations
Abstract: Recently, deep neural networks have achieved remarkable success in a wide variety of human activity recognition (HAR) applications on resource-constrained mobile devices. However, most existing models are static and ignore the fact that the computational budget usually varies drastically across devices, which prevents real-world HAR deployment. A major challenge remains: how to adaptively and instantly trade off accuracy against latency at runtime for on-device activity inference over time-series sensor data? To address this issue, this article introduces a new collaborative learning scheme that trains a set of subnetworks executed at varying network widths and fed with different sensor input resolutions as data augmentation; the resulting model can switch on the fly among width-resolution configurations for flexible, dynamic activity inference under varying resource budgets. In particular, the scheme offers a promising performance-boosting solution by utilizing self-distillation to transfer the unique knowledge among multiple width-resolution configurations, which captures stronger feature representations for activity recognition. Extensive experiments and ablation studies on three public HAR benchmark datasets validate the effectiveness and efficiency of our approach, and a real implementation is evaluated on a mobile device. This discovery opens up the possibility of directly accessing the accuracy-latency spectrum of deep learning models in versatile real-world HAR deployments. Code is available at https://github.com/Lutong-Qin/Collaborative_HAR .
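To make the abstract's training scheme concrete, the following is a minimal, hypothetical NumPy sketch (not the authors' code): a shared classifier is executed at several width multipliers, each paired with a different sensor-window resolution, and the largest configuration's softened predictions serve as the self-distillation target for the smaller ones. The specific width multipliers, resolutions, and temperature below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

FULL_WIDTH, NUM_CLASSES = 32, 6
W = rng.standard_normal((FULL_WIDTH, NUM_CLASSES)) * 0.1  # shared classifier weights
widths = [1.0, 0.75, 0.5, 0.25]      # width multipliers (assumed values)
resolutions = [128, 96, 64, 32]      # matched sensor-window lengths (assumed)
T = 2.0                              # distillation temperature (assumed)

def resample(x, n):
    """Linearly resample a 1-D signal to length n; stands in for the
    resolution augmentation / pooling of a real feature extractor."""
    return np.interp(np.linspace(0, len(x) - 1, n), np.arange(len(x)), x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, width_mult):
    """Slimmable-style execution: only the first fraction of channels
    (rows of W) is active at the given width multiplier."""
    k = int(FULL_WIDTH * width_mult)
    return resample(x, k) @ W[:k]

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

x_full = rng.standard_normal(resolutions[0])  # one raw sensor window

# The largest width-resolution configuration acts as the teacher.
teacher = softmax(forward(resample(x_full, resolutions[0]), widths[0]) / T)

losses = []
for wm, res in zip(widths[1:], resolutions[1:]):
    student = softmax(forward(resample(x_full, res), wm) / T)
    losses.append(kl(teacher, student))  # distillation term per sub-config

print(losses)  # one non-negative KL term per smaller configuration
```

At inference time, selecting one (width, resolution) pair from the lists above corresponds to picking a point on the accuracy-latency spectrum without retraining.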