Keywords: Mobile computing, federated learning, energy efficiency
Abstract: AI is reshaping the Web experience, but it also introduces serious privacy risks due to extensive collection of user data. Federated
learning (FL), as a privacy-preserving machine learning paradigm,
enables mobile devices to collaboratively learn a shared prediction
model while keeping all training data on devices. However, a key
obstacle towards practical cross-device FL training is huge energy
consumption, especially for lightweight mobile devices.
In this work, we perform a first-of-its-kind analysis of improving FL performance through low-precision training with an
energy-friendly Digital Signal Processor (DSP) on mobile devices.
We first demonstrate that directly integrating the state-of-the-art INT8 (8-bit integer) training algorithm with classic FL protocols significantly degrades model accuracy. Moreover, we observe that on-device INT8 training still requires frequent, unavoidable quantization operations, which place a heavy load on DSP-enabled training. To address these challenges, we present Q-FedUpdate, an
FL framework that efficiently preserves model accuracy with ultra-low energy consumption. It maintains a global full-precision model and allows tiny model updates to be continuously accumulated, instead of being erased by quantization. Furthermore, it introduces pipelining to parallelize CPU-based quantization with DSP-enabled training, which reduces the floating-point overhead of frequent data quantization. Extensive experiments show that Q-FedUpdate reduces on-device energy consumption by 21× and accelerates FL convergence by 6.1× with only 2% accuracy loss.
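The accumulation idea the abstract describes can be sketched in a few lines. This is a minimal, hypothetical illustration (scalar weight, symmetric INT8 quantization, and a chosen scale of 0.1 are all assumptions, not the paper's actual algorithm): naively quantizing each tiny update rounds it to zero, while keeping a full-precision residual lets the updates accumulate.

```python
def quantize_int8(x, scale):
    """Symmetric INT8 quantization: map x to an integer in [-127, 127]."""
    return max(-127, min(127, round(x / scale)))

def dequantize(q, scale):
    return q * scale

scale = 0.1           # hypothetical quantization step for model deltas
updates = [0.04] * 5  # tiny per-round updates, each below scale / 2

# Naive INT8 + FL: quantize each update before applying it.
# Every update rounds to zero, so the model never moves.
naive_w = 0.0
for u in updates:
    naive_w += dequantize(quantize_int8(u, scale), scale)

# Accumulation-style sketch: keep a full-precision residual so that
# tiny updates add up instead of being erased by rounding.
acc_w, residual = 0.0, 0.0
for u in updates:
    residual += u
    applied = dequantize(quantize_int8(residual, scale), scale)
    acc_w += applied
    residual -= applied  # carry the rounding error into the next round

print(naive_w)  # 0.0: all updates lost to quantization
print(acc_w)    # ~0.2, close to the true accumulated update 5 * 0.04
```

The residual plays the role of the full-precision state: each round it absorbs what quantization could not represent, so no update is permanently discarded.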
Track: Systems and Infrastructure for Web, Mobile, and WoT
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: Yes
Submission Number: 158