MobiLLM: Enabling On-Device Fine-Tuning of Billion-Sized LLMs via Server-Assisted Side-Tuning

Liang Li, Xingke Yang, Wen Wu, Hao Wang, Tomoaki Ohtsuki, Xin Fu, Miao Pan, Xuemin Shen

Published: 01 Oct 2025 · Last Modified: 07 Jan 2026 · IEEE Journal of Selected Topics in Signal Processing · License: CC BY-SA 4.0
Abstract: On-device fine-tuning of large language models (LLMs) has attracted considerable attention because it tailors personalized models while keeping user data local to the mobile device. However, it faces significant challenges due to prohibitive memory requirements and slow training speeds. In this paper, we propose MobiLLM, a novel scheme that enables memory-efficient LLM fine-tuning on a single mobile device via server-assisted side-tuning. In particular, MobiLLM strategically offloads backpropagation computations to an edge server while the resource-constrained mobile device retains only a pretrained backbone model with frozen parameters during fine-tuning. It constructs a backpropagation bypass via parallel adapters decoupled from the backbone. During forward propagation, the device applies low-bitwidth quantization to the intermediate activations it transmits to the server, reducing communication overhead. The advantages of MobiLLM are twofold: 1) training data is confined strictly to the mobile device, and 2) on-device backpropagation is eliminated while local computation is overlapped with server execution. Collectively, MobiLLM ensures that data never leaves the mobile device while significantly reducing its memory and computational burdens. We implement MobiLLM on several popular mobile devices, including the NVIDIA Jetson Xavier NX and CPU-only laptops. Extensive experimental results demonstrate that MobiLLM can enable a resource-constrained mobile device to fine-tune billion-sized LLMs, achieving up to $4\times$ memory reduction and $2.3\times$ faster convergence compared to state-of-the-art baselines.
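To make the workflow described in the abstract concrete, below is a minimal PyTorch sketch of server-assisted side-tuning under simplifying assumptions: a tiny frozen stack of linear layers stands in for the pretrained LLM backbone, small linear layers stand in for the parallel adapters, and function names such as `quantize_int8`, `device_forward`, and `server_step` are illustrative, not the paper's API. It shows the division of labor only: the device runs a forward pass over the frozen backbone and emits low-bitwidth activations, while the server dequantizes them, runs the trainable side network, and performs all backpropagation.

```python
# Hypothetical sketch of MobiLLM-style server-assisted side-tuning.
# Sizes, module choices, and function names are assumptions for illustration.
import torch
import torch.nn as nn

HIDDEN, NUM_LAYERS, NUM_CLASSES = 64, 4, 2  # toy dimensions (assumption)

# ---- Device side: frozen backbone, forward pass only, no backprop ----
backbone = nn.ModuleList(nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_LAYERS))
for p in backbone.parameters():
    p.requires_grad_(False)  # backbone stays frozen on the mobile device

def quantize_int8(x):
    """Uniform per-tensor int8 quantization of an activation (illustrative)."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    return (x / scale).round().clamp(-127, 127).to(torch.int8), scale

def device_forward(tokens):
    """Run the frozen backbone and collect quantized activations to transmit."""
    h, payload = tokens, []
    with torch.no_grad():
        for layer in backbone:
            h = torch.relu(layer(h))
            payload.append(quantize_int8(h))  # low-bitwidth activations sent to the server
    return payload

# ---- Server side: trainable parallel adapters form the side network ----
adapters = nn.ModuleList(nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_LAYERS))
head = nn.Linear(HIDDEN, NUM_CLASSES)
optimizer = torch.optim.AdamW(
    list(adapters.parameters()) + list(head.parameters()), lr=1e-3
)

def server_step(payload, labels):
    """Dequantize received activations, run the side network, backprop on the server only."""
    side = torch.zeros(labels.shape[0], HIDDEN)
    for (q, scale), adapter in zip(payload, adapters):
        act = q.float() * scale                  # dequantize the received activation
        side = torch.relu(adapter(act) + side)   # parallel adapter fed by the backbone activation
    loss = nn.functional.cross_entropy(head(side), labels)
    optimizer.zero_grad()
    loss.backward()                              # gradients touch only server-side adapters
    optimizer.step()
    return loss.item()

# One toy fine-tuning step with random data (stand-in for local user data).
x, y = torch.randn(8, HIDDEN), torch.randint(0, NUM_CLASSES, (8,))
print(server_step(device_forward(x), y))
```

In this sketch, labels are passed to the server only to keep the example self-contained; how MobiLLM handles loss computation while keeping data on-device follows the paper, not this toy setup.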