Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: In this paper, we introduce a novel shortcut method that reduces the activation map to a low-rank space, enabling the practical implementation of on-device learning.
Abstract: On-device learning has emerged as a promising direction for AI development, particularly because of its potential to reduce latency and mitigate the privacy risks associated with device-server communication, while improving energy efficiency. Despite these advantages, significant memory and computational constraints remain major challenges for its deployment. Drawing on previous studies of low-rank decomposition methods that address the activation memory bottleneck in backpropagation, we propose a novel shortcut approach as an alternative. Our analysis and experiments demonstrate that our method reduces activation memory usage by up to $120.09\times$ compared to vanilla training, while also reducing overall training FLOPs by up to $1.86\times$ on standard benchmarks.
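To make the underlying idea concrete, here is a minimal, hedged sketch of activation compression for backpropagation: the activation saved between the forward and backward passes is replaced by a low-rank factorization, so far less memory is held during training. This is an illustrative example only, not the paper's shortcut method or the code in the linked repository; the class name, the truncated-SVD compression, and the `rank` parameter are assumptions made for the sketch.

```python
import torch

class LowRankLinear(torch.autograd.Function):
    """Linear layer that stores only a rank-r sketch of its input for backward.

    Illustrative sketch of activation compression for backpropagation;
    NOT the paper's implementation (see the linked repository for that).
    """

    @staticmethod
    def forward(ctx, x, weight, rank):
        # x: (batch, in_features), weight: (out_features, in_features)
        # Compress the activation with a truncated SVD before saving it.
        U, S, Vh = torch.linalg.svd(x, full_matrices=False)
        U_r = U[:, :rank] * S[:rank]   # (batch, rank)
        Vh_r = Vh[:rank, :]            # (rank, in_features)
        ctx.save_for_backward(U_r, Vh_r, weight)  # low-rank factors only
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        U_r, Vh_r, weight = ctx.saved_tensors
        # Approximate weight gradient from the low-rank activation sketch:
        # grad_W ~= grad_out^T @ (U_r @ Vh_r)
        grad_w = (grad_out.t() @ U_r) @ Vh_r
        grad_x = grad_out @ weight     # input gradient is exact
        return grad_x, grad_w, None

# Example usage: rank-8 sketch of a (32, 128) activation
x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
y = LowRankLinear.apply(x, w, 8)
y.sum().backward()
```

Note that this sketch only illustrates the memory side of the trade-off: the on-the-fly SVD adds forward-pass compute, whereas the paper's shortcut approach is designed to reduce overall training FLOPs as well.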
Lay Summary: The smart devices we use every day, such as phones, watches, and home assistants, are becoming more intelligent thanks to artificial intelligence (AI). However, in many cases the AI is not actually running on the device itself. Instead, the device sends data to a server where the AI processes it and sends back a response. This setup can pose privacy risks if personal data is exposed during transmission. One way to fix this is to deploy the AI directly on the device. But training AI models on devices is hard because of limited memory and processing power. In our work, we build on ideas from earlier research to create a new method that reduces the size of the data the AI needs to handle during learning. This cuts down on memory use and computational cost, making it more practical to train AI directly on devices without compromising too much on performance.
Link To Code: https://github.com/Le-TrungNguyen/ICML2025-ASI
Primary Area: Deep Learning->Algorithms
Keywords: Deep Learning, Computer Vision, Compression, Low rank
Submission Number: 11346