Carbon- and System-Aware LoRA Scaling for On-Device LLMs via Hierarchical Multi-Objective Reinforcement Learning

ICLR 2026 Conference Submission22302 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Sustainable AI; Carbon-Aware; LoRA; On-Device; LLM; Multi-Objective Reinforcement Learning
TL;DR: We introduce a hierarchical multi-objective reinforcement learning approach for dynamic Low-Rank Adaptation (LoRA) scaling that optimizes carbon and energy efficiency while maintaining acceptable performance and system budgets for on-device LMs.
Abstract: On-device deployment of large and small language models (LLMs/SLMs) faces critical challenges in balancing performance, energy consumption, and carbon footprint across mobile and wearable devices. We introduce a hierarchical multi-objective reinforcement learning approach for dynamic Low-Rank Adaptation (LoRA) scaling that treats carbon efficiency as the primary objective while maintaining acceptable performance and energy consumption. Our method employs Proximal Policy Optimization (PPO) with a carbon-first reward function that prioritizes carbon efficiency (inferences per mg CO$_2$) ahead of energy efficiency (inferences per joule). Across smartwatches, AR glasses, VR headsets, and tablets running DistilGPT2, OPT-125M, DialoGPT-Small, and GPT-2, our approach achieves an average of 20.5 inf/J on the smartwatch, a peak of 35.1 inf/J in optimal configurations, and up to 0.412 perf/mg CO$_2$. These results demonstrate the effectiveness of carbon-aware optimization for sustainable edge AI.
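To make the "carbon-first" reward ordering concrete, the sketch below shows one way such a scalarized reward could be written in Python. The function name, the specific weights, and the latency-penalty term are illustrative assumptions and not the paper's exact formulation; only the two efficiency ratios (inferences per mg CO$_2$ and inferences per joule) come from the abstract.

```python
def carbon_first_reward(inferences: int,
                        co2_mg: float,
                        energy_j: float,
                        latency_penalty: float = 0.0,
                        w_carbon: float = 1.0,
                        w_energy: float = 0.5) -> float:
    """Hypothetical carbon-first scalarized reward for the PPO agent.

    Carbon efficiency (inferences per mg CO2) is weighted above
    energy efficiency (inferences per joule); the weights and the
    latency penalty are assumptions for illustration only.
    """
    carbon_eff = inferences / max(co2_mg, 1e-9)   # inf / mg CO2
    energy_eff = inferences / max(energy_j, 1e-9)  # inf / J
    return w_carbon * carbon_eff + w_energy * energy_eff - latency_penalty


# Example: 100 inferences costing 250 mg CO2 and 5 J of energy
print(carbon_first_reward(100, co2_mg=250.0, energy_j=5.0))
```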
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 22302