Zero-Shot LLM-Guided Autonomous Agent for Energy-Aware Resource Allocation in Embedded Systems

Published: 02 Mar 2026, Last Modified: 27 Mar 2026
Venue: Agentic AI in the Wild: From Hallucinations to Reliable Autonomy (Poster)
License: CC BY 4.0
Keywords: reinforcement learning, multi-agent systems, DVFS, embedded systems, energy efficiency, thermal management, LLM-based feature extraction, zero-shot learning, transfer learning, agentic AI, autonomous agents, resource allocation
TL;DR: LLM-guided multi-agent reinforcement learning achieves 7.09× better energy efficiency and 9,000× faster decisions than table-based profiling for embedded DVFS control, with zero-shot cross-platform deployment.
Abstract: Dynamic voltage and frequency scaling (DVFS) and task-to-core allocation are critical for thermal management and for balancing energy and performance in embedded systems. Existing approaches either rely on utilization-based heuristics that overlook stall times or require extensive offline profiling for table generation, preventing runtime adaptation. We propose a zero-shot hierarchical multi-agent reinforcement learning (MARL) framework for thermal- and energy-aware scheduling on multi-core platforms. Two collaborative agents decompose the exponential action space, achieving 358 ms latency for subsequent decisions; first decisions take 3.5 to 8.0 s, including one-time LLM feature extraction. An accurate environment model uses regression techniques to predict thermal dynamics and performance states. Combined with LLM-extracted semantic features, the environment model enables zero-shot deployment of new workloads on trained platforms by generating synthetic training data, without requiring workload-specific profiling samples. We introduce LLM-based semantic feature extraction that characterizes OpenMP programs through 13 code-level features without execution. The Dyna-Q-inspired framework integrates direct reinforcement learning with model-based planning, achieving 20× faster convergence than model-free methods. Experiments on the BOTS and PolyBench/C benchmarks across NVIDIA Jetson TX2, Jetson Orin NX, RubikPi, and Intel Core i7 platforms demonstrate 7.09× better energy efficiency and 4.0× better makespan than the Linux ondemand governor. First-decision latency is 8,300× faster than table-based profiling, enabling practical deployment in dynamic embedded systems.
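The Dyna-Q-style integration of direct reinforcement learning with model-based planning mentioned in the abstract can be sketched in miniature. This is a hypothetical illustration only: the states, actions, update constants, and tabular model below are stand-ins and do not reproduce the paper's agents, state encoding, or learned thermal/performance regression model.

```python
import random
from collections import defaultdict

# Hypothetical constants; the paper's actual hyperparameters are not given here.
ALPHA, GAMMA, PLANNING_STEPS = 0.1, 0.95, 20

Q = defaultdict(float)   # Q[(state, action)] -> estimated value
model = {}               # model[(state, action)] -> (reward, next_state)

def q_update(s, a, r, s2, actions):
    """One Q-learning backup toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def dyna_q_step(s, a, r, s2, actions):
    # 1) direct RL: learn from the real transition
    q_update(s, a, r, s2, actions)
    # 2) model learning: record the transition (tabular here; the paper
    #    instead fits regression models of thermal/performance dynamics)
    model[(s, a)] = (r, s2)
    # 3) planning: replay simulated transitions drawn from the model,
    #    which is what accelerates convergence over model-free updates
    for _ in range(PLANNING_STEPS):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        q_update(ps, pa, pr, ps2, actions)
```

In this sketch the planning loop reuses each observed transition many times per real interaction, which is the mechanism behind the faster convergence the abstract attributes to combining direct RL with model-based planning.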
Submission Number: 69