LLM-Assisted Reinforcement Learning for Distributed Scheduling

19 Sept 2025 (modified: 27 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Distributed Scheduling; Large Language Model; Reinforcement Learning
TL;DR: This paper proposes an LLM-assisted reinforcement learning algorithm for distributed flexible job-shop scheduling, where LLM-driven factory assignment and reward modeling improve coordination, credit assignment, and scheduling performance.
Abstract: The distributed flexible job-shop scheduling problem (DFJSP) involves coordinating job execution across distributed factories to achieve production goals. While existing reinforcement learning (RL)-based scheduling methods have shown promise in learning adaptive scheduling policies, they often rely on shallow networks and simple handcrafted rewards. These designs limit global state reasoning and accurate credit assignment under sparse rewards, hindering balanced workload distribution and efficient policy learning. To address these limitations, we propose a Large Language Model (LLM)-assisted RL algorithm tailored to DFJSP that leverages the contextual reasoning and prior knowledge of LLMs. Specifically, we propose an LLM-driven factory assignment mechanism that encodes global factory states and job features into structured queries, enabling context-aware and effective coordination among factories. Furthermore, we design an LLM-informed reward model that encodes scheduling-aware semantics into multi-dimensional proxy rewards for precise credit assignment during training. Theoretically, we bound the reward approximation error and prove that the proposed assignment strategy effectively reduces global workload variance. Extensive experiments on public benchmarks (i.e., Hurink and Brandimarte) and multiple simulated DFJSP instances of varying scales demonstrate that our algorithm consistently outperforms RL-based scheduling methods, achieving average makespan improvements ranging from $0.61$\% to $25.78$\%. Our source code is available at https://anonymous.4open.science/r/LaRL-407B.
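The abstract's factory assignment mechanism, which "encodes global factory states and job features into structured queries," can be illustrated with a minimal sketch. All names here (`build_assignment_query`, `assign_factory`, the query fields) are hypothetical illustrations, not the authors' actual interface; a least-loaded heuristic stands in for the LLM's decision, consistent with the stated goal of reducing global workload variance.

```python
# Hypothetical sketch of an LLM-driven factory assignment step: serialize
# global factory states and job features into a structured query, then pick a
# factory (least-loaded fallback stands in for the LLM's reply).
import json

def build_assignment_query(factory_loads, job):
    """Encode global factory states and job features as a structured JSON query."""
    return json.dumps({
        "factories": [{"id": i, "workload": w} for i, w in enumerate(factory_loads)],
        "job": job,
        "objective": "minimize makespan and balance workload",
    })

def assign_factory(factory_loads, job, llm_choice=None):
    """Return the target factory index; fall back to least-loaded when no valid LLM reply."""
    if llm_choice is not None and 0 <= llm_choice < len(factory_loads):
        return llm_choice
    # Assigning to the least-loaded factory cannot increase workload variance.
    return min(range(len(factory_loads)), key=lambda i: factory_loads[i])

loads = [12.0, 7.5, 9.0]
query = build_assignment_query(loads, {"id": "J3", "ops": 4, "mean_proc_time": 2.5})
print(assign_factory(loads, {"id": "J3"}))  # least-loaded fallback -> 1
```

In a full system the query string would be sent to the LLM and its answer parsed into `llm_choice`; the heuristic fallback keeps scheduling robust when the LLM reply is missing or malformed.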
Primary Area: applications to robotics, autonomy, planning
Submission Number: 15396