Toward Neural Streaming Scheduling: A Memory-Augmented Reinforcement Learning Model with Critical Structure Encoding

ICLR 2026 Conference Submission 15052 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: streaming scheduling, dynamical systems, sequential decision making, critical structure encoding, historical context modeling, graph neural networks
Abstract: Many large-scale data analytics and AI systems execute jobs structured as Directed Acyclic Graphs (DAGs), which encode precedence constraints among interdependent stages. Efficient DAG scheduling is crucial for maximizing system throughput, especially in streaming settings where diverse jobs arrive continuously and require real-time decisions. Despite progress in heuristic and learning-based scheduling methods, capturing execution-critical structures and leveraging historical scheduling context remain key challenges. To address these challenges, we propose MACE, a Memory-Augmented reinforcement learning model with Critical structure Encoding, which implements a scheduling policy for streaming jobs by sequentially selecting runnable stages and assigning each a degree of parallelism based on the cluster state. The policy is trained with rewards defined to minimize average job completion time. MACE consists of two core components: (i) CSformer builds hierarchical embeddings that integrate stage-, job-, and global-level information, capturing execution-critical structures through critical-path-aware positional encodings and a tunable attention field; this design guides the policy toward latency-sensitive and structurally related stages. (ii) A memory-augmented scheduler then uses the learned embeddings and a job memory to exploit historical context for the final stage and parallelism selection. Extensive experiments on Spark using the TPC-H benchmark demonstrate that MACE outperforms state-of-the-art baselines by up to 9.38% under diverse workload conditions.
Primary Area: learning on time series and dynamical systems
Submission Number: 15052
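
The abstract refers to critical-path-aware positional encodings over job DAGs. As a minimal sketch of the underlying quantity only (not the authors' CSformer encoding, whose details are not given here), the Python below computes, for each stage, the longest duration-weighted path from that stage to any sink of the DAG. The function name `critical_path_lengths`, the stage names, and the per-stage durations are all hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque


def critical_path_lengths(durations, edges):
    """Longest duration-weighted path from each stage to any sink.

    durations: dict mapping stage -> estimated run time (hypothetical input)
    edges: iterable of (parent, child) precedence constraints
    """
    children = defaultdict(list)
    indegree = {stage: 0 for stage in durations}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Kahn's algorithm: produce a topological order of the stages.
    queue = deque(stage for stage, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        stage = queue.popleft()
        order.append(stage)
        for child in children[stage]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(durations):
        raise ValueError("graph contains a cycle, so it is not a DAG")

    # Visit stages in reverse topological order so every child is
    # resolved before its parents.
    lengths = {}
    for stage in reversed(order):
        downstream = max((lengths[c] for c in children[stage]), default=0.0)
        lengths[stage] = durations[stage] + downstream
    return lengths


# Toy diamond-shaped job: a -> {b, c} -> d
durations = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 4.0}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
cp = critical_path_lengths(durations, edges)
```

In this toy job, stage `b` has a larger value than stage `c`, so it lies on the critical path. A scheduler could feed such per-stage values into a positional encoding or a priority rule; how MACE actually does so is not specified in the abstract.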