ATLAS: Alibaba Dataset and Benchmark for Learning-Augmented Scheduling

Published: 26 Jan 2026, Last Modified: 01 Mar 2026, ICLR 2026 Poster, License: CC BY 4.0
Keywords: Scheduling with predictions, Dataset and benchmark, Machine learning, Learning augmented scheduling, Non-clairvoyant scheduling
Abstract: Learning-augmented scheduling uses ML predictions to improve decision-making under uncertainty. Many algorithms in this class have been proposed that offer stronger theoretical guarantees than classic methods. Translating these theoretical results into practice, however, requires an understanding of real workloads. Such an understanding is hard to develop because existing production traces either lack ground-truth processing times or are not publicly available, while synthetic benchmarks fail to capture real-world complexity. We fill this gap by introducing *Alibaba Trace for Learning-Augmented Scheduling (ATLAS)*, a research-ready dataset derived from the cluster trace of Alibaba's Platform of Artificial Intelligence (PAI), a production system that processes hundreds of thousands of ML jobs per day. The ATLAS dataset has been cleaned and feature-engineered to represent the inputs and constraints of non-clairvoyant scheduling, including user tags, resource requests (CPU/GPU/memory), and job structures with ground-truth processing times. We develop a prediction benchmark that reports prediction error metrics and feature importance analysis, and we introduce a novel multi-stage ML model. We also provide a scheduling benchmark for minimizing total completion time, max-stretch, and makespan. ATLAS provides a reproducible foundation for studying learning-augmented scheduling on real workloads and is available at https://github.com/zhiyunjiang0810/non-clairvoyant-with-predictions.
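To make the three scheduling objectives concrete, here is a minimal sketch of how a learning-augmented policy might be scored. It is an illustration under stated assumptions, not the ATLAS benchmark code. It assumes a single non-preemptive machine and a simple "shortest predicted processing time first" rule: the scheduler orders jobs by their *predicted* processing times, while completion times are computed from the ground-truth processing times. The `Job` fields and the functions `schedule_by_prediction` and `objectives` are hypothetical names. Real PAI jobs request multiple resources (CPU/GPU/memory), and this toy model ignores that.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record (assumption, not the ATLAS schema).
    job_id: str
    release: float    # arrival time of the job
    actual: float     # ground-truth processing time (unknown to the scheduler)
    predicted: float  # ML-predicted processing time used for ordering


def schedule_by_prediction(jobs):
    """Non-preemptive single-machine list scheduling (assumed model): whenever
    the machine is free, start the released job with the smallest predicted
    processing time. Completion times use the actual processing times."""
    pending = sorted(jobs, key=lambda j: j.release)
    ready, completion = [], {}
    t, i = 0.0, 0
    while i < len(pending) or ready:
        # Admit every job released by the current time t.
        while i < len(pending) and pending[i].release <= t:
            ready.append(pending[i])
            i += 1
        if not ready:
            # Machine idles until the next release.
            t = pending[i].release
            continue
        job = min(ready, key=lambda j: (j.predicted, j.release))
        ready.remove(job)
        t += job.actual  # true length is revealed only when the job finishes
        completion[job.job_id] = t
    return completion


def objectives(jobs, completion):
    """Return (total completion time, max-stretch, makespan).
    Stretch of a job = (completion - release) / actual processing time."""
    total = sum(completion[j.job_id] for j in jobs)
    max_stretch = max((completion[j.job_id] - j.release) / j.actual for j in jobs)
    makespan = max(completion.values())
    return total, max_stretch, makespan


# Toy instance: job "a" is badly under-predicted (predicted 1, actual 4).
jobs = [Job("a", 0, 4, 1), Job("b", 0, 2, 3), Job("c", 1, 1, 2)]
done = schedule_by_prediction(jobs)
print(done)                    # {'a': 4.0, 'c': 5.0, 'b': 7.0}
print(objectives(jobs, done))  # (16.0, 4.0, 7.0)

# Same jobs with perfect predictions (predicted == actual).
perfect = [Job(j.job_id, j.release, j.actual, j.actual) for j in jobs]
print(objectives(perfect, schedule_by_prediction(perfect)))  # (12.0, 2.0, 7.0)
```

In this toy instance, the misprediction for job "a" raises total completion time from 12 to 16 and max-stretch from 2 to 4, while makespan is unchanged. This illustrates why the three objectives can respond differently to prediction error.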
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 8524