DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

ICLR 2026 Conference Submission14619 Authors

19 Sept 2025 (modified: 08 Oct 2025); ICLR 2026 Conference Submission; Readers: Everyone; License: CC BY 4.0
Keywords: Reinforcement Learning; Graph Scheduling; Distributed Systems
TL;DR: We improve device assignment of dataflow graphs on multi-GPU systems using dual policy networks, training on the real system during deployment, and other techniques.
Abstract: We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based approaches face three limitations: (1) they rely on bulk-synchronous frameworks that under-utilize devices, (2) they learn a single placement policy without modeling the system dynamics, and (3) they depend solely on reinforcement learning during pre-training and ignore optimization during deployment. We propose Doppler, a three-stage framework with two policies: SEL, which selects operations, and PLC, which places them on devices. Doppler consistently outperforms baselines, reducing execution time and improving sampling efficiency through faster per-episode training.
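The abstract splits device assignment into two decisions: SEL chooses which operation to place next, and PLC chooses the device for it; the resulting placement is then judged by its execution time in a work-conserving system. The sketch below illustrates only that decomposition and is not Doppler's implementation. It assumes a hypothetical toy graph `GRAPH`, replaces the learned SEL and PLC networks with simple heuristic stand-ins (largest bottom level first, least-loaded device), ignores inter-device transfer costs, and uses invented function names throughout.

```python
import heapq

# Hypothetical toy dataflow graph: op -> (compute cost, predecessor ops).
# Inter-device transfer costs are ignored in this sketch.
GRAPH = {
    "a": (2, []),
    "b": (3, ["a"]),
    "c": (3, ["a"]),
    "d": (1, ["b", "c"]),
}

def successors(graph):
    succ = {op: [] for op in graph}
    for op, (_, preds) in graph.items():
        for p in preds:
            succ[p].append(op)
    return succ

def bottom_level(graph):
    # Longest cost path from each op to a sink, including the op's own cost.
    succ, memo = successors(graph), {}
    def bl(op):
        if op not in memo:
            memo[op] = graph[op][0] + max((bl(s) for s in succ[op]), default=0)
        return memo[op]
    return {op: bl(op) for op in graph}

def sel(candidates, prio):
    # Heuristic stand-in for the learned SEL policy: largest bottom level first.
    return max(candidates, key=lambda op: (prio[op], op))

def plc(op, load, num_devices):
    # Heuristic stand-in for the learned PLC policy: least-loaded device.
    # A learned policy would condition on `op` and the graph state; this one ignores them.
    return min(range(num_devices), key=lambda d: (load[d], d))

def assign(graph, num_devices):
    # Alternate SEL (which op next) and PLC (which device) until all ops are placed.
    prio = bottom_level(graph)
    placement, load = {}, [0] * num_devices
    while len(placement) < len(graph):
        candidates = [op for op, (_, preds) in graph.items()
                      if op not in placement and all(p in placement for p in preds)]
        op = sel(candidates, prio)
        dev = plc(op, load, num_devices)
        placement[op] = dev
        load[dev] += graph[op][0]
    return placement

def simulate(graph, placement, num_devices):
    # Work-conserving execution: an idle device immediately starts one of its
    # ready ops (ties broken by op name). Returns the makespan.
    succ = successors(graph)
    waiting = {op: len(preds) for op, (_, preds) in graph.items()}
    ready = {d: [] for d in range(num_devices)}  # per-device min-heaps of op names
    for op, n in waiting.items():
        if n == 0:
            heapq.heappush(ready[placement[op]], op)
    busy = [False] * num_devices
    events, now, done = [], 0, 0  # events: min-heap of (finish_time, op)
    while done < len(graph):
        for d in range(num_devices):
            if not busy[d] and ready[d]:
                op = heapq.heappop(ready[d])
                busy[d] = True
                heapq.heappush(events, (now + graph[op][0], op))
        now, op = heapq.heappop(events)
        busy[placement[op]] = False
        done += 1
        for s in succ[op]:
            waiting[s] -= 1
            if waiting[s] == 0:
                heapq.heappush(ready[placement[s]], s)
    return now

placement = assign(GRAPH, num_devices=2)
print(placement)                                  # {'a': 0, 'c': 1, 'b': 0, 'd': 1}
print(simulate(GRAPH, placement, num_devices=2))  # 6
```

On this toy graph the two-device placement reaches the critical-path length of 6 time units, while placing every op on one device gives 9. Keeping selection and placement as separate decisions lets each be learned or tuned independently, which is the design point the abstract emphasizes.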
Primary Area: reinforcement learning
Submission Number: 14619