DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

Published: 26 Jan 2026, Last Modified: 01 Mar 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: Distributed Systems; Reinforcement Learning; Graph Scheduling
TL;DR: Device assignment for dataflow graphs on multi-GPU systems can be improved with dual policy networks, continued training against the real system during deployment, and related techniques.
Abstract: We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based approaches face three limitations: (1) relying on bulk-synchronous frameworks that under-utilize devices, (2) learning a single placement policy without modeling the system dynamics, and (3) depending solely on reinforcement learning during pre-training while ignoring optimization during deployment. We propose Doppler, a three-stage framework with two policies—$\mathsf{SEL}$ for selecting operations and $\mathsf{PLC}$ for placing them on devices. Doppler consistently outperforms baselines by reducing execution time and improving sampling efficiency through faster per-episode training. Our results show that Doppler achieves up to 52.7\% lower execution times than the best baseline. The code is available at https://github.com/xinyuyao/Doppler.
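To make the dual-policy decomposition concrete, here is a minimal sketch of how a $\mathsf{SEL}$/$\mathsf{PLC}$ episode could alternate between choosing the next operation and choosing its device. The policy functions (`sel_policy`, `plc_policy`) and the topological-readiness loop are illustrative assumptions, not the paper's learned RL policies; Doppler trains these with reinforcement learning and real-system feedback.

```python
def run_episode(graph, num_devices, sel_policy, plc_policy):
    """Alternate SEL (pick the next ready op) and PLC (pick its device)
    until every operation in the dataflow graph is placed.

    graph: dict mapping each op name to a list of its dependencies.
    Returns a dict mapping op name -> assigned device index.
    """
    placement = {}
    remaining = dict(graph)
    # An op is "ready" when all of its dependencies are already placed.
    ready = {op for op, deps in graph.items() if not deps}
    while ready:
        op = sel_policy(ready, placement)              # SEL decision
        dev = plc_policy(op, placement, num_devices)   # PLC decision
        placement[op] = dev
        ready.discard(op)
        del remaining[op]
        for cand, deps in remaining.items():
            if cand not in ready and all(d in placement for d in deps):
                ready.add(cand)
    return placement

# Hypothetical stand-in policies (Doppler learns these instead):
def sel_lexicographic(ready, placement):
    return min(ready)  # deterministic: pick the lexicographically first op

def plc_round_robin(op, placement, num_devices):
    return len(placement) % num_devices  # cycle through devices

graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(run_episode(graph, 2, sel_lexicographic, plc_round_robin))
# → {'a': 0, 'b': 1, 'c': 0, 'd': 1}
```

The two-stage split keeps each decision's action space small: $\mathsf{SEL}$ ranges only over ready operations, and $\mathsf{PLC}$ only over devices, instead of one policy over the full (op, device) product space.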
Primary Area: reinforcement learning
Submission Number: 14619