Abstract: The distributed system Ray has attracted much attention for decision-making applications. It provides a flexible and powerful distributed execution mechanism for training learning algorithms, mapping computation tasks to resources automatically. Task scheduling is a critical component of Ray and adopts a two-layer structure. It follows a simple, general scheduling principle that leaves considerable room for optimization. In this paper, we study the two-layer scheduling problem in Ray and cast it as an optimization problem. We first present a comprehensive formulation of the problem and show that it is NP-hard. We then design a hierarchical reinforcement learning method, named HierRL, which consists of a high-level agent and a low-level agent, with a state space, action space, and reward function designed for each. At the high level, we devise a value-based reinforcement learning method that allocates each task to an appropriate low-level node. At the low level, a second reinforcement learning method selects which task to execute next from the node's queue, which holds both tasks allocated by the high level and tasks generated by applications. A hierarchical policy learning method is introduced to train the two agents. Finally, we simulate the two-layer scheduling procedure on a public platform, CloudSim, with tasks from a real dataset generated by the Alibaba Cluster Trace Program. The results show that the proposed method performs much better than the original scheduling method of Ray.
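To make the two-layer decision loop concrete, the following is a minimal toy sketch, not the HierRL implementation. It assumes tabular epsilon-greedy Q-learning as a stand-in for the value-based method at both levels. The names `QAgent` and `run_episode`, the state encodings (capped queue lengths, capped remaining work), and the rewards (negative node load, negative queue length) are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict


class QAgent:
    """Tabular epsilon-greedy Q-learning agent (illustrative stand-in)."""

    def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
        self.q = defaultdict(float)  # maps (state, action) -> value estimate
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.rng = random.Random(seed)

    def act(self, state, actions):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        # Standard one-step Q-learning update.
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def run_episode(task_lengths, n_nodes, high, low, k=3, cap=3):
    """Simulate one episode; one task arrives per step. Returns steps taken.

    task_lengths must contain positive integers (units of work).
    """
    queues = [[] for _ in range(n_nodes)]  # remaining work of each queued task
    nodes = list(range(n_nodes))
    step = 0
    while step < len(task_lengths) or any(queues):
        # High level: allocate the arriving task to a node.
        if step < len(task_lengths):
            s = tuple(min(len(q), cap) for q in queues)
            node = high.act(s, nodes)
            queues[node].append(task_lengths[step])
            s2 = tuple(min(len(q), cap) for q in queues)
            # Assumed reward: penalize the load on the chosen node.
            high.update(s, node, -float(sum(queues[node])), s2, nodes)
        # Low level: each busy node selects one of its first k queued tasks
        # and executes one unit of its work.
        for q in queues:
            if not q:
                continue
            s = tuple(min(r, cap) for r in q[:k])
            choice = low.act(s, list(range(len(s))))
            q[choice] -= 1
            if q[choice] == 0:
                q.pop(choice)
            s2 = tuple(min(r, cap) for r in q[:k])
            # Assumed reward: penalize the number of tasks still waiting.
            low.update(s, choice, -float(len(q)), s2, list(range(len(s2))))
        step += 1
    return step
```

Calling `run_episode([3, 1, 2, 1, 4, 2], 2, QAgent(seed=1), QAgent(seed=2))` repeatedly trains both tables on the same toy workload. In this sketch the two agents are updated independently within each step, which is a simplification and should not be read as the hierarchical policy learning procedure of the paper.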