Abstract: Many multi-robot applications require allocating a team of heterogeneous agents (robots) with different abilities to cooperatively complete a given set of spatially distributed tasks as quickly as possible. We focus on tasks that can only be initiated once all required agents are present; otherwise, agents that have already arrived must wait idly. Agents therefore need to not only execute a sequence of tasks, dynamically forming and disbanding teams to match the diverse ability requirements of each task, but also account for the schedules of other agents to minimize unnecessary idle time. Conventional methods, such as mixed-integer programming, generally require centralized scheduling and long optimization times, which limits their potential for real-world applications. In this work, we propose a reinforcement learning framework to train a decentralized policy applicable to heterogeneous agents. To address the challenge of learning complex cooperation, we further introduce a constrained flashforward mechanism that guides and constrains the agents' exploration and helps them make better predictions. Through an attention mechanism that reasons about both short-term cooperation and long-term scheduling dependencies, agents learn to reactively choose their next tasks (and the resulting coalitions) to avoid wasting abilities and to shorten the overall task completion time (makespan). We compare our method with state-of-the-art heuristic and mixed-integer programming baselines, demonstrating its generalization ability and showing that it closely matches or outperforms these baselines while remaining at least two orders of magnitude faster.