Application of Multi-agent Reinforcement Learning to the Dynamic Scheduling Problem in Manufacturing Systems
Abstract: Much recent research in reinforcement learning (RL) has demonstrated remarkable results on complex strategic planning problems. Approaches that incorporate multiple agents to complete complex tasks cooperatively have become especially popular. However, the application of multi-agent reinforcement learning (MARL) to manufacturing problems, such as production scheduling, has been addressed less frequently and remains a challenge for current research. A major reason is that applications in the manufacturing domain are typically characterized by specific requirements and pose major implementation difficulties for the research community. MARL can solve complex problems with better performance than traditional methods. The main objective of this paper is to implement feasible MARL algorithms for the problem of dynamic scheduling in manufacturing systems, using a model factory as an example. We focus on optimizing the performance of the scheduling task, which is mainly reflected in the makespan. In our experiments, algorithms based on on-policy policy gradient methods yielded more stable and better performance. This study therefore also investigates promising, state-of-the-art single-agent on-policy reinforcement learning algorithms, namely Asynchronous Advantage Actor-Critic, Proximal Policy Optimization, and Recurrent Proximal Policy Optimization, and compares their results with those of MARL. The findings show that RL indeed converges to solutions that outperform traditional heuristic methods on the complex problem of scheduling under uncertain conditions.
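To make the setting concrete, the following is a minimal sketch (not the authors' implementation) of how an on-policy algorithm such as PPO can be applied to a dynamic scheduling task with a makespan-based reward. The environment design, state and action spaces, and reward shaping are illustrative assumptions, and the choice of gymnasium and stable-baselines3 is ours, not confirmed by the paper.

```python
# Illustrative sketch: PPO on a toy dispatching environment.
# Assumptions (not from the paper): gymnasium/stable-baselines3,
# the environment structure, and the reward shaping below.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class DynamicSchedulingEnv(gym.Env):
    """Toy dispatching task: at each step, pick which waiting job to
    release next; each released job goes to the currently least-loaded
    machine. The terminal reward penalizes the final makespan, pushing
    the agent toward orderings (e.g. longest-job-first) that balance
    the machines."""

    def __init__(self, n_jobs=6, n_machines=2, seed=None):
        super().__init__()
        self.n_jobs, self.n_machines = n_jobs, n_machines
        self.rng = np.random.default_rng(seed)
        # Observation: remaining processing time per job + machine loads.
        self.observation_space = spaces.Box(
            0.0, np.inf, (n_jobs + n_machines,), np.float32)
        # Action: index of the job to dispatch next.
        self.action_space = spaces.Discrete(n_jobs)

    def _obs(self):
        return np.concatenate([self.proc, self.load]).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.proc = self.rng.uniform(0.1, 1.0, self.n_jobs)  # 0 once scheduled
        self.load = np.zeros(self.n_machines)
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        self.t += 1
        reward = 0.0
        if self.proc[action] > 0:             # valid pick: dispatch the job
            m = int(np.argmin(self.load))     # least-loaded machine
            self.load[m] += self.proc[action]
            self.proc[action] = 0.0
        else:                                 # already scheduled: small penalty
            reward = -0.1
        done = bool(np.all(self.proc == 0))
        if done:
            reward -= float(self.load.max())  # shorter makespan = higher reward
        truncated = self.t >= 10 * self.n_jobs
        return self._obs(), reward, done, truncated, {}


env = DynamicSchedulingEnv(n_jobs=6, seed=0)
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=20_000)
```

The same environment could be trained with Recurrent PPO (e.g. sb3-contrib's RecurrentPPO) when the dynamic state is only partially observable, which is one motivation for the recurrent variant compared above.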