PDQN - A Deep Reinforcement Learning Method for Planning with Long Delays: Optimization of Manufacturing Dispatching

David C Jenkins; René Arendt Sørensen; Vikramank Singh; Philip Kaminsky; Anil Aswani; Ramakrishna Akella

PDQN - A Deep Reinforcement Learning Method for Planning with Long Delays: Optimization of Manufacturing Dispatching

David C Jenkins, René Arendt Sørensen, Vikramank Singh, Philip Kaminsky, Anil Aswani, Ramakrishna Akella

Published: 28 Jan 2022, Last Modified: 13 Feb 2023ICLR 2022 SubmittedReaders: Everyone

Keywords: This paper proposes the PDQN, a novel method based on integrating Deep Reinforcement Learning and abstract planning for developing dynamic scheduling for policies in systems with very delayed rewards, e.g. manufacturing.

Abstract: Scheduling is an important component in Semiconductor Manufacturing systems, where decisions must be made as to how to prioritize the use of finite machine resources to complete operations on parts in a timely manner. Traditionally, Operations Research methods have been used for simple, less complex systems. However, due to the complexity of this scheduling problem, simple dispatching rules such as Critical Ratio, and First-In-First-Out, are often used in practice in the industry for these more complex factories. This paper proposes a novel method based on Deep Reinforcement Learning for developing dynamic scheduling policies through interaction with simulated stochastic manufacturing systems. We experiment with simulated systems based on a complex Western Digital semiconductor plant. Our method builds upon DeepMind’s Deep Q-network, and predictron methods to create a novel algorithm, Predictron Deep Q-network, which utilizes a predictron model as a trained planning model to create training targets for a Deep Q-Network based policy. In recent years, Deep Reinforcement Learning methods have shown state of the art performance on sequential decision-making processes in complex games such as Go. Semiconductor manufacturing systems, however, provide significant additional challenges due to complex dynamics, stochastic transitions, and long time horizons with the associated delayed rewards. In addition, dynamic decision policies need to account for uncertainties such as machine downtimes. Experimental results demonstrate that, in our simulated environments, the Predictron Deep Q-network outperforms the Deep Q-network, Critical Ratio, and First-In-First-Out dispatching policies on the task of minimizing lateness of parts.

One-sentence Summary: This paper proposes the PDQN, a novel method based on integrating Deep Reinforcement Learning and abstract planning for developing dynamic scheduling for policies in systems with very delayed rewards, e.g. manufacturing.

Supplementary Material: zip

12 Replies

Loading