Towards Solving Industrial Sequential Decision-making Tasks under Near-predictable Dynamics via Reinforcement Learning: an Implicit Corrective Value Estimation Approach

Jianyong Yuan; Jiayi Zhang; Junchi Yan

Published: 01 Feb 2023; Last Modified: 13 Feb 2023; Submitted to ICLR 2023; Readers: Everyone

Abstract: Learning to plan and schedule is receiving increasing attention for industrial decision-making tasks, both for its potential to outperform heuristics, especially under dynamic uncertainty, and for its efficiency in problem-solving, especially with the adoption of neural networks and the GPU computing behind them. Naturally, reinforcement learning (RL) with the Markov decision process (MDP) has become a popular paradigm. Rather than handling near-stationary environments like Atari games, or the opposite case of open-world dynamics with high uncertainty, in this paper we aim to devise a tailored RL-based approach for the setting in between: near-predictable dynamics, which often hold in industrial applications such as elevator scheduling and bin packing, the two empirical case studies tested in this paper. We formulate a two-stage MDP by decoupling the data dynamics from the industrial environment. Specifically, we design a bi-critic framework that estimates the state value in stages according to the two-stage MDP.
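The abstract does not spell out the two-stage MDP. As a rough illustration only, one common way to formalize this kind of decoupling is the post-decision-state view sketched below. All symbols here ($f$, $g$, $\xi$, $\tilde{s}$, $\tilde{V}$) are assumptions introduced for this sketch and are not taken from the paper; the authors' actual formulation and their bi-critic design may differ.

```latex
% Stage 1 (environment): the action changes the system in a predictable way.
\tilde{s}_t = f(s_t, a_t)

% Stage 2 (data dynamics): newly arriving data \xi_{t+1} (e.g., passengers, items)
% moves the intermediate state to the next decision state.
s_{t+1} = g(\tilde{s}_t, \xi_{t+1}), \qquad \xi_{t+1} \sim P_{\xi}

% A value function can then be defined for each stage:
\tilde{V}(\tilde{s}_t) = \mathbb{E}_{\xi_{t+1} \sim P_{\xi}}\!\left[ V\big(g(\tilde{s}_t, \xi_{t+1})\big) \right]

V(s_t) = \max_{a_t} \left[ r(s_t, a_t) + \gamma \, \tilde{V}\big(f(s_t, a_t)\big) \right]
```

Under this reading, all uncertainty in the state transition is confined to the second stage, so a pair of critics could in principle estimate $V$ and $\tilde{V}$ separately, one per stage.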

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)

TL;DR: We decouple the data dynamics of industrial sequential decision-making tasks from the environment and design a bi-critic framework to address state transition uncertainty.

Supplementary Material: zip
