Keywords: Offline Reinforcement Learning, Contextual Markovian Decision Process, Value Iteration, Inventory Control
Abstract: In this paper, we investigate the dynamic feature-based newsvendor problem in a multi-period inventory control setting with backlogged demand. Combining feature information with a multi-stage decision-making framework, we propose a general dynamic contextual newsvendor model. For this model, we propose the Contextual Value Iteration (CVI) algorithm and establish its convergence rate to the optimal solution as well as a sample complexity result. Our experimental results also demonstrate that CVI is more efficient than value iteration for the vanilla Markovian Decision Process (MDP).
Submission Number: 118