Abstract: Predictive models are increasingly used to support decision-making. This study proposes an approach that uses DAgger, an imitation learning algorithm, to iteratively train a policy for stochastic sequential decision problems. Such problems are especially challenging when expert input is costly or unavailable. We focus on constructing an effective expert within the DAgger framework: at each decision point, contextual scenarios are generated and solved deterministically, and the expert’s decision is derived from these solutions. A predictive model is then trained to mimic the expert’s behavior and support real-time decision-making. To illustrate the methodology, we address a dynamic employee call-timing problem that arises when scheduling casual personnel for on-call work shifts. The key decision is when to contact the next employee in seniority order so that they can select a preferred shift. Uncertainty arises from employees’ varying response times. The goal is to balance two risks: schedule changes induced by notifications or calls made too early, and unassigned shifts caused by notifications made too late. Unlike traditional predict-and-optimize approaches, our method uses optimization to train learning models that map the system’s current state to the expert’s wait time. We apply the algorithm to data provided by our industrial partner to derive an operational policy. Results show that this policy outperforms the heuristic currently in use.
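The loop described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the state is a single integer (time remaining), the response-time distribution, the per-scenario "deterministic solution", the tabular learner, and all function names (`expert_wait`, `fit`, `rollout`, `dagger`) are hypothetical stand-ins. The sketch also simplifies DAgger's policy mixing by rolling out the expert in the first iteration and the learned policy afterwards.

```python
import random
from collections import defaultdict
from statistics import mean


def expert_wait(state, rng, n_scenarios=20):
    """Hypothetical expert: sample response-time scenarios, solve each one
    deterministically, and aggregate the per-scenario wait times."""
    waits = []
    for _ in range(n_scenarios):
        response_time = rng.randint(1, 3)  # assumed toy distribution
        # Toy deterministic solution: wait until just before the response
        # would arrive at the deadline.
        waits.append(max(0, state - response_time))
    return round(mean(waits))


def fit(dataset):
    """Toy learner: a lookup table of the mean expert wait per state."""
    by_state = defaultdict(list)
    for state, wait in dataset:
        by_state[state].append(wait)
    table = {s: round(mean(ws)) for s, ws in by_state.items()}
    return lambda s: table.get(s, 0)  # unseen states default to no wait


def rollout(policy, horizon=10):
    """Run a policy and return the states it visits."""
    state, visited = horizon, []
    while state > 0:
        visited.append(state)
        state -= max(1, policy(state))  # time advances by at least one step
    return visited


def dagger(n_iters=5, seed=0):
    """Aggregate expert labels on states visited by the current policy."""
    rng = random.Random(seed)
    dataset = []
    policy = lambda s: expert_wait(s, rng)  # iteration 0: follow the expert
    for _ in range(n_iters):
        for state in rollout(policy):
            dataset.append((state, expert_wait(state, rng)))  # expert label
        policy = fit(dataset)  # retrain on the aggregated dataset
    return policy, dataset


if __name__ == "__main__":
    learned_policy, data = dagger()
    print(len(data), learned_policy(10))
```

The point of the sketch is the data flow: states come from rolling out the current policy, labels come from the scenario-based expert, and the learner is refit on the growing dataset at every iteration.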