A Near-Optimal Control Policy for Data-driven Assemble-to-order Systems

Published: 28 Nov 2025, Last Modified: 30 Nov 2025, NeurIPS 2025 Workshop MLxOR, CC BY 4.0
Keywords: assemble-to-order system, deep reinforcement learning, input convex neural network, data-driven optimization
Abstract: We study a data-driven assemble-to-order (ATO) control problem, aiming to synchronize component ordering and product assembly under unknown demand distributions and non-identical lead times. We address two key questions: the statistical tractability of learning a near-optimal policy from limited data and the computational complexity of obtaining it. To the best of our knowledge, our work is the first to analyze the sample efficiency of a general ATO system as a multidimensional control problem and to propose an algorithm that finds a provably near-optimal solution. Methodologically, we introduce a novel asymmetric Lipschitz continuity (ALC) property to establish regularity conditions for the infinite-horizon problem. Surprisingly, we prove that the data-driven ATO problem avoids the curse of dimensionality; the performance gap of our policy scales as $O(M^{-1/2}\log M)$ with sample size $M$, only a logarithmic factor worse than the rate for estimating the demand mean. We develop a specialized reinforcement learning (RL) algorithm that exploits a convex-preserving property in ATO dynamics, using input convex neural networks and interior point methods to achieve computational feasibility. Numerical studies show our algorithm consistently and significantly outperforms existing heuristics and a general-purpose RL benchmark.
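The abstract's key architectural ingredient is the input convex neural network (ICNN) used to approximate a convex cost-to-go function. The sketch below is a minimal, hypothetical PyTorch implementation of a generic ICNN of this kind; the layer sizes, softplus activation, class name `ICNN`, and toy usage are illustrative assumptions and not the authors' exact architecture or training procedure.

```python
# Minimal sketch of an input convex neural network (ICNN), assumed PyTorch.
# Convexity in the input follows from non-negative weights on the hidden (z) path
# combined with a convex, non-decreasing activation (softplus).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ICNN(nn.Module):
    """f(x) is convex in x by construction."""

    def __init__(self, input_dim: int, hidden_dim: int = 64, num_layers: int = 2):
        super().__init__()
        # Unconstrained "skip" connections from the raw input x into every layer.
        self.x_layers = nn.ModuleList(
            [nn.Linear(input_dim, hidden_dim) for _ in range(num_layers)]
            + [nn.Linear(input_dim, 1)]
        )
        # Hidden-path connections; their weights are clamped to be non-negative.
        self.z_layers = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim, bias=False) for _ in range(num_layers - 1)]
            + [nn.Linear(hidden_dim, 1, bias=False)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = F.softplus(self.x_layers[0](x))
        for x_lin, z_lin in zip(self.x_layers[1:-1], self.z_layers[:-1]):
            # Non-negative weights on z preserve convexity of the composition.
            z = F.softplus(x_lin(x) + F.linear(z, z_lin.weight.clamp(min=0.0)))
        return self.x_layers[-1](x) + F.linear(z, self.z_layers[-1].weight.clamp(min=0.0))


if __name__ == "__main__":
    # Toy usage: a convex cost-to-go surrogate over a 5-component inventory state.
    net = ICNN(input_dim=5)
    state = torch.rand(32, 5)   # batch of inventory positions (illustrative)
    cost_to_go = net(state)     # shape (32, 1), convex in the state by construction
    print(cost_to_go.shape)
```

Because the resulting approximation is convex in the decision-relevant state, the per-step control subproblem can be posed as a convex program, which is what makes interior point methods applicable in the algorithm the abstract describes.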
Submission Number: 59