Perishable Online Inventory Control with Context-Aware Demand Distributions

Published: 28 Nov 2025, Last Modified: 30 Nov 2025. Venue: NeurIPS 2025 Workshop MLxOR. Readers: Everyone. License: CC BY 4.0
Keywords: online learning, inventory control, kernel regression, contextual bandits
TL;DR: We study online contextual inventory control of perishable goods where the demand distribution can depend on the context in a nonparametric way; we give a minimax regret lower bound and a near-optimal algorithm.
Abstract: We study the online contextual inventory control problem with perishable goods. We consider a more realistic and more challenging setting where the demand depends linearly on observable features (as is standard), but the (residual) noise distribution depends non-parametrically on the features. Surprisingly, little is known when the noise is context-dependent, a setting that captures the heteroskedastic uncertainty in demand that is important in inventory control. Unfortunately, the optimal inventory quantity in this more general setting is no longer a linear function of the features (as it is in the standard setting), making online gradient descent, the gold standard in that setting, inapplicable. We first present a minimax regret lower bound $\Omega(\sqrt{d T}+T^{\frac{p+1}{p+2}})$, which characterizes the fundamental limit of this learning problem. Here $d$ is the feature dimension, and $p \leq d$ is an underlying dimension that captures the intrinsic complexity of the noise distribution. Further, we propose an algorithm that achieves the near-optimal regret $\widetilde{O}(\sqrt{d T}+T^{\frac{p+1}{p+2}})$. Additionally, under mild regularity conditions on the noise, we can achieve the improved regret $\widetilde{O}(\sqrt{d T} + p\sqrt{T})$. To the best of our knowledge, our results provide the first minimax optimal characterization for online inventory control with context-dependent noise.
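The claim that the optimal inventory quantity stops being linear in the features can be illustrated with a small sketch. It is not the paper's algorithm or model. It assumes a textbook newsvendor cost with hypothetical per-unit holding cost `h` and lost-sales cost `b`, Gaussian noise, and a hypothetical noise-scale function `sigma_fn`; none of these are specified in the abstract. Under these assumptions the optimal order is the `b / (b + h)` quantile of demand, which equals the linear mean plus `sigma_fn(x)` times a fixed normal quantile.

```python
from statistics import NormalDist

def optimal_order(x, theta, sigma_fn, h=1.0, b=3.0):
    """Newsvendor-optimal order for demand D = <theta, x> + sigma_fn(x) * eps,
    with eps ~ N(0, 1). All modelling choices here are illustrative assumptions."""
    tau = b / (b + h)                      # critical ratio (target quantile level)
    z = NormalDist().inv_cdf(tau)          # standard normal quantile at level tau
    mean = sum(t * xi for t, xi in zip(theta, x))
    return mean + sigma_fn(x) * z

def midpoint_gap(sigma_fn, theta=(2.0, 1.0)):
    """Order at the midpoint of two contexts minus the average of the two orders.
    The gap is zero whenever the optimal order is linear (affine) in the features."""
    a, c, m = (0.0, 1.0), (2.0, 1.0), (1.0, 1.0)   # m is the midpoint of a and c
    avg = 0.5 * (optimal_order(a, theta, sigma_fn) + optimal_order(c, theta, sigma_fn))
    return optimal_order(m, theta, sigma_fn) - avg

homoskedastic = lambda x: 1.0                  # noise scale ignores the context
heteroskedastic = lambda x: 0.5 + x[0] ** 2    # hypothetical context-dependent scale

print(midpoint_gap(homoskedastic))    # numerically zero: optimal order is linear in x
print(midpoint_gap(heteroskedastic))  # nonzero: linearity fails
```

With context-independent noise the gap vanishes, which is the regime where a linear policy class suffices. With the context-dependent scale the gap is nonzero, so no linear function of the features matches the optimal order at all three contexts.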
Submission Number: 78