Adaptive Model Selection in Offline Contextual MDP's without Stationarity

Published: 28 Apr 2026, Last Modified: 28 Apr 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: Contextual MDPs are powerful tools with wide applicability in areas ranging from biostatistics to machine learning. However, specializing them to offline datasets has been challenging due to a lack of robust, theoretically grounded methods. Our work tackles this problem by introducing a new approach to adaptive estimation and cost optimization in contextual MDPs. To the best of our knowledge, this estimator is the first of its kind, and it is endowed with strong optimality guarantees. We achieve this by overcoming the key technical challenges arising from the inherent properties of contextual MDPs, such as non-stationarity and model irregularity. Our guarantees are established in complete generality by using the relatively recent and powerful statistical technique of $T$-estimation (Baraud, 2011). We first provide a procedure for selecting an estimator given a sample from a contextual MDP and use it to derive oracle risk bounds under two distinct but meaningful loss functions. We then consider the problem of determining the optimal control with the aid of the resulting density estimate and provide finite-sample guarantees for the cost function.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zhiyu_Zhang1
Submission Number: 6868
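The abstract names $T$-estimation (Baraud, 2011) as the selection tool but gives no details of the procedure. As a rough illustration of the generic T-estimator idea (score each candidate by the largest distance to any candidate that beats it in a pairwise robust test, then select the minimizer), here is a minimal sketch under assumptions that are ours, not the paper's. It uses i.i.d. draws from a finite alphabet rather than trajectories of a contextual MDP. A Scheffé test stands in for the Hellinger-type tests of Baraud (2011), total variation serves as the distance, and there is no penalty term. The function names `tv_distance`, `scheffe_prefers_first` and `t_estimator` are hypothetical.

```python
import numpy as np


def tv_distance(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * float(np.abs(p - q).sum())


def scheffe_prefers_first(p, q, emp):
    """Scheffé test between candidates p and q on the set A = {p > q}.

    Returns True if p's mass on A is at least as close to the empirical
    mass on A as q's is (ties go to p).
    """
    A = p > q
    emp_mass = emp[A].sum()
    return abs(p[A].sum() - emp_mass) <= abs(q[A].sum() - emp_mass)


def t_estimator(candidates, sample, k):
    """Select a candidate by T-estimation over a finite family (sketch).

    D(t_i) is the largest distance from t_i to any candidate that beats it
    in its pairwise test (0 if none does); the selected index minimizes D.
    """
    emp = np.bincount(sample, minlength=k) / len(sample)  # empirical distribution
    m = len(candidates)
    beats = np.zeros((m, m), dtype=bool)  # beats[j, i]: candidate j wins against i
    for i in range(m):
        for j in range(i + 1, m):
            # each pair is tested once, so the outcome is consistent both ways
            if scheffe_prefers_first(candidates[i], candidates[j], emp):
                beats[i, j] = True
            else:
                beats[j, i] = True
    scores = [
        max((tv_distance(candidates[i], candidates[j]) for j in range(m) if beats[j, i]),
            default=0.0)
        for i in range(m)
    ]
    return int(np.argmin(scores)), scores


# Toy usage: three candidate distributions on {0, 1, 2}; the first is the truth.
truth = np.array([0.5, 0.3, 0.2])
candidates = [truth, np.array([0.2, 0.3, 0.5]), np.full(3, 1 / 3)]
sample = np.random.default_rng(0).choice(3, size=2000, p=truth)
selected, scores = t_estimator(candidates, sample, k=3)
```

With 2000 draws the true candidate wins both of its tests, so its score is 0 and it is selected. The uniform candidate is beaten only by the truth (score 1/6), and the reversed candidate is beaten by both others (score 0.3). The minimax form of the score, rather than a simple count of wins, is what allows T-estimators to come with risk bounds stated in terms of the distance d.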