Bellman--Whitney Envelopes: Sharp Partial Identification in Offline Control under Support Holes

Manoj Saravanan; Rohit Kumar Salla

Published: 25 May 2026, Last Modified: 27 May 2026 | DEMO 2026 Poster | CC BY 4.0

Keywords: offline reinforcement learning, off-policy evaluation, partial identification, support holes, Bellman–Whitney envelopes, Lipschitz smoothness, uncertainty quantification

TL;DR: Under genuine support holes, offline policy values are not point-identified; under Lipschitz smoothness, we characterize their exact Bellman-consistent identified interval and derive sharp uncertainty and action certificates.

Abstract: We study finite-horizon offline evaluation and control when a target policy enters state--action regions with zero behavior support, so the target value is not point-identified. We introduce a Bellman--Lipschitz compatibility class that constrains candidate $Q$-sequences only through Bellman equalities on the observed support and Lipschitz extensions off support. Under a rectangular Bellman--Lipschitz closure condition, we prove that the exact identified interval of the target-policy value is given by a backward Bellman--Whitney recursion, and that this recursion recovers the sharp smooth no-overlap interval exactly when $H=1$. We further show that the same endpoints admit a no-gap dual characterization via one-sided Bellman relaxations, and we identify a dynamic support-hole geometry for the interval width that is sharp on explicit least-favorable sequential families. On the statistical side, we prove deterministic stability of the recursive endpoints under joint perturbations of the support sets and supported Bellman operators, derive stagewise additive finite-sample endpoint-estimation bounds, and establish an oracle minimax lower bound on a favorable zero-width subclass. Finally, under the control analogue of our closure assumption, we derive Bellman--Whitney action certificates that partition actions into certifiably good, certifiably bad, and intrinsically ambiguous sets.
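
To make the one-step ($H=1$) picture concrete, the following is a minimal sketch, not the paper's recursion: it computes the classical McShane--Whitney Lipschitz envelope at an unsupported point and then sorts actions by comparing the resulting intervals. The scalar state-action encoding, the absolute-value metric, the function names `whitney_interval` and `certify_actions`, and the specific dominance rule used for the three action sets are all illustrative assumptions; the abstract does not specify them.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]


def whitney_interval(x: float, pts: Sequence[float],
                     vals: Sequence[float], lip: float) -> Interval:
    """Range of values at x over all lip-Lipschitz functions that match
    vals on pts (McShane-Whitney envelope). Assumes a 1-D encoding with
    the absolute-value metric and data that are themselves lip-Lipschitz."""
    lower = max(v - lip * abs(x - s) for s, v in zip(pts, vals))
    upper = min(v + lip * abs(x - s) for s, v in zip(pts, vals))
    return lower, upper


def certify_actions(intervals: Dict[str, Interval]) -> Dict[str, str]:
    """Hypothetical certificate rule (an assumption, not the paper's):
    'good' if an action's lower endpoint is at least every other action's
    upper endpoint, 'bad' if its upper endpoint is below the best lower
    endpoint, otherwise 'ambiguous'."""
    best_lower = max(lo for lo, _ in intervals.values())
    labels = {}
    for a, (lo, hi) in intervals.items():
        others_hi = max((h for b, (_, h) in intervals.items() if b != a),
                        default=float("-inf"))
        if lo >= others_hi:
            labels[a] = "good"
        elif hi < best_lower:
            labels[a] = "bad"
        else:
            labels[a] = "ambiguous"
    return labels


# Two supported points with known values; query a point in the hole.
hole = whitney_interval(0.5, pts=[0.0, 1.0], vals=[0.0, 0.5], lip=1.0)
on_support = whitney_interval(0.0, pts=[0.0, 1.0], vals=[0.0, 0.5], lip=1.0)
labels = certify_actions({"a": (0.6, 0.9), "b": (0.1, 0.4), "c": (0.3, 0.7)})
```

In this toy setup the interval collapses to a single value at a supported point and widens with distance from the support, which is the qualitative behavior the abstract attributes to support-hole geometry. Overlapping intervals leave actions `a` and `c` ambiguous while `b` is ruled out.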

Submission Number: 24
