Keywords: Offline evaluation; Off-policy evaluation; Distribution shift; Causal transportability; Safe deployment; Decision-making under uncertainty
Abstract: Decision-making from offline datasets often requires acting before reliable online evidence is available. Logged data may be selective, outcomes may be delayed, deployment populations may differ from historical ones, and offline metrics may reflect the true objective only imperfectly. Methods such as off-policy evaluation, sensitivity analysis, robustness analysis, and uncertainty quantification improve parts of this pipeline, but they do not by themselves answer the central offline-to-online question: what level of online action does the current offline evidence justify?
This position paper frames that question as a distinct decision problem. We propose a compact protocol organized around three dimensions of evidence (relevance, transportability, and reliability) and use it to distinguish when offline evidence supports deployment, targeted validation, constrained rollout, or deferral. The contribution is not a new estimator but a decision-oriented framework for translating imperfect offline evidence into appropriate online action.
Submission Number: 132