Keywords: Off-Policy Evaluation, Industrial Reinforcement Learning, Chlor-alkali
Abstract: In industrial control settings, practitioners need not only accurate off-policy evaluation (OPE) but also transparent uncertainty estimates expressed in the units of the target KPI. We introduce \textsc{VaLOR} (Validation via Linear Offline Residuals), a lightweight protocol that fits linear surrogate models to production data and uses Mahalanobis-based residual sampling to generate confidence intervals in physical KPI units. We apply \textsc{VaLOR} to different set-point recommendations made by RL agents trained on a large chlor-alkali plant dataset, and show how it lets engineers and plant operators compare policies using clear, bias-corrected return estimates. This facilitates informed, risk-aware decision making and helps bridge the gap between RL research and industrial adoption.
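The abstract only sketches the protocol, but the two named ingredients (a linear surrogate plus Mahalanobis-based residual sampling) suggest a simple shape. The following is a minimal illustrative sketch, not the authors' implementation: the function name `valor_ci`, the Gaussian proximity weighting, and all parameter choices are assumptions introduced here for illustration.

```python
import numpy as np

def valor_ci(X_train, y_train, x_query, n_samples=2000, alpha=0.05, rng=None):
    """Hypothetical VaLOR-style interval (illustrative, not the paper's code).

    Fits a linear surrogate by ordinary least squares, then resamples the
    training residuals, weighted by Mahalanobis proximity of each training
    point to the queried set-point, to form a confidence interval for the
    predicted KPI in its physical units.
    """
    rng = np.random.default_rng(rng)

    # Linear surrogate with intercept: y ~ [1, X] @ beta.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    residuals = y_train - A @ beta

    # Squared Mahalanobis distance from each training point to the query,
    # under the training-set covariance (pinv guards against singularity).
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(X_train, rowvar=False)))
    diff = X_train - x_query
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

    # Resample residuals so that points near the query dominate
    # (the Gaussian kernel here is an assumption, not from the paper).
    weights = np.exp(-0.5 * d2)
    weights /= weights.sum()
    sampled = rng.choice(residuals, size=n_samples, replace=True, p=weights)

    # Point prediction plus resampled residuals -> interval in KPI units.
    y_hat = np.concatenate([[1.0], x_query]) @ beta
    lo, hi = np.quantile(y_hat + sampled, [alpha / 2, 1 - alpha / 2])
    return y_hat, (lo, hi)

# Toy usage on synthetic data (feature names are placeholders).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # e.g. cell voltage, brine flow, temperature
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=500)
pred, (lo, hi) = valor_ci(X, y, x_query=np.array([0.2, -0.1, 0.4]), rng=1)
```

Because the interval is built from residuals in KPI space, its endpoints are directly interpretable to operators, which is the property the abstract emphasizes.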
Submission Number: 19