UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

ADITYA UPADHYAY

Published: 25 May 2026, Last Modified: 27 May 2026

Venue: DEMO 2026 Poster

Readers: Everyone

License: CC BY 4.0

Keywords: Offline Reinforcement Learning, Conformal Prediction, Uncertainty Estimation, Adaptive Conservatism, Implicit Q-Learning, Expectile Regression

TL;DR: UnIQ adapts offline RL conservatism per state using conformally calibrated uncertainty. Built on IQL, it maps multi-expectile ensemble variance to a state-adaptive expectile. On D4RL it outperforms IQL on Walker2d tasks and replay-heavy settings at near-IQL memory cost, a 10x peak-VRAM reduction versus EDAC.

Abstract: Offline reinforcement learning requires careful conservatism to counter distribution shift, yet most methods apply a single fixed penalty regardless of how well a given state is covered by the data. We present UnIQ (Uncertainty-Informed Quantile), an offline RL method that adapts its conservatism per-state via conformally calibrated uncertainty. Building on IQL's implicit Q-learning backbone, UnIQ trains a multi-expectile value ensemble, computes distribution-free uncertainty bounds using split conformal prediction, and maps this signal to a state-adaptive expectile tau(s), relaxing conservatism in well-covered regions and strengthening it at the data frontier. On D4RL MuJoCo benchmarks, UnIQ outperforms IQL on Walker2d tasks and replay-heavy settings while operating at near-IQL memory cost (approx. 250 MB peak VRAM)—a 10x reduction versus EDAC. We explicitly report underperforming cases and position UnIQ as a practical mechanism contribution on the performance–efficiency frontier, rather than a claim of overall state-of-the-art.
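The pipeline described in the abstract (ensemble spread, split conformal calibration, state-adaptive expectile) might be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the linear mapping from interval width to tau(s), and the defaults `tau_min`, `tau_max`, and `scale` are all hypothetical. It also assumes that, as in IQL, a lower expectile corresponds to a more conservative value estimate, so tau(s) decreases as uncertainty grows.

```python
import numpy as np

def conformal_quantile(scores, alpha=0.1):
    """Split-conformal quantile of held-out nonconformity scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or +inf
    when the calibration set is too small for the requested level.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        return np.inf
    return scores[k - 1]

def adaptive_tau(ensemble_values, q_hat, tau_min=0.5, tau_max=0.9, scale=1.0):
    """Map calibrated uncertainty to a per-state expectile tau(s).

    ensemble_values: array of shape (K, B), K value heads over B states.
    The linear, clipped mapping below is an assumption for illustration.
    """
    spread = np.asarray(ensemble_values, dtype=float).std(axis=0)
    width = q_hat * spread                    # calibrated uncertainty per state
    frac = np.clip(width / scale, 0.0, 1.0)   # 0 = well covered, 1 = data frontier
    return tau_max - (tau_max - tau_min) * frac

def expectile_loss(diff, tau):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2, averaged.

    tau may be a scalar or a per-state array such as the output of adaptive_tau.
    """
    diff = np.asarray(diff, dtype=float)
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))
```

In this sketch, states where the ensemble heads agree keep tau(s) near `tau_max`, while states with a wide calibrated interval are pushed toward `tau_min`; the per-state tau would then replace the fixed expectile in the value loss.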

Submission Number: 29
