Leveraging Fully-Observable Solutions for Improved Partially-Observable Offline Reinforcement Learning
Abstract: Offline reinforcement learning (RL) is a popular learning framework for control problems where online interactions with the environment are expensive, risky, or otherwise impractical.
Existing offline RL methods commonly assume full observability of the state; as a result, few offline RL methods are specialized for the more general setting of partially-observable control.
To address this gap, we propose Cross-Observability Conservative Q-Learning (CO-CQL), an offline RL algorithm for partially-observable control that leverages fully-observable expert policies in an asymmetric learning setting.
To motivate the use of fully-observable experts for partially-observable control, we formalize the Cross-Observability Optimality Ratio (COOR), a theoretical measure of cross-observability that quantifies the benefit of learning asymmetrically from a fully-observable expert, and the Cross-Observability Approximation Ratio (COAR), an estimate of COOR computable from trained policies.
Our empirical evaluation on a wide variety of partially-observable challenges demonstrates that CO-CQL exploits the guidance of fully-observable experts to outperform other state-of-the-art offline RL algorithms.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Steffen_Udluft1
Submission Number: 4986