Two-Stage Coverage Expansion for Cross-Domain Offline Reinforcement Learning via Score-Based Generative Modeling

ICLR 2026 Conference Submission 22071 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Offline Reinforcement Learning, Generative Model, Data Augmentation
Abstract: Cross-domain reinforcement learning (RL) aims to transfer knowledge from a source domain to a target domain with different dynamics, but existing approaches often directly reuse source transitions, which can lead to severe distributional mismatch and performance degradation when the domain gap is large or target data is scarce. We propose Two-stage Coverage Expansion (TCE), a dual score-based generative framework that first expands state coverage through a mixture-based state score network and then aligns transitions with target-domain dynamics using a target-transition score network. This two-stage design broadens the effective support of the target dataset while mitigating harmful distributional shift, enabling improved policy learning under limited target data. Extensive experiments on diverse cross-domain benchmarks demonstrate that TCE consistently outperforms state-of-the-art cross-domain RL baselines, achieving substantial gains even under large domain gaps and extremely small target datasets.
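The two-stage idea in the abstract can be illustrated with a toy sketch. The snippet below is not the paper's method: it replaces both learned score networks with hypothetical analytic scores (a two-component Gaussian mixture standing in for the mixture-based state score, and a linear-Gaussian conditional standing in for the target-transition score) and uses unadjusted Langevin dynamics for sampling. Stage 1 draws states from the mixture score to expand coverage; stage 2 draws next-states from the conditional score so generated transitions follow the assumed target dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage-1 stand-in: score of a symmetric 2-component Gaussian mixture
# (toy surrogate for TCE's mixture-based state score network).
def mixture_state_score(x, mus=(-1.0, 1.0), sigma=0.5):
    ps = np.array([np.exp(-(x - m) ** 2 / (2 * sigma**2)) for m in mus])
    w = ps / ps.sum(axis=0)  # posterior responsibility of each component
    return sum(wi * (m - x) / sigma**2 for wi, m in zip(w, mus))

# Stage-2 stand-in: conditional next-state score under assumed target
# dynamics s' ~ N(0.9 * s, 0.1^2) (toy surrogate for the
# target-transition score network).
def transition_score(s_next, s, a=0.9, sigma=0.1):
    return (a * s - s_next) / sigma**2

def langevin(score, x0, steps=500, eps=1e-3):
    """Unadjusted Langevin dynamics: drift along the score plus noise."""
    x = x0.copy()
    for _ in range(steps):
        x += 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(x.shape)
    return x

# Stage 1: expand state coverage by sampling from the mixture score.
states = langevin(mixture_state_score, rng.standard_normal(2000))

# Stage 2: align each expanded state with a next-state drawn from the
# target-dynamics conditional score.
next_states = langevin(lambda sn: transition_score(sn, states),
                       rng.standard_normal(2000))
```

Because the stage-2 score is conditioned on the stage-1 samples, every generated pair `(states[i], next_states[i])` is consistent with the assumed target dynamics, which mirrors how the second stage keeps the expanded data from drifting off the target transition distribution.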
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 22071