Two-Stage Coverage Expansion for Cross-Domain Offline Reinforcement Learning via Score-Based Generative Modeling

ICLR 2026 Conference Submission 22071 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Offline Reinforcement Learning, Generative Model, Data Augmentation
Abstract: Cross-domain reinforcement learning (RL) aims to transfer knowledge from a source domain to a target domain with different dynamics, but existing approaches often directly reuse source transitions, which can lead to severe distributional mismatch and performance degradation when the domain gap is large or target data is scarce. We propose Two-stage Coverage Expansion (TCE), a dual score-based generative framework that first expands state coverage through a mixture-based state score network and then aligns transitions with target-domain dynamics using a target-transition score network. This two-stage design broadens the effective support of the target dataset while mitigating harmful distributional shift, enabling improved policy learning under limited target data. Extensive experiments on diverse cross-domain benchmarks demonstrate that TCE consistently outperforms state-of-the-art cross-domain RL baselines, achieving substantial gains even under large domain gaps and extremely small target datasets.
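The two-stage idea in the abstract can be illustrated with a toy sketch. The snippet below is not the paper's method: it replaces both learned score networks with hypothetical analytic scores (a two-component Gaussian mixture standing in for the mixture-based state score, and a linear-Gaussian conditional standing in for the target-transition score) and uses unadjusted Langevin dynamics for sampling. Stage 1 draws states from the mixture score to expand coverage; stage 2 draws next-states from the conditional score so generated transitions follow the assumed target dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage-1 stand-in: score of a symmetric 2-component Gaussian mixture
# (toy surrogate for TCE's mixture-based state score network).
def mixture_state_score(x, mus=(-1.0, 1.0), sigma=0.5):
    ps = np.array([np.exp(-(x - m) ** 2 / (2 * sigma**2)) for m in mus])
    w = ps / ps.sum(axis=0)  # posterior responsibility of each component
    return sum(wi * (m - x) / sigma**2 for wi, m in zip(w, mus))

# Stage-2 stand-in: conditional next-state score under assumed target
# dynamics s' ~ N(0.9 * s, 0.1^2) (toy surrogate for the
# target-transition score network).
def transition_score(s_next, s, a=0.9, sigma=0.1):
    return (a * s - s_next) / sigma**2

def langevin(score, x0, steps=500, eps=1e-3):
    """Unadjusted Langevin dynamics: drift along the score plus noise."""
    x = x0.copy()
    for _ in range(steps):
        x += 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(x.shape)
    return x

# Stage 1: expand state coverage by sampling from the mixture score.
states = langevin(mixture_state_score, rng.standard_normal(2000))

# Stage 2: align each expanded state with a next-state drawn from the
# target-dynamics conditional score.
next_states = langevin(lambda sn: transition_score(sn, states),
                       rng.standard_normal(2000))
```

Because the stage-2 score is conditioned on the stage-1 samples, every generated pair `(states[i], next_states[i])` is consistent with the assumed target dynamics, which mirrors how the second stage keeps the expanded data from drifting off the target transition distribution.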
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 22071