Data Fusion–Enhanced Decision Transformer for Stable Cross-Domain Generalization

ICLR 2026 Conference Submission17869 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: deep reinforcement learning, transformer, cross-domain policy adaptation

Abstract: Cross-domain shifts present a significant challenge for decision transformer (DT) policies. Existing methods typically select source trajectory fragments with a single simple filtering criterion, matching either state structure or action feasibility, and then stitch the fragments together. However, the selected fragments still stitch poorly: state structures can misalign, the return-to-go (RTG) becomes incomparable when the reward or horizon changes, and actions may jump at trajectory junctions. As a result, RTG tokens lose continuity, which compromises DT's inference ability. To tackle these challenges, we propose Data Fusion–Enhanced Decision Transformer (DFDT), a compact pipeline that restores stitchability. Specifically, DFDT fuses scarce target data with selectively trusted source fragments via a two-level filter: Maximum Mean Discrepancy (MMD) mismatch for state-structure alignment and Optimal Transport (OT) deviation for action feasibility. It then trains on a feasibility-weighted fusion distribution. Furthermore, DFDT replaces RTG tokens with advantage-conditioned tokens, which improves the semantic continuity of the token sequence. It also applies a $Q$-guided regularizer to suppress value and action jumps at junctions. Theoretically, we provide bounds that tie the state-value and policy-performance gaps to MMD mismatch and OT deviation, and show that the bounds tighten as these two measures shrink. Empirically, DFDT improves return and stability over strong offline RL and sequence-model baselines across gravity, kinematic, and morphology shifts on D4RL-style control tasks; token-stitching and sequence-semantics stability analyses further corroborate these gains.
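To make the two-level filter described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a Gaussian-kernel MMD estimate on states as a hard gate, a one-dimensional Wasserstein-1 distance on scalar actions as a stand-in for the OT deviation, and an exponential feasibility weight. The function names, the threshold, the bandwidth, and the temperature are all hypothetical choices made for illustration.

```python
import math


def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel between two equal-length state vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(us, vs):
        total = sum(rbf(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def wasserstein_1d(us, vs):
    """W1 distance between two equal-size scalar samples via sorted matching."""
    if len(us) != len(vs):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(us), sorted(vs))) / len(us)


def fusion_weight(frag_states, frag_actions, tgt_states, tgt_actions,
                  mmd_threshold=0.5, temperature=1.0):
    """Hypothetical feasibility weight for one source fragment.

    Level 1: reject the fragment (weight 0) if its states are too far
             from the target states under the MMD estimate.
    Level 2: otherwise down-weight it by its action deviation,
             here exp(-W1 / temperature).
    """
    if mmd_squared(frag_states, tgt_states) > mmd_threshold:
        return 0.0
    return math.exp(-wasserstein_1d(frag_actions, tgt_actions) / temperature)
```

Under these assumptions, a source fragment whose states and actions match the target sample receives weight 1, a fragment with distant states is dropped, and a fragment that passes the state gate but has shifted actions is kept with a reduced weight. Multi-dimensional actions would need a general OT solver in place of the sorted-matching shortcut used here.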

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 17869
