Beyond Worst-Case: Efficient Robust RL via In-Context Generalization

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: reinforcement learning, robustness, data augmentation, in-context learning
Abstract: Robustness and generalization are central challenges in reinforcement learning (RL). Classical robust RL handles perturbations via worst-case minimax optimization, which is hard to solve in practice and often yields pessimistic policies with suboptimal deployment performance. In contrast, in-context RL (ICRL), where pretrained transformers adapt to new tasks without parameter updates, is designed for generalization. Rather than treating robustness and generalization as orthogonal, we demonstrate a link between the two: robustness can emerge as a consequence of generalization in ICRL, and generalization can be systematically leveraged to improve robustness. Specifically, we apply ICRL models to robust RL and observe that even without explicit robust training, in-context models perform strongly on robustness benchmarks. Motivated by this, we investigate the robustness of ICRL models further and identify significant performance degradation under disturbances. To address this limitation, building on the insight that robustness challenges can be transformed into generalization problems, we propose in-context adversarial pretraining, which augments pretraining tasks with environment perturbations to expand diversity without solving the minimax game. We also introduce an adaptive rollout variant that uses the pretrained transformer to generate high-value trajectories online, improving coverage and sample efficiency. Across navigation and continuous-control benchmarks, these strategies consistently improve robustness and nominal return, offering a scalable path to robust performance without worst-case optimization or significant performance sacrifice.
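The core mechanism described above, augmenting the pretraining task distribution with environment perturbations rather than solving a minimax game, can be illustrated with a minimal sketch. The task representation (a dict of environment parameters such as `mass` and `friction`), the multiplicative perturbation model, and the scale values are all illustrative assumptions, not details from the paper:

```python
import random

def augment_tasks(base_tasks, perturb_scales=(0.1, 0.2, 0.3), seed=0):
    """Expand a pretraining task set with perturbed variants.

    Each task is assumed to be a dict of environment parameters,
    e.g. {"mass": 1.0, "friction": 0.5} (names are hypothetical).
    Instead of worst-case optimization, each parameter is scaled by
    a random factor at several perturbation magnitudes, so robustness
    is recast as generalization over a more diverse task set.
    """
    rng = random.Random(seed)
    augmented = list(base_tasks)  # keep the nominal tasks
    for task in base_tasks:
        for scale in perturb_scales:
            variant = {
                k: v * (1.0 + rng.uniform(-scale, scale))
                for k, v in task.items()
            }
            augmented.append(variant)
    return augmented

tasks = [{"mass": 1.0, "friction": 0.5}]
expanded = augment_tasks(tasks)
print(len(expanded))  # → 4 (1 nominal task + 3 perturbed variants)
```

The ICRL model would then be pretrained on the expanded set, with the adaptive rollout variant additionally using the partially trained transformer to collect high-value trajectories in these perturbed environments.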
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 23928