Behavior-agnostic Task Inference for Robust Offline In-context Reinforcement Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY-NC 4.0
TL;DR: A model-based task inference method that is robust to changes in context behavior.
Abstract: The ability to adapt to new environments with noisy dynamics and unseen objectives is crucial for AI agents. In-context reinforcement learning (ICRL) has emerged as a paradigm for building adaptive policies, employing a **context** trajectory of test-time interactions to infer the true task and the corresponding optimal policy efficiently, without gradient updates. However, ICRL policies rely heavily on context trajectories, making them vulnerable to distribution shifts from training to testing, which degrades performance, particularly in offline settings where the training data is static. In this paper, we highlight that most existing offline ICRL methods are trained to perform approximate Bayesian inference under the training distribution, which leaves them vulnerable to distribution shifts at test time and results in poor generalization. To address this, we introduce Behavior-agnostic Task Inference (BATI) for ICRL, a model-based maximum-likelihood solution for robustly inferring the task representation. In contrast to previous methods that rely on a learned encoder as the approximate posterior, BATI focuses purely on dynamics, thus insulating itself against the behavior of the context-collection policy. Experiments on MuJoCo environments demonstrate that BATI effectively interprets out-of-distribution contexts and outperforms other methods, even in the presence of significant environmental noise.
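
The core idea lends itself to a short illustration. Below is a minimal sketch, assuming a PyTorch setup, of what maximum-likelihood task inference over a learned dynamics model might look like. The names (`DynamicsModel`, `infer_task`) and all hyperparameters are hypothetical and not taken from the paper; this is not the authors' implementation.

```python
# Hypothetical sketch: infer a task embedding z by maximizing the likelihood
# of observed context transitions under a learned dynamics model p(s'|s,a,z),
# rather than through a learned encoder posterior.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts the next-state mean from (s, a, z); trained offline, then frozen."""
    def __init__(self, state_dim, action_dim, task_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))

def infer_task(model, context, task_dim, n_steps=200, lr=1e-2):
    """Gradient-based MLE for z on a context of (s, a, s') transition tensors."""
    s, a, s_next = context              # each of shape (T, dim)
    model.requires_grad_(False)         # freeze dynamics; optimize z only
    z = torch.zeros(task_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        pred = model(s, a, z.expand(s.shape[0], -1))
        # Gaussian likelihood with fixed unit variance: the negative
        # log-likelihood reduces to squared prediction error.
        loss = ((pred - s_next) ** 2).sum(-1).mean()
        loss.backward()
        opt.step()
    return z.detach()
```

Note that the objective depends only on how well `z` explains the observed transitions, not on which actions the context policy happened to choose, so the inferred representation is insulated from shifts in context-collection behavior, the robustness property the abstract claims.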
Lay Summary: We want agents that can adapt to different environments and tasks and adjust their behaviors given the currently observed circumstances. However, the circumstances observed during training can differ from those at test time, misleading the agent into solving the wrong task. We propose a method that focuses the agent's attention on task-related characteristics only, filtering out irrelevant distractions and improving robustness.
Primary Area: Reinforcement Learning->Deep RL
Keywords: In-context Reinforcement Learning, Distribution Shift, Meta-Reinforcement Learning, Task Inference, Offline Reinforcement Learning
Submission Number: 3821