CERTAIN: Context Uncertainty-aware One-Shot Adaptation for Context-based Offline Meta Reinforcement Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Existing context-based offline meta-reinforcement learning (COMRL) methods primarily focus on task representation learning and adaptation performance given a context. They often assume that the adaptation context is collected by task-specific behavior policies or over multiple rounds of collection. In real applications, however, the context should be collected by a single policy in a one-shot manner to ensure efficiency and safety. We find that intrinsic context ambiguity across multiple tasks and out-of-distribution (OOD) issues caused by distribution shift significantly degrade one-shot adaptation performance, a problem that has been largely overlooked in most COMRL research. To address it, we propose using heteroscedastic uncertainty in representation learning to identify ambiguous and OOD contexts, and we train an uncertainty-aware context-collecting policy for effective one-shot online adaptation. The proposed method can be integrated into various COMRL frameworks, including classifier-based, reconstruction-based, and contrastive learning-based approaches. Empirical evaluations on benchmark tasks show that our method improves one-shot adaptation performance by up to 36% and zero-shot adaptation performance by up to 34% compared to existing baseline COMRL methods.
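To make the idea of heteroscedastic uncertainty in context representation learning concrete, below is a minimal, hypothetical sketch (not the authors' implementation; all class and function names such as `HeteroscedasticContextEncoder` and `uncertainty_score` are illustrative). It assumes the reconstruction-based variant, where an encoder maps a transition to a task embedding and a per-sample log-variance, the variance is trained with a Gaussian negative log-likelihood against a reconstruction target, and the predicted variance serves as a score for flagging ambiguous or OOD context transitions.

```python
# Hypothetical sketch of a heteroscedastic context encoder for COMRL.
# Assumptions (not from the paper): reconstruction-based training target,
# a simple MLP backbone, and mean predicted variance as the uncertainty score.
import torch
import torch.nn as nn


class HeteroscedasticContextEncoder(nn.Module):
    def __init__(self, transition_dim: int, latent_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(transition_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden_dim, latent_dim)    # task embedding
        self.logvar_head = nn.Linear(hidden_dim, latent_dim)  # per-sample uncertainty

    def forward(self, transitions: torch.Tensor):
        h = self.backbone(transitions)
        return self.mean_head(h), self.logvar_head(h)


def heteroscedastic_nll(pred_mean, pred_logvar, target):
    # Gaussian NLL with learned variance: ambiguous transitions can be
    # down-weighted by predicting a larger variance for them.
    inv_var = torch.exp(-pred_logvar)
    return 0.5 * (inv_var * (target - pred_mean) ** 2 + pred_logvar).mean()


def uncertainty_score(pred_logvar: torch.Tensor) -> torch.Tensor:
    # Mean predicted variance per transition; transitions scoring above a
    # chosen threshold would be treated as ambiguous/OOD and down-weighted
    # or re-collected by the context-collecting policy.
    return torch.exp(pred_logvar).mean(dim=-1)


if __name__ == "__main__":
    encoder = HeteroscedasticContextEncoder(transition_dim=20, latent_dim=8)
    batch = torch.randn(32, 20)                  # fake (s, a, r, s') transitions
    mean, logvar = encoder(batch)
    target = torch.randn(32, 8)                  # placeholder reconstruction target
    loss = heteroscedastic_nll(mean, logvar, target)
    scores = uncertainty_score(logvar)           # shape: (32,)
    print(loss.item(), scores.shape)
```

In this sketch the same uncertainty score could, in principle, be fed to the context-collecting policy as an extra input or used to filter the one-shot context before adaptation; the paper's actual integration with classifier-based and contrastive variants may differ.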
Lay Summary: Imagine trying to teach a robot how to do many different tasks, like driving a car, sorting packages, or playing games. Instead of training it from scratch every time, we want the robot to quickly figure out new tasks by learning how to learn — this is called meta-reinforcement learning. In the real world, the robot doesn’t have the luxury of practicing a task many times before doing it. It needs to adapt based on just one try — we call this one-shot adaptation. But that’s hard, especially when: 1. The hints it gets about the new task are confusing or unclear. 2. The new task is very different from anything it has seen before. Most current methods don’t handle these problems well. This paper introduces a new approach that helps the robot know when it’s unsure or facing something unfamiliar. It does this by measuring uncertainty during learning and using that information to gather better clues about the task. This way, the robot can make smarter decisions even with limited experience.
Link To Code: https://github.com/tiev-tongji/Certain
Primary Area: Reinforcement Learning->Batch/Offline
Keywords: Offline Meta Reinforcement Learning, Context-based, One-Shot Adaptation, OOD, Ambiguity, Uncertainty
Submission Number: 4330