Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning
Abstract: Transfer reinforcement learning aims to derive a near-optimal policy for a target environment with limited data by leveraging abundant data from related source domains. However, it faces two key challenges: the lack of performance guarantees for the transferred policy, which can lead to undesired actions, and the risk of negative transfer when multiple source domains are involved. We propose a novel framework based on the pessimism principle, which constructs and optimizes a conservative estimate of the target domain's performance. Our framework effectively addresses both challenges: it provides an optimized lower bound on target performance, ensuring safe and reliable decisions, and it improves monotonically with the quality of the source domains, thereby avoiding negative transfer. We construct two types of conservative estimates, rigorously characterize their effectiveness, and develop efficient distributed algorithms with convergence guarantees. Our framework provides a theoretically sound and practically robust solution for transfer learning in reinforcement learning.
Lay Summary: We address a key challenge in reinforcement learning: how to transfer knowledge from multiple known source environments to an unseen target environment. Our method is grounded in the pessimism principle, enabling agents to make conservative but effective decisions in unknown environments.
We construct conservative performance estimators that serve as lower bounds, offering provable performance guarantees. To avoid negative transfer, we selectively aggregate only the most relevant source knowledge. Moreover, our framework supports a distributed, privacy-preserving setup in which each local agent shares only Q-function updates rather than raw data, ensuring scalability and communication efficiency.
This framework enables robust, zero-shot transfer learning with theoretical guarantees, paving the way for safer reinforcement learning deployment in real-world applications like robotics and autonomous driving.
Primary Area: Reinforcement Learning
Keywords: zero-shot, transfer learning, distributed learning, reinforcement learning
Submission Number: 2663