Energy-Based Transfer for Reinforcement Learning

ICLR 2026 Conference Submission14350 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, Readers: Everyone, License: CC BY 4.0

Keywords: Reinforcement learning, transfer learning

Abstract: Reinforcement learning is a powerful framework for sequential decision-making, but its sample inefficiency limits its scalability, especially in multi-task or continual learning settings. A common solution is to transfer knowledge from a teacher policy to guide exploration in new tasks. However, blindly applying such guidance can degrade performance by biasing exploration toward low-reward behaviors. We propose an introspective transfer learning method that selectively guides the student only when the teacher is likely to be helpful. Using energy-based models for out-of-distribution detection, the teacher issues advice only in familiar states, i.e., those within its training distribution. We theoretically show that energy scores reflect the state visitation density under the teacher policy, and empirically demonstrate improved sample efficiency and returns in single-task and multi-task settings.
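The abstract does not say how the energy score is computed or how the advice threshold is set. The following is a minimal sketch under two assumptions. First, it uses the free-energy score common in energy-based out-of-distribution detection, E(s) = -T log sum_a exp(f_a(s)/T), applied to a teacher that outputs one logit (e.g., a Q-value) per discrete action. Second, it uses a hypothetical threshold rule: a quantile of the teacher's own energies on its training states. In the standard energy-based view, p(s) is proportional to exp(-E(s)/T), so lower energy means higher unnormalized density. This is the intuition behind advising only in low-energy states. The names `energy_score`, `calibrate_threshold` and `choose_action` are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy E(s) = -T * log sum_a exp(f_a(s) / T).

    Assumption: `logits` are the teacher's per-action outputs for state s.
    Lower energy means the state looks more in-distribution for the teacher.
    The max-shift (log-sum-exp) trick keeps large logits from overflowing.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return -temperature * lse


def calibrate_threshold(train_energies: List[float], quantile: float = 0.95) -> float:
    """Hypothetical rule: the energy value at `quantile` of the teacher's training states."""
    ordered = sorted(train_energies)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


def choose_action(
    teacher_logits: Sequence[float],
    student_action: int,
    threshold: float,
    temperature: float = 1.0,
) -> Tuple[int, bool]:
    """Return (action, advised). Take the teacher's greedy action only in familiar states."""
    if energy_score(teacher_logits, temperature) <= threshold:
        greedy = max(range(len(teacher_logits)), key=lambda a: teacher_logits[a])
        return greedy, True
    # Unfamiliar (high-energy) state: the teacher stays silent and the student acts.
    return student_action, False


# Usage: calibrate on a few made-up teacher training states, then gate advice.
train_logits = [[5.0, 1.0, 0.0], [4.0, 4.0, 0.5], [6.0, 0.0, 0.0]]
threshold = calibrate_threshold([energy_score(l) for l in train_logits])
print(choose_action([5.5, 0.5, 0.0], student_action=2, threshold=threshold))  # (0, True)
print(choose_action([0.1, 0.0, 0.2], student_action=1, threshold=threshold))  # (1, False)
```

Using a quantile of in-distribution energies keeps the threshold on the same scale as the teacher's logits. How the paper actually selects its threshold, and whether it uses Q-values or policy logits, should be checked against the full text.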

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 14350
