Self-Referencing Agents for Unsupervised Reinforcement Learning

Published: 01 Jan 2025 · Last Modified: 16 Oct 2025 · Neural Networks 2025 · CC BY-SA 4.0
Abstract: Current unsupervised reinforcement learning (URL) methods often overlook reward nonstationarity during pre-training and the forgetting of exploratory behavior during fine-tuning. Our study introduces Self-Reference (SR), a novel add-on module designed to address both issues. During pre-training, SR stabilizes intrinsic rewards through historical referencing, mitigating nonstationarity. During fine-tuning, it preserves exploratory behaviors, retaining valuable skills. Our approach significantly boosts the performance and sample efficiency of existing model-free URL methods on the Unsupervised Reinforcement Learning Benchmark, improving IQM by up to 17% and reducing the Optimality Gap by 31%. These results highlight the general applicability and compatibility of our add-on module with existing methods.
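To make the idea of "historical referencing" concrete, below is a minimal, hypothetical sketch of how an intrinsic reward could be stabilized against nonstationarity by consulting a buffer of past states. All names (`stabilized_intrinsic_reward`, `raw_reward_fn`, the k-nearest-neighbor blending) are illustrative assumptions, not the paper's actual mechanism:

```python
import numpy as np

def stabilized_intrinsic_reward(state, history, raw_reward_fn, k=2):
    """Hypothetical illustration: smooth a nonstationary intrinsic reward
    by averaging the raw rewards of the k nearest historical states and
    blending that average with the current raw reward."""
    current = raw_reward_fn(np.asarray(state, dtype=float))
    if len(history) == 0:
        return current
    hist = np.asarray(history, dtype=float)
    # Find the k historical states closest to the current state.
    dists = np.linalg.norm(hist - np.asarray(state, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    # Reference the historical rewards to damp fluctuations.
    referenced = float(np.mean([raw_reward_fn(hist[i]) for i in nearest]))
    return 0.5 * current + 0.5 * referenced

# Example usage with a toy reward (sum of state coordinates).
reward = stabilized_intrinsic_reward(
    state=[1.0, 1.0],
    history=[[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    raw_reward_fn=lambda s: float(np.sum(s)),
    k=2,
)
```

In this toy run, the two nearest historical states are `[1, 1]` (reward 2) and `[0, 0]` (reward 0), so the referenced average is 1 and the blended result is 1.5 rather than the raw 2.0, illustrating how historical context can damp a drifting reward signal.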