Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL

Published: 13 Oct 2024, Last Modified: 02 Dec 2024, NeurIPS 2024 Workshop SSL, CC BY 4.0
Keywords: Self Predictive RL, Data Efficient RL, Self Supervised Learning
Abstract: In this study, we examine the impact of different SSL objectives within the Self-Predictive Representations (SPR) framework. Specifically, we explore SSL modifications such as terminal state masking and prioritized replay weighting, which were not explicitly discussed in the original framework. These modifications are RL-specific, yet they do not transfer to every RL algorithm or SSL objective, so it is of interest to gauge their impact on performance and to compare alternative SSL objectives. We evaluate six SPR variants on the Atari 100k benchmark: versions with and without these modifications, as well as variants incorporating feature-decorrelation objectives such as Barlow Twins and VICReg, which cannot accommodate these specific adjustments. We additionally assess these objectives on the DeepMind Control Suite, where such modifications do not apply. Our findings show that the SSL modifications within SPR significantly influence performance, underscoring the importance of both the choice of SSL objective and its accompanying modifications in data-efficient, self-predictive reinforcement learning.
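To make the two RL-specific modifications concrete, here is a minimal NumPy sketch of an SPR-style prediction loss with terminal state masking and prioritized replay weighting applied. This is an illustrative reconstruction, not the paper's implementation: the function name, shapes, and the simple negative-cosine-similarity objective are assumptions for exposition.

```python
import numpy as np

def spr_style_loss(online_proj, target_proj, valid_mask, is_weights):
    """Illustrative SPR-style prediction loss with two RL-specific modifications.

    online_proj, target_proj: (B, K, D) projected predictions / targets
                              for K future steps.
    valid_mask:  (B, K) 1.0 for steps before episode termination,
                 0.0 afterwards (terminal state masking).
    is_weights:  (B,) importance-sampling weights from prioritized replay.
    """
    # Negative cosine similarity per (batch, step), as in self-predictive losses.
    o = online_proj / (np.linalg.norm(online_proj, axis=-1, keepdims=True) + 1e-8)
    t = target_proj / (np.linalg.norm(target_proj, axis=-1, keepdims=True) + 1e-8)
    cos = (o * t).sum(axis=-1)                      # (B, K)

    # Terminal state masking: zero out prediction terms past episode end.
    per_step = -cos * valid_mask                    # (B, K)
    per_sample = per_step.sum(axis=-1) / np.maximum(valid_mask.sum(axis=-1), 1.0)

    # Prioritized replay weighting: scale each sample by its IS weight.
    return float((per_sample * is_weights).mean())
```

Dropping `valid_mask` (all ones) and `is_weights` (all ones) recovers the plain objective; feature-decorrelation losses such as Barlow Twins and VICReg operate on batch-level cross-correlation or variance statistics rather than per-sample similarities, which is why these per-sample adjustments do not carry over to them directly.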
Submission Number: 59