Recent studies have shown that bisimulation metrics excel at extracting the features most relevant to reinforcement learning tasks. However, constrained by strict assumptions and the inherent conflict between metric learning and sparse rewards, they suffer from severe representation degeneration, and even collapse, in sparse-reward settings. To address these problems, we propose a scalable representation learning approach (SRL) built on a reward-free weak bisimulation metric. Specifically, we first introduce the weak bisimulation metric, which bypasses the intractable reward difference and instead leverages a trainable Gaussian distribution to relax traditional bisimulation metrics. In particular, the Gaussian noise creates a flexible information margin for the metric optimization, which mitigates the potential representation collapse caused by sparse rewards. Additionally, because the metric is composed purely of distributions, it alleviates the representation degeneration that results from inconsistent computations under strict assumptions. To tighten the metric, we further incorporate continuous differences over the transition distribution, enhancing the accuracy of the initial transition-distribution difference and strengthening the extraction of equivalent task features. We evaluate SRL on the challenging DeepMind Control Suite, MetaWorld, and Adroit tasks with sparse rewards. Empirical results demonstrate that SRL significantly outperforms state-of-the-art baselines on various tasks. The source code will be released later.
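To make the core idea concrete, here is a minimal PyTorch sketch of how a reward-free relaxation of a bisimulation objective might look. This is an illustration under stated assumptions, not the paper's implementation: the class name `WeakBisimulationLoss`, the parameters `noise_mu` and `noise_log_std`, the L1 latent distance, and the closed-form 2-Wasserstein term between diagonal-Gaussian next-state predictions are all hypothetical choices. The classic target |r_i − r_j| + γ·W2(P_i, P_j) is relaxed by replacing the intractable reward-difference term with samples from a trainable Gaussian, giving the metric the flexible information margin the abstract describes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeakBisimulationLoss(nn.Module):
    """Illustrative weak bisimulation-style loss (a sketch, not the paper's exact objective).

    Relaxes the classic bisimulation target  |r_i - r_j| + gamma * W2(P_i, P_j)
    by swapping the reward-difference term for samples from a trainable Gaussian.
    """

    def __init__(self, gamma: float = 0.99):
        super().__init__()
        self.gamma = gamma
        # Trainable Gaussian that replaces the reward difference (assumed form).
        self.noise_mu = nn.Parameter(torch.zeros(1))
        self.noise_log_std = nn.Parameter(torch.zeros(1))

    def forward(self, z_i, z_j, next_mu_i, next_std_i, next_mu_j, next_std_j):
        # Current distance between latent states (L1, a common choice in bisimulation work).
        dist = torch.abs(z_i - z_j).sum(dim=-1)

        # Reward-free margin: reparameterized sample from the trainable Gaussian,
        # so the margin parameters receive gradients through the loss.
        eps = torch.randn(z_i.shape[0], device=z_i.device)
        margin = self.noise_mu + self.noise_log_std.exp() * eps

        # Closed-form 2-Wasserstein distance between diagonal-Gaussian
        # next-state predictions: sqrt(||mu_i - mu_j||^2 + ||std_i - std_j||^2).
        w2 = torch.sqrt(
            (next_mu_i - next_mu_j).pow(2).sum(dim=-1)
            + (next_std_i - next_std_j).pow(2).sum(dim=-1)
            + 1e-8
        )

        # Fixed-point style regression target; the transition term is detached,
        # as dynamics predictions are typically treated as a fixed target here.
        target = margin + self.gamma * w2.detach()
        return F.mse_loss(dist, target)


# Example usage with random tensors standing in for encoder/dynamics outputs.
B, D = 256, 50
loss_fn = WeakBisimulationLoss(gamma=0.99)
z = torch.randn(2 * B, D)
loss = loss_fn(
    z[:B], z[B:],
    torch.randn(B, D), torch.rand(B, D),  # predicted next-state mean/std for states i
    torch.randn(B, D), torch.rand(B, D),  # predicted next-state mean/std for states j
)
loss.backward()
```

The sketch covers only a one-step transition difference; the "continuous differences over the transition distribution" that the abstract uses to tighten the metric would presumably extend this term over successive predicted transitions, which is omitted here.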