Abstract: The discrepancy between training and testing environments poses a major challenge to the generalization of reinforcement learning (RL) algorithms. We propose soft contrastive learning with a coarser approximate $Q$-irrelevance abstraction for RL (SCQRL) to improve RL generalization. Specifically, we characterize the coarser approximate $Q$-irrelevance abstraction as the state feature and provide a theoretical analysis of its improved generalization ability. To learn this representation efficiently, we construct a positive and negative sample selection mechanism for contrastive learning based on the $Q$ value. To account for errors in selecting positive and negative samples, we design a soft contrastive learning objective and combine it with RL as an auxiliary task, yielding SCQRL. Generalization experiments on several Procgen environments demonstrate that SCQRL outperforms strong generalization-oriented RL algorithms.
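To make the idea concrete, the following is a minimal sketch (not the authors' code) of a soft contrastive auxiliary loss in which positives and negatives for each anchor state are weighted softly by $Q$-value similarity, so that uncertain pair assignments are down-weighted. All names and choices here (the Gaussian similarity kernel, `temperature`, `sigma`, the auxiliary-loss coefficient) are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a soft contrastive auxiliary loss with Q-value-based pair weighting.
# Assumed interface: `features` are encoder outputs phi(s), `q_values` are
# estimated Q-values for the same batch of states.
import torch
import torch.nn.functional as F

def soft_q_contrastive_loss(features, q_values, temperature=0.1, sigma=1.0):
    """features: (B, D) state embeddings; q_values: (B,) estimated Q-values."""
    z = F.normalize(features, dim=1)                    # unit-norm embeddings
    logits = z @ z.t() / temperature                    # (B, B) similarity logits
    B = z.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(eye, float('-inf'))     # exclude self-pairs

    # Soft positive weights from Q-value closeness (Gaussian kernel, an assumption):
    # states with similar Q-values act as (soft) positives, others as negatives.
    q_diff = (q_values.unsqueeze(1) - q_values.unsqueeze(0)).abs()
    w = torch.exp(-q_diff ** 2 / (2 * sigma ** 2)).masked_fill(eye, 0.0)
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-8)  # normalize per anchor

    # Soft cross-entropy: pull together states with similar Q, push apart the rest.
    log_prob = F.log_softmax(logits, dim=1)
    return -(w * log_prob).sum(dim=1).mean()

# Hypothetical usage as an auxiliary task alongside the RL objective:
# total_loss = rl_loss + aux_coef * soft_q_contrastive_loss(phi(obs), q(obs))
```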