Nuclear Norm Maximization-Based Curiosity-Driven Reinforcement Learning

Published: 01 Jan 2024 · Last Modified: 15 Apr 2025 · IEEE Trans. Artif. Intell. 2024 · CC BY-SA 4.0
Abstract: Reinforcement learning (RL) has achieved promising results in solving numerous challenging sequential decision problems. To address the issue of sparse extrinsic rewards, researchers have proposed intrinsic rewards, enabling the agent to acquire skills that may prove valuable in the pursuit of future rewards. One representative approach for generating intrinsic rewards involves constructing a predictive model to assess the novelty of states. However, due to the stochastic nature of complex environments, intrinsic rewards can be noisy, and directly using noisy forward predictions to supervise policies can harm learning performance and efficiency. Many recent studies use the $\ell_{2}$ norm or variance to measure novelty, which further amplifies the noise through squaring operations. In this article, we aim to tackle these challenges by leveraging nuclear norm maximization (NNM). Specifically, we propose a novel curiosity reward that accurately quantifies the novelty of explored states while exhibiting a high tolerance for noise and outliers. Our extensive experiments in various benchmark environments demonstrate that NNM achieves state-of-the-art performance compared with previous curiosity-based methods. When trained solely with intrinsic rewards, NNM achieves a human-normalized score of 1.09 on a subset of 26 Atari games, twice the performance of competing intrinsic-reward-based methods.
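To make the idea concrete, below is a minimal, hypothetical sketch of a nuclear-norm-based curiosity bonus. It assumes the bonus is computed as the nuclear norm (sum of singular values) of a matrix whose rows are predicted next-state embeddings, e.g. from an ensemble of forward models; the ensemble size, embedding dimension, and reward scaling here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def nuclear_norm_bonus(pred_embeddings: np.ndarray) -> float:
    """Curiosity bonus as the nuclear norm of a prediction matrix.

    pred_embeddings: array of shape (n_predictors, embed_dim), one row per
    forward-model prediction of the next-state embedding (shapes are
    illustrative assumptions). The nuclear norm is the sum of singular
    values, so it grows only linearly in each singular value; unlike a
    squared l2 error or variance it does not amplify noise by squaring.
    """
    singular_values = np.linalg.svd(pred_embeddings, compute_uv=False)
    return float(singular_values.sum())


# Hypothetical usage: an ensemble of 5 predictors over 64-dim embeddings.
rng = np.random.default_rng(0)
preds = rng.normal(size=(5, 64))
r_intrinsic = nuclear_norm_bonus(preds)
total_reward = 0.0 + 0.01 * r_intrinsic  # extrinsic reward plus a scaled bonus
print(f"intrinsic reward: {r_intrinsic:.3f}")
```

In this sketch, a larger nuclear norm indicates more diverse, higher-energy predictions for the current state, which serves as a proxy for novelty; the scaling coefficient on the bonus would be a tuned hyperparameter.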