Order from Chaos: Leveraging the Order in Time as Intrinsic Reward

Published: 01 Jan 2023, Last Modified: 14 Jan 2025 · CSCS 2023 · CC BY-SA 4.0
Abstract: Reinforcement learning (RL) agents often struggle with exploration in complex environments where the reward signal is sparse or delayed. In this paper, we propose a novel intrinsic reward mechanism that leverages the temporal order of observations to guide agents toward less chaotic policies. Our approach trains a model to predict the correct order of a shuffled sequence of observations, which lets us introduce an “orderability” score: the extent to which the observations of a trajectory can be uniquely ordered in time. We hypothesize that this score is a useful metric for assessing the learning progress of RL agents. By incorporating the orderability score as an intrinsic reward, we encourage agents to explore their environment more effectively and achieve faster, more consistent reward maximization. In our experiments, agents trained with the orderability intrinsic reward outperform baseline methods on challenging exploration tasks, highlighting the potential of our approach. By shedding light on the importance of temporal order in RL, we provide a fresh perspective on the exploration problem and pave the way for future research in this area.
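To make the mechanism concrete, the sketch below shows one way an orderability score could be computed from a pairwise order predictor: score every pair of observations in a trajectory and take the fraction of pairs whose temporal order the model recovers. This is a minimal illustration, not the paper’s exact method; the names `orderability_score` and `predict_earlier`, the pairwise formulation, and the toy predictor are all assumptions for demonstration.

```python
import itertools
from typing import Callable, Sequence

import numpy as np


def orderability_score(
    observations: Sequence[np.ndarray],
    predict_earlier: Callable[[np.ndarray, np.ndarray], bool],
) -> float:
    """Fraction of observation pairs whose temporal order the model recovers.

    `predict_earlier(a, b)` returns True if the model believes observation
    `a` occurred before observation `b`. A score near 1.0 means the
    trajectory is uniquely ordered in time; near 0.5 means the model does
    no better than chance on this trajectory.
    """
    pairs = list(itertools.combinations(range(len(observations)), 2))
    if not pairs:
        return 0.0
    correct = sum(
        predict_earlier(observations[i], observations[j]) for (i, j) in pairs
    )
    return correct / len(pairs)


# Toy demonstration: each observation carries a noisy "clock" feature, and a
# stand-in predictor orders pairs by that feature. In practice the predictor
# would be a trained network applied to raw observations.
rng = np.random.default_rng(0)
trajectory = [np.array([t + rng.normal(scale=0.1)]) for t in range(20)]

score = orderability_score(trajectory, lambda a, b: a[0] < b[0])
print(f"orderability: {score:.2f}")  # close to 1.0 for this ordered toy data
```

Used as an intrinsic reward, such a score would typically be mixed with the task reward, e.g. `total_reward = extrinsic_reward + beta * orderability_score(...)` for some weighting coefficient `beta` (itself an assumed hyperparameter here); how the paper assigns the score to individual timesteps is not specified in the abstract.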