Abstract: Conventional reinforcement learning (RL) algorithms often require millions of environment interactions to learn an effective policy. In contrast, humans, guided by curiosity, can develop proficient policies with far less experience. Drawing inspiration from this observation, we introduce the Autoencoder Reconstruction Model (ARM), a curiosity-driven RL model that significantly reduces the number of interactions required while improving policy effectiveness. ARM employs an autoencoder module, built on a deep neural network, to learn feature representations of the environment, and uses its Curiosity Measurement Module to motivate RL agents toward effective exploration, particularly in environments with sparse rewards. ARM also introduces a mechanism to balance the exploration-exploitation dilemma. Theoretical analysis shows that the reward shaping introduced by ARM conforms to the potential-based reward shaping paradigm, thereby preserving the optimality of the learned policy. We will release the source code and trained models to facilitate further studies in this research direction.
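The abstract does not spell out the exact formulation, but a common way to realize an autoencoder-based curiosity signal is to treat the (scaled) reconstruction error of a visited state as an intrinsic reward added to the extrinsic one, since poorly reconstructed states are likely novel. The sketch below illustrates this idea only; the class and function names (StateAutoencoder, curiosity_bonus), the architecture, and the scaling factor beta are illustrative assumptions, not ARM's actual implementation.

```python
# Hypothetical sketch of a reconstruction-error curiosity bonus (not ARM's exact method).
import torch
import torch.nn as nn


class StateAutoencoder(nn.Module):
    """Compresses observations and reconstructs them; states that reconstruct
    poorly (i.e. novel states) yield a larger curiosity bonus."""

    def __init__(self, obs_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, obs_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(obs))


def curiosity_bonus(model: StateAutoencoder, obs: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """Per-state intrinsic reward: scaled mean-squared reconstruction error."""
    with torch.no_grad():
        recon = model(obs)
        return beta * ((recon - obs) ** 2).mean(dim=-1)


# Shaped reward = extrinsic reward + curiosity_bonus(model, obs).
# Training the autoencoder online on visited states makes the bonus decay
# as states become familiar, discouraging repeated exploration of known regions.
```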