Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning
Keywords: Reinforcement Learning, Intrinsic Motivation, Expert Demonstrations, Incomplete Data, Exploration
Abstract: Recent trends in Reinforcement Learning (RL) highlight the need for agents to learn from reward-free interactions and alternative supervision signals, such as unlabeled or incomplete demonstrations, rather than relying solely on explicit reward maximization. Developing generalist agents that can adapt efficiently in real-world environments often requires leveraging these reward-free signals to guide learning and behavior. While intrinsic motivation techniques provide a means for agents to seek out novel or uncertain states in the absence of explicit rewards, they are often challenged by dense-reward environments or by the complexity of high-dimensional state and action spaces. Furthermore, most existing approaches rely directly on the unprocessed intrinsic reward signal, which makes it difficult to shape or control the agent's exploration effectively. We propose an approach that effectively utilizes expert demonstrations even when they are incomplete and imperfect. By applying a mapping function that transforms the similarity between an agent's state and the expert data into a shaped intrinsic reward, our method allows for flexible and targeted exploration of expert-like behaviors. We employ a Mixture of Autoencoder Experts to capture a diverse range of behaviors and to accommodate missing information in the demonstrations. Experiments show that our approach enables robust exploration and strong performance in both sparse- and dense-reward environments, even when demonstrations are sparse or incomplete. This provides a practical framework for RL in realistic settings where optimal data is unavailable and precise reward control is needed.
Submission Number: 27
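To illustrate the mechanism summarized in the abstract, the sketch below shows one plausible way a mixture of autoencoder experts could turn state similarity into a shaped intrinsic reward. The network sizes, the observation mask for incomplete demonstrations, the exponential mapping function, and all names here are illustrative assumptions, not the submission's actual implementation.

```python
# Hypothetical sketch: intrinsic reward from a mixture of autoencoder experts.
# Architectures, the mask handling, and the mapping function are assumptions
# made for illustration; they are not taken from the submission.
import torch
import torch.nn as nn


class AutoencoderExpert(nn.Module):
    """A small autoencoder trained on one subset of expert demonstration states."""

    def __init__(self, state_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, state_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def intrinsic_reward(state: torch.Tensor,
                     experts: list[AutoencoderExpert],
                     mask: torch.Tensor,
                     scale: float = 5.0) -> torch.Tensor:
    """Map similarity to expert data into a shaped intrinsic reward.

    `mask` marks which state dimensions are observed, so incomplete
    demonstrations only contribute error on the dimensions they cover.
    """
    errors = []
    with torch.no_grad():
        for expert in experts:
            recon = expert(state)
            # Masked reconstruction error: ignore missing dimensions.
            err = ((recon - state) ** 2 * mask).sum(-1) / mask.sum(-1).clamp(min=1)
            errors.append(err)
    # The best-matching expert defines similarity to demonstrated behavior.
    min_err = torch.stack(errors, dim=0).min(dim=0).values
    # Example mapping function: high reward near expert-like states,
    # decaying smoothly with reconstruction error (assumed exponential form).
    return torch.exp(-scale * min_err)


if __name__ == "__main__":
    state_dim = 6
    experts = [AutoencoderExpert(state_dim) for _ in range(3)]
    state = torch.randn(4, state_dim)   # batch of agent states
    mask = torch.ones(4, state_dim)     # all dimensions observed in this toy case
    print(intrinsic_reward(state, experts, mask))
```

In this reading, the mapping function (here an exponential decay with a `scale` hyperparameter) is what gives control over how strongly the agent is pulled toward expert-like regions, independently of the raw reconstruction-error signal.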