TL;DR: We propose SENSEI to equip model-based RL agents with intrinsic motivation for semantically meaningful exploration using VLMs.
Abstract: Exploration is a cornerstone of reinforcement learning (RL). Intrinsic motivation attempts to decouple exploration from external, task-based rewards. However, established approaches to intrinsic motivation that follow general principles, such as information gain, often only uncover low-level interactions. In contrast, children’s play suggests that they engage in meaningful high-level behavior by imitating or interacting with their caregivers. Recent work has focused on using foundation models to inject these semantic biases into exploration. However, these methods often rely on unrealistic assumptions, such as language-embedded environments or access to high-level actions. We propose SEmaNtically Sensible ExploratIon (SENSEI), a framework to equip model-based RL agents with an intrinsic motivation for semantically meaningful behavior. SENSEI distills a reward signal of interestingness from Vision Language Model (VLM) annotations, enabling an agent to predict these rewards through a world model. Using model-based RL, SENSEI trains an exploration policy that jointly maximizes semantic rewards and uncertainty. We show that, in both robotic and video-game-like simulations, SENSEI discovers a variety of meaningful behaviors from image observations and low-level actions. SENSEI provides a general tool for learning from foundation model feedback, a crucial research direction as VLMs become more powerful.
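To make the exploration objective described above concrete, here is a minimal sketch of how a VLM-distilled "interestingness" reward could be combined with an epistemic-uncertainty bonus. The function names, the ensemble-disagreement uncertainty estimate, and the weighting are illustrative assumptions for this sketch, not SENSEI's actual implementation (see the linked repository for that).

```python
# Minimal sketch (assumed names and weighting, not SENSEI's actual code):
# combine a distilled semantic reward with an ensemble-disagreement
# uncertainty bonus into a single exploration reward.
import torch

def exploration_reward(
    latent: torch.Tensor,             # world-model latent state, shape (B, D)
    action: torch.Tensor,             # low-level action, shape (B, A)
    semantic_head: torch.nn.Module,   # predicts VLM-distilled interestingness
    ensemble: list,                   # ensemble of latent dynamics heads
    beta: float = 1.0,                # assumed trade-off hyperparameter
) -> torch.Tensor:
    # Semantic reward: how "interesting" the distilled reward model finds the state.
    r_sem = semantic_head(latent).squeeze(-1)                          # (B,)

    # Uncertainty bonus: disagreement among the ensemble's next-latent predictions.
    inputs = torch.cat([latent, action], dim=-1)
    preds = torch.stack([m(inputs) for m in ensemble])                 # (E, B, D)
    r_unc = preds.var(dim=0).mean(dim=-1)                              # (B,)

    # The exploration policy jointly maximizes semantic reward and uncertainty.
    return r_sem + beta * r_unc
```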
Lay Summary: AI agents often explore their environment by trial and error, but humans, especially children, learn by doing things that feel meaningful, often through observing and imitating others. We developed SENSEI, a method that helps AI explore more like humans. SENSEI uses powerful AI chatbots, trained on vast amounts of internet data, to act as an “artificial caregiver”. As the agent interacts with its environment, this caregiver provides feedback about which scenes seem interesting. The agent then seeks out these interesting situations and tries new actions to explore further. This allows the agent to discover useful behaviors in both robot simulations and video games, using only raw images (like a video game screen) and simple controls (like button presses). Our work offers a new way for AI to explore, taking into account what humans might find interesting.
Link To Code: https://github.com/martius-lab/sensei
Primary Area: Reinforcement Learning->Deep RL
Keywords: intrinsic motivation, exploration, foundation models, model-based RL
Submission Number: 11579