Keywords: Reinforcement Learning, Exploration Strategies, Zero-Shot Agents, Large Language Models, Vision-Language Models, Foundation Models, Multi-Armed Bandits, Gridworlds, Atari Games, Knowing–Doing Gap, Hybrid RL Frameworks, Semantic Priors, Sample Efficiency
TL;DR: We study how foundation models (LLMs and VLMs) can support exploration in reinforcement learning through a systematic benchmark spanning multi-armed bandits, Gridworlds, and Atari, and through hybrid RL+VLM approaches that can improve early-stage sample efficiency.
Abstract: Exploration in reinforcement learning (RL) remains challenging, particularly in sparse-reward settings. While foundation models possess strong semantic priors, their capabilities as zero-shot exploration agents in classic RL benchmarks are not well understood. We benchmark LLMs and VLMs on multi-armed bandits, Gridworlds, and sparse-reward Atari games to test zero-shot exploration. Our investigation reveals a key limitation: while VLMs can infer high-level objectives from visual input, they consistently fail at precise low-level control, the "knowing–doing gap". To analyze a potential bridge for this gap, we investigate a simple on-policy hybrid framework in a controlled, best-case scenario. Our results in this idealized setting show that VLM guidance can significantly improve early-stage sample efficiency, providing a clear analysis of the potential and constraints of using foundation models to guide exploration rather than for end-to-end control.
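One way to picture "guiding exploration rather than end-to-end control" is to let a VLM occasionally propose actions early in training while the RL agent otherwise acts on its own policy. The sketch below is a hypothetical illustration only, not the paper's framework: the function `query_vlm_action` and the parameters `guidance_prob` and `guidance_horizon` are assumed names, and the VLM call is stubbed out.

```python
# Hypothetical sketch: mixing VLM-suggested actions into an agent's exploration,
# with the guidance probability decayed to zero so late training is purely on-policy.
import random


def query_vlm_action(observation, num_actions):
    """Stand-in for a VLM call that maps a rendered observation to a suggested action.

    A real system would prompt a vision-language model with the frame and a
    description of the action space; here it is stubbed with a random choice.
    """
    return random.randrange(num_actions)


def select_action(policy_action, observation, num_actions, step,
                  guidance_prob=0.5, guidance_horizon=10_000):
    """Follow the VLM suggestion with a probability that decays over training.

    Early on (step << guidance_horizon) the VLM's semantic prior shapes
    exploration; afterwards the agent relies entirely on its learned policy.
    """
    p = guidance_prob * max(0.0, 1.0 - step / guidance_horizon)
    if random.random() < p:
        return query_vlm_action(observation, num_actions)
    return policy_action
```

In such a setup, the VLM only steers which states the agent visits during the early, sample-inefficient phase; the RL algorithm itself remains responsible for low-level control, which is consistent with the knowing–doing gap the abstract describes.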
Submission Number: 55