Foundation Models for Enhanced Exploration in Reinforcement Learning

26 Sept 2024 (modified: 13 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: reinforcement learning, exploration, large language models, vision-language models, multi-armed bandits
TL;DR: We examine the exploration ability of foundation models in traditional RL settings and, based on our findings, propose Foundation Model Exploration—a novel exploration scheme that leverages foundation models to enhance exploration efficiency in RL.
Abstract: Reinforcement learning agents often struggle with sample inefficiency, requiring extensive interactions with the environment to develop effective policies. This inefficiency is partly due to the challenge of balancing exploration and exploitation without the abstract reasoning and prior knowledge that humans use to quickly identify rewarding actions. Recent advancements in foundation models, such as large language models (LLMs) and vision-language models (VLMs), have shown human-level reasoning capabilities in some domains, but they remain underutilized for directly selecting low-level actions during exploration in reinforcement learning. In this paper, we investigate the potential of foundation models to enhance exploration in reinforcement learning tasks. We conduct an in-depth analysis of their exploration behaviour in multi-armed bandit problems and Gridworld environments, comparing their performance against traditional exploration strategies and reinforcement learning agents. Our empirical results suggest that foundation models can significantly improve exploration efficiency by leveraging their reasoning abilities to infer optimal actions. Building on these findings, we introduce Foundation Model Exploration (FME), a novel exploration scheme that integrates foundation models into the reinforcement learning framework for intelligent exploration behaviour. We use VLMs and demonstrate that they can infer environment dynamics and objectives from raw image observations. As a result, FME requires only the action space as environment-specific manual text input. We find that agents equipped with FME achieve superior performance in sparse reward Gridworld environments and scale to more complex tasks like Atari games. Moreover, the effectiveness of FME increases with the capacity of the VLM used, indicating that future advancements in foundation models will further enhance such exploration strategies.
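To make the described setup concrete, below is a minimal sketch of how a VLM could drive the exploratory branch of an otherwise standard epsilon-greedy controller, with only the action names supplied as text. The function names (`fme_action`, `select_action`), the prompt wording, and the `vlm_query` callable are illustrative assumptions, not the paper's actual implementation or API.

```python
import random
from typing import Callable, List, Sequence

def fme_action(image, action_names: Sequence[str],
               vlm_query: Callable[[object, str], str]) -> int:
    """Ask a VLM to pick an exploratory action from the raw observation.

    `vlm_query(image, prompt)` is assumed to wrap whatever VLM is available
    (the submission does not fix an API). Only the action space is supplied
    as environment-specific text, mirroring the FME setup described above.
    """
    prompt = (
        "You control an agent in this environment. "
        f"Available actions: {', '.join(action_names)}. "
        "Reply with the single action most likely to make progress."
    )
    reply = vlm_query(image, prompt).strip().lower()
    for i, name in enumerate(action_names):
        if name.lower() in reply:
            return i
    # Fall back to a uniformly random action if the reply cannot be parsed.
    return random.randrange(len(action_names))

def select_action(q_values: List[float], image, action_names: Sequence[str],
                  vlm_query: Callable[[object, str], str],
                  epsilon: float = 0.1) -> int:
    """Epsilon-greedy control where the exploratory branch defers to the VLM."""
    if random.random() < epsilon:
        return fme_action(image, action_names, vlm_query)
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In this reading, the VLM replaces undirected random exploration rather than the learned policy itself, which is one plausible way to reconcile "intelligent exploration behaviour" with a standard value-based agent; the abstract does not specify these integration details.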
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6618