Disentangling Exploration of Large Language Models by Optimal Exploitation

Published: 05 Mar 2025, Last Modified: 15 Mar 2025 · Reasoning and Planning for LLMs @ ICLR 2025 · CC BY 4.0
Keywords: Large Language Model, Exploration, Sequential Decision Making
TL;DR: Evaluating large language models against an optimal-exploitation baseline measures exploration progress accurately and can provide insights for prompt engineering.
Abstract:

Exploration is a crucial skill for self-improvement and open-ended problem-solving. However, it remains unclear whether large language models can effectively explore the state-space of an unknown environment. This work isolates exploration as the sole objective, tasking the agent with delivering information that enhances future returns. Within this framework, we argue that measuring agent returns alone is not sufficient for a fair evaluation, and we decompose missing rewards into exploration and exploitation components based on the optimal achievable return. Comprehensive experiments with various models reveal that most struggle to sufficiently explore the state-space and that weak exploration is insufficient. We find a positive correlation between exploration performance and language comprehension and reasoning capabilities. Furthermore, we show that our decomposition can provide insights into behavioral differences driven by prompt engineering, offering a valuable tool for refining performance in exploratory tasks.
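The decomposition described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all names (`v_optimal`, `v_best_given_info`, `v_agent`) are hypothetical, and the sketch assumes the three return values are already available — the optimal achievable return, the best return attainable using only the information the agent actually gathered, and the agent's realized return.

```python
def decompose_regret(v_optimal: float, v_best_given_info: float, v_agent: float):
    """Split the missing reward (v_optimal - v_agent) into two parts:

    - exploration gap: return lost because the agent's collected
      information does not support the optimal policy, and
    - exploitation gap: return lost by acting suboptimally on the
      information that was collected.
    """
    exploration_gap = v_optimal - v_best_given_info
    exploitation_gap = v_best_given_info - v_agent
    return exploration_gap, exploitation_gap
```

By construction the two gaps sum to the total missing reward, so an agent with a large exploration gap failed to gather useful information even if an optimal exploiter acted on its behalf.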

Submission Number: 51