Exploration-Exploitation Prompting: A Dual-Process Framework for Complex Mathematical Problem Solving

Published: 01 Jun 2026, Last Modified: 01 Jun 2026
Venue: IEEE ICRA 2026 Workshop Xplore (Poster)
License: CC BY 4.0
Keywords: Exploration, Evaluation, Verification, Exploitation
TL;DR: This paper introduces Exploration-Exploitation Prompting (EEP), a prompting strategy that helps Large Language Models solve complex, high-dimensional math problems more effectively than chain-of-thought and tree-of-thoughts prompting.
Abstract: Solving high-dimensional mathematical problems requires more than sequential reasoning; it requires a strategic balance between breadth of search and depth of computation. This paper introduces the Exploration-Exploitation Prompting (EEP) strategy. Inspired by the Multi-Armed Bandit problem from reinforcement learning and by dual-process theory in human cognition, EEP splits a Large Language Model's (LLM's) reasoning into two distinct phases: a “Global Exploration” phase that maps the space of potential solutions, and a “Local Exploitation” phase that refines and executes the most promising path. We observe that EEP outperforms chain-of-thought (CoT) and tree-of-thoughts (ToT) prompting on the Math-NET dataset.
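The abstract describes EEP only at the level of its two phases. Below is a minimal sketch of how such a two-phase prompting loop could be wired up, assuming a generic text-in/text-out model callable. The prompt wording, the self-evaluation scoring step, the greedy arg-max selection, and the names `explore`, `evaluate`, `exploit`, `eep_solve`, and `fake_llm` are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Candidate:
    strategy: str
    score: float


def explore(llm: LLM, problem: str, k: int) -> List[str]:
    """Global Exploration (sketch): ask for k distinct strategies, no solving yet."""
    prompt = (
        f"Problem: {problem}\n"
        f"List {k} distinct solution strategies, one per line, without solving."
    )
    lines = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
    return lines[:k]


def evaluate(llm: LLM, problem: str, strategy: str) -> float:
    """Hypothetical self-evaluation step: the model scores a strategy in [0, 1]."""
    prompt = (
        f"Problem: {problem}\nStrategy: {strategy}\n"
        "Rate how likely this strategy is to reach a correct answer (0 to 1). "
        "Reply with a number only."
    )
    try:
        return min(max(float(llm(prompt).strip()), 0.0), 1.0)
    except ValueError:
        return 0.0  # unparseable reply: treat the strategy as unpromising


def exploit(llm: LLM, problem: str, strategy: str) -> str:
    """Local Exploitation (sketch): solve in depth along one chosen strategy."""
    prompt = (
        f"Problem: {problem}\nUsing this strategy: {strategy}\n"
        "Solve step by step and state the final answer."
    )
    return llm(prompt)


def eep_solve(llm: LLM, problem: str, k: int = 3) -> str:
    strategies = explore(llm, problem, k)
    if not strategies:
        raise ValueError("exploration produced no strategies")
    scored = [Candidate(s, evaluate(llm, problem, s)) for s in strategies]
    best = max(scored, key=lambda c: c.score)  # greedy pick of the top-scored path
    return exploit(llm, problem, best.strategy)


def fake_llm(prompt: str) -> str:
    """Hypothetical deterministic stand-in for a real model, so the sketch runs offline."""
    if "Rate how likely" in prompt:
        return "0.9" if "symmetry" in prompt else "0.4"
    if "Solve step by step" in prompt:
        strategy = prompt.split("Using this strategy: ")[1].splitlines()[0]
        return f"[solved with: {strategy}]"
    return "brute force\nsymmetry argument\ngenerating functions"


print(eep_solve(fake_llm, "How many lattice paths go from (0,0) to (3,3)?"))
# prints: [solved with: symmetry argument]
```

The selection step here is a plain greedy arg-max over model-assigned scores. A bandit-style rule (for example, UCB over repeated partial rollouts) would follow the Multi-Armed Bandit framing more closely, but the abstract does not specify which selection rule EEP actually uses.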
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 23