Navigating the Labyrinth: Evaluating LLMs’ Ability to Reason About Search Problems

TMLR Paper6202 Authors

14 Oct 2025 (modified: 22 Oct 2025), Under review for TMLR, CC BY 4.0
Abstract: Large Language Models (LLMs) have recently achieved impressive performance on math and reasoning benchmarks. However, they often struggle with logic problems and puzzles that are relatively easy for humans. To investigate this further, we introduce a new benchmark, SearchBench, which contains 11 unique search problems inspired by intuitive puzzles. Each SearchBench problem type is equipped with automated pipelines that generate an arbitrary number of instances and analyze the feasibility, correctness, and optimality of LLM-generated solutions. We show that, using step-by-step, language-only reasoning, even the most advanced LLMs fail to solve SearchBench; for example, OpenAI's frontier models GPT-4 and o1-preview solve only 1.4% and 18.6% of problems, respectively. This is because SearchBench problems require considering multiple pathways and performing backtracking, which poses a significant challenge to auto-regressive models. Interestingly, performance is significantly boosted when we prompt models to generate a complete A* search algorithm, a comparatively more cognitively difficult task. This approach effectively offloads from the models the iterative search and backtracking that they struggle to carry out in text. This in-context learning baseline is further enhanced via a Multi-Stage-Multi-Try (MSMT) inference method, increasing GPT-4's rate of correct solutions to over 57%.
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=wk6KKtPrdp&noteId=wk6KKtPrdp
Changes Since Last Submission: The previous submission was desk-rejected due to the use of a modified format. In this resubmission, all special formatting packages have been removed. The TMLR .sty file has not been altered in any way and is used exactly as downloaded from the official TMLR website. All tables now follow the required format: the table number and caption appear before the table. Only a minimal set of additional standard packages for figures and tables (e.g., tabularx, makecell) was used to ensure clear and readable presentation of information. All other style files are the official TMLR LaTeX files, without modification.
Assigned Action Editor: ~Kuldeep_S._Meel2
Submission Number: 6202