Position: We Need An Algorithmic Understanding of Generative AI

Published: 01 May 2025, Last Modified: 23 Jul 2025 · ICML 2025 Position Paper Track · Spotlight Poster · CC BY 4.0
TL;DR: Our position is that the ML community should prioritize systematic research into characterizing the algorithms LLMs learn and use to solve problems; we call this agenda AlgEval.
Abstract: What algorithms do LLMs actually learn and use to solve problems? Studies addressing this question are sparse, as research priorities are focused on improving performance through scale, leaving a theoretical and empirical gap in understanding emergent algorithms. This position paper proposes AlgEval: a framework for systematic research into the algorithms that LLMs learn and use. AlgEval aims to uncover algorithmic primitives, reflected in latent representations, attention, and inference-time compute, and their algorithmic composition to solve task-specific problems. We highlight potential methodological paths and a case study toward this goal, focusing on emergent search algorithms. Our case study illustrates both the formation of top-down hypotheses about candidate algorithms, and bottom-up tests of these hypotheses via circuit-level analysis of attention patterns and hidden states. The rigorous, systematic evaluation of how LLMs actually solve tasks provides an alternative to resource-intensive scaling, reorienting the field toward a principled understanding of underlying computations. Such algorithmic explanations offer a pathway to human-understandable interpretability, enabling comprehension of the model's internal reasoning beyond surface-level performance measures. This can in turn lead to more sample-efficient methods for training and improving performance, as well as novel architectures for end-to-end and multi-agent systems.
Lay Summary: Generative AI, specifically large language models (LLMs), has demonstrated impressive performance, yet we currently don't know how these models solve problems. Existing research has focused on scaling performance and the interpretability of individual components, leaving a gap in understanding the algorithms these models implicitly learn and apply. We propose AlgEval, a framework for evaluating how LLMs solve problems, and for algorithmically quantifying how latent representations and attention transformations implement solutions layer by layer. AlgEval focuses on identifying algorithmic primitives and their composition by analyzing attention, hidden states, and inference-time compute. We demonstrate this through a case study of reasoning that requires planning and graph search, testing whether models implement classic algorithms like BFS or DFS using top-down hypotheses and bottom-up analysis. AlgEval proposes a research agenda for understanding AI systems and applying this understanding to develop systems that are interpretable, sample-efficient, and grounded in theory. By uncovering algorithmic explanations, we move beyond black-box performance toward models that not only perform well but are also human-understandable and guide the design of more robust systems.
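The lay summary contrasts breadth-first and depth-first search as candidate algorithms the case study tests for. As an illustrative sketch only (the toy graph and function names below are hypothetical, not taken from the paper), the two classic primitives can be written as:

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Breadth-first search: explores the graph level by level,
    so the first path found to the goal is a shortest path."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nbr in graph.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                queue.append(path + [nbr])
    return None  # goal unreachable

def dfs_path(graph, start, goal, visited=None):
    """Depth-first search: follows one branch as deep as possible
    before backtracking; finds a path, not necessarily the shortest."""
    if visited is None:
        visited = set()
    if start == goal:
        return [start]
    visited.add(start)
    for nbr in graph.get(start, []):
        if nbr not in visited:
            sub = dfs_path(graph, nbr, goal, visited)
            if sub:
                return [start] + sub
    return None

# Hypothetical toy navigation graph for illustration.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
print(dfs_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

In the paper's framing, the question is not how to run such algorithms but whether circuit-level traces of attention and hidden states match the visitation order one of these primitives would predict.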
Verify Author Names: My co-authors have confirmed that their names are spelled correctly both on OpenReview and in the camera-ready PDF. (If needed, please update ‘Preferred Name’ in OpenReview to match the PDF.)
No Additional Revisions: I understand that after the May 29 deadline, the camera-ready submission cannot be revised before the conference. I have verified with all authors that they approve of this version.
Pdf Appendices: My camera-ready PDF file contains both the main text (not exceeding the page limits) and all appendices that I wish to include. I understand that any other supplementary material (e.g., separate files previously uploaded to OpenReview) will not be visible in the PMLR proceedings.
Latest Style File: I have compiled the camera ready paper with the latest ICML2025 style files <https://media.icml.cc/Conferences/ICML2025/Styles/icml2025.zip> and the compiled PDF includes an unnumbered Impact Statement section.
Paper Verification Code: ZTk3Z
Permissions Form: pdf
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: generative AI, reasoning, planning, search algorithms, inference time compute, graph navigation, LLM, machine learning, explainability, reinforcement learning, Algorithms, algorithmic understanding, algorithmic primitives, algorithmic composition, navigation, evaluation, interpretability
Submission Number: 176