Position: We Need An Algorithmic Understanding of Generative AI

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 Position Paper Track · Spotlight Poster · CC BY 4.0
TL;DR: Our position is that the ML community should prioritize systematic research into characterizing the algorithms LLMs learn and use to solve problems, an agenda we call AlgEval.
Abstract: What algorithms do LLMs actually learn and use to solve problems? Studies addressing this question are sparse, as research priorities are focused on improving performance through scale, leaving a theoretical and empirical gap in understanding emergent algorithms. This position paper proposes AlgEval: a framework for systematic research into the algorithms that LLMs learn and use. AlgEval aims to uncover algorithmic primitives, reflected in latent representations, attention, and inference-time compute, and their algorithmic composition to solve task-specific problems. We highlight potential methodological paths and a case study toward this goal, focusing on emergent search algorithms. Our case study illustrates both the formation of top-down hypotheses about candidate algorithms, and bottom-up tests of these hypotheses via circuit-level analysis of attention patterns and hidden states. The rigorous, systematic evaluation of how LLMs actually solve tasks provides an alternative to resource-intensive scaling, reorienting the field toward a principled understanding of underlying computations. Such algorithmic explanations offer a pathway to human-understandable interpretability, enabling comprehension of the model's internal reasoning rather than only its performance measures. This can in turn lead to more sample-efficient methods for training and improving performance, as well as novel architectures for end-to-end and multi-agent systems.
Lay Summary: Generative AI, specifically large language models (LLMs), has demonstrated impressive performance, yet we currently don't know how these models solve problems. Existing research has focused on scaling performance and on the interpretability of individual components, leaving a gap in understanding the algorithms these models implicitly learn and apply. We propose AlgEval, a framework for evaluating how LLMs solve problems, and for algorithmically quantifying how latent representations and attention transformations implement solutions layer by layer. AlgEval focuses on identifying algorithmic primitives and their composition by analyzing attention, hidden states, and inference-time compute. We demonstrate this through a case study of reasoning that requires planning and graph search, testing whether models implement classic algorithms like BFS or DFS using top-down hypotheses and bottom-up analysis. AlgEval proposes a research agenda for understanding AI systems and applying this understanding to develop systems that are interpretable, sample-efficient, and grounded in theory. By uncovering algorithmic explanations, we move beyond black-box performance toward models that not only perform well but are also human-understandable and guide the design of more robust systems.
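As a concrete illustration of the hypothesis-testing step described in the case study, the minimal Python sketch below compares a node-expansion order recovered from a model's internals against the orders predicted by classic BFS and DFS on the same graph. All names, the toy graph, and the "observed" order are hypothetical placeholders for illustration only; they are not from the paper, where the order would instead be decoded from attention patterns or hidden states.

```python
# Hypothetical sketch: score an observed node-expansion order against the
# orders predicted by BFS and DFS hypotheses. Placeholder example only.
from collections import deque

def bfs_order(graph, start):
    """Node-expansion order of breadth-first search."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order

def dfs_order(graph, start):
    """Node-expansion order of depth-first search."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(graph[node]))  # keep left-to-right child order
    return order

def match_score(observed, predicted):
    """Fraction of positions where the observed order agrees with a hypothesis."""
    return sum(a == b for a, b in zip(observed, predicted)) / max(len(predicted), 1)

if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
    # In AlgEval this order would be decoded from the model's attention or
    # hidden states; here it is a hard-coded stand-in.
    observed = ["A", "B", "C", "D", "E"]
    for name, hypothesis in [("BFS", bfs_order), ("DFS", dfs_order)]:
        print(name, match_score(observed, hypothesis(graph, "A")))
```

Running the sketch prints a higher score for the BFS hypothesis on this toy input, showing how competing algorithmic hypotheses can be ranked against the same observed trace.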
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: generative AI, reasoning, planning, search algorithms, inference-time compute, graph navigation, LLM, machine learning, explainability, reinforcement learning, algorithms, algorithmic understanding, algorithmic primitives, algorithmic composition, navigation, evaluation, interpretability
Submission Number: 176