Keywords: Superoptimization, Benchmarking Agents, Evolutionary Algorithms, Repository-level code synthesis
TL;DR: FormulaCode is a continuously updating benchmark that complements SWE-Bench for evaluating optimization agents (like AlphaEvolve)
Track: Long Paper (up to 9 pages)
Abstract: Rapid advances in LLM agents have demonstrated the ability to optimize code against continuous objective functions, a significant leap beyond traditional code generation techniques. However, there is an urgent need for novel benchmarks that can effectively measure this capability and translate it into real-world impact. Current code benchmarks, which often rely on binary pass/fail outcomes, offer a limited evaluation framework that falls short of capturing the full potential of these emerging capabilities. To bridge this gap, we introduce FormulaCode, a novel benchmark designed for evaluating agentic superoptimization on large codebases, with a focus on real-world performance optimization. Constructed from a dataset of 451 real-world performance bottlenecks automatically mined from GitHub, FormulaCode enables comprehensive testing of an agent's ability to triage, diagnose, and resolve inefficiencies in realistic software environments. FormulaCode proves to be a challenging benchmark for frontier LLMs and agentic frameworks, with unrestricted repository exploration emerging as a principal factor in finding performance inefficiencies. By introducing FormulaCode, our goal is to drive the development of next-generation optimization algorithms that meet the rigorous demands of real-world software projects.
Format: We have read the camera-ready instructions, and our paper is formatted with the provided template.
Supplementary Material: pdf
De-Anonymization: This submission has been de-anonymized.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 30