Keywords: Alpha Mining, LLM Benchmark, LLM Agent, Data Science and Engineering
Abstract: Formulaic alpha factor mining (FAFM) is a central problem in quantitative investment, where interpretable formulas are designed to extract predictive signals from historical financial series. With the emergence of large language models (LLMs), recent studies have begun to explore their roles in FAFM, yet their capabilities across different tasks and configurations remain unclear. In this work, we introduce AlphaBench, the first systematic benchmark for evaluating LLMs in FAFM. AlphaBench covers three core tasks: factor generation, factor evaluation, and factor searching, all of which are integral to the workflow of quantitative researchers. Beyond task-level evaluation, we further analyze how different LLM settings, including model type, prompting paradigm, and reasoning strategy, influence performance. Our experiments across a range of open-source and closed-source models show that LLMs hold strong potential for automating factor mining, yet still face persistent challenges in robustness, search efficiency, and practical usability.
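As a point of reference for readers unfamiliar with formulaic alpha factors, the sketch below shows what such an interpretable formula typically looks like in practice: a short expression over historical price series that yields a cross-sectional score per asset per day. This is a generic, hypothetical illustration (the function name, window length, and data layout are assumptions, not taken from AlphaBench itself).

```python
import numpy as np
import pandas as pd

def momentum_alpha(close: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Illustrative formulaic alpha: cross-sectional rank of trailing returns.

    `close` has dates as rows and tickers as columns. The output is a score
    in [0, 1] per (date, ticker); higher means stronger recent momentum.
    """
    # Trailing `window`-day return for each ticker.
    trailing_ret = close.pct_change(window)
    # Percentile rank across tickers on each date (cross-sectional rank).
    return trailing_ret.rank(axis=1, pct=True)

if __name__ == "__main__":
    # Synthetic prices purely for demonstration.
    rng = np.random.default_rng(0)
    dates = pd.bdate_range("2024-01-01", periods=60)
    tickers = ["AAA", "BBB", "CCC", "DDD"]
    prices = pd.DataFrame(
        100 * np.exp(np.cumsum(rng.normal(0, 0.01, (60, 4)), axis=0)),
        index=dates, columns=tickers,
    )
    print(momentum_alpha(prices).tail())
```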
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 268