Keywords: Alpha Mining, LLM Benchmark, LLM Agent, Data Science and Engineering
Abstract: Formulaic alpha factor mining (FAFM) is a central problem in quantitative investment, where interpretable formulas are designed to extract predictive signals from historical financial series. With the emergence of large language models (LLMs), recent studies have begun to explore their roles in FAFM, yet their capabilities across different tasks and configurations remain unclear. In this work, we introduce AlphaBench, the first systematic benchmark for evaluating LLMs in FAFM. AlphaBench covers three core tasks: factor generation, factor evaluation, and factor searching, all of which are integral to the workflow of quantitative researchers. Beyond task-level evaluation, we further analyze how different LLM settings, including model type, prompting paradigm, and reasoning strategy, influence performance. Our experiments across a range of open-source and closed-source models show that LLMs hold strong potential for automating factor mining, yet still face persistent challenges in robustness, search efficiency, and practical usability.
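As a point of reference for readers unfamiliar with formulaic alpha factors, the sketch below shows what such an interpretable formula typically looks like in practice: a short expression over historical price series that yields a cross-sectional score per asset per day. This is a generic, hypothetical illustration (the function name, window length, and data layout are assumptions, not taken from AlphaBench itself).

```python
import numpy as np
import pandas as pd

def momentum_alpha(close: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Illustrative formulaic alpha: cross-sectional rank of trailing returns.

    `close` has dates as rows and tickers as columns. The output is a score
    in [0, 1] per (date, ticker); higher means stronger recent momentum.
    """
    # Trailing `window`-day return for each ticker.
    trailing_ret = close.pct_change(window)
    # Percentile rank across tickers on each date (cross-sectional rank).
    return trailing_ret.rank(axis=1, pct=True)

if __name__ == "__main__":
    # Synthetic prices purely for demonstration.
    rng = np.random.default_rng(0)
    dates = pd.bdate_range("2024-01-01", periods=60)
    tickers = ["AAA", "BBB", "CCC", "DDD"]
    prices = pd.DataFrame(
        100 * np.exp(np.cumsum(rng.normal(0, 0.01, (60, 4)), axis=0)),
        index=dates, columns=tickers,
    )
    print(momentum_alpha(prices).tail())
```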
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 268