Abstract: In the domain of quantitative trading, the imperative is to translate historical financial data into predictive signals, commonly referred to as alpha factors, which serve to anticipate future market trends. Notably, formulaic alphas, which are expressible via explicit mathematical formulas, are highly sought after by certain investors for their better interpretability. The evolving landscape of technology has witnessed the increasing deployment of large language models (LLMs) across various domains, which raises the question of whether LLMs can be effective in formulaic alpha-mining tasks. This paper presents several paradigms for integrating LLMs into the optimization loop of alpha mining, including scenarios where an LLM serves as the sole alpha generator, as well as instances where LLMs enhance existing frameworks. Empirical evaluations on real-world stock data demonstrate significant performance improvements: our hybrid method achieves an average information coefficient (IC) of 0.0515, a 75% improvement over the baseline, a state-of-the-art reinforcement learning-based framework; backtesting further reveals a cumulative excess return more than double that of the baseline. These results underscore the potential of LLM-enhanced approaches in advancing formulaic alpha discovery and driving innovation in quantitative trading.
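The information coefficient reported above is conventionally the cross-sectional correlation between an alpha's values and next-period stock returns, averaged over trading days. The following minimal sketch illustrates that metric; it is not the paper's implementation, and all data values are invented for illustration.

```python
# Sketch of the average information coefficient (IC): the per-day
# cross-sectional Pearson correlation between alpha values and
# next-period returns, averaged across days. Toy data only.

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def mean_ic(alpha_by_day, returns_by_day):
    """Average cross-sectional IC over all trading days."""
    ics = [pearson(a, r) for a, r in zip(alpha_by_day, returns_by_day)]
    return sum(ics) / len(ics)

# Two hypothetical days, three stocks each:
# alpha values vs. the stocks' next-day returns.
alphas = [[0.2, -0.1, 0.4], [0.1, 0.3, -0.2]]
rets   = [[0.01, -0.02, 0.03], [0.00, 0.02, -0.01]]
print(round(mean_ic(alphas, rets), 4))  # → 0.9769
```

A mean IC near 1 here only reflects the contrived toy data; in practice, values such as the paper's 0.0515 are typical, since real return prediction is extremely noisy.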
External IDs: dblp:journals/fcsc/YuXAH26