Reasoning-Guided Evolutionary Prompt Optimization for Improved Financial Problem Solving

Published: 21 Nov 2025, Last Modified: 14 Jan 2026
Venue: GenAI in Finance Poster
License: CC BY 4.0
Keywords: Automated Prompt Engineering, LLM-based Genetic Algorithms, Thinking Large Language Models, Financial Math Reasoning Benchmarking
TL;DR: The paper presents a reasoning-guided genetic algorithm for prompt optimization that uses thinking LLMs to generate and evaluate prompts on a Financial Math Reasoning benchmark.
Abstract: Prompt quality remains a primary bottleneck for deploying Large Language Models (LLMs) in high-stakes domains such as finance. Prior automated prompt optimization work has relied on ad-hoc heuristics or on LLM evaluators that lack explicit, stepwise reasoning, limiting the quality of discovered prompts. We introduce a Genetic Algorithm (GA) framework that uses thinking models (OpenAI's GPT-omni and GPT-5 variants) both to generate candidate prompts (initialization, crossover, mutation) and to evaluate their outputs, so that evolution is guided by models performing structured, multi-step inference. We evaluate this approach on a challenging Financial Math Reasoning benchmark, comparing GPT-5, GPT-5-mini, GPT-5-nano, and GPT-o4-mini against non-thinking baselines GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, and GPT-4o. Fitness computation was standardized using GPT-5-nano as the output evaluator, creating a consistent test bed for comparisons. Our results show that reasoning-enabled optimization consistently produces stronger prompts than both non-thinking optimization and manually engineered prompts. Specifically, GA-evolved prompts exceeded manual prompts in 7 of 8 model versions and yielded, on average, around 11% higher fitness than the non-thinking baselines. These findings demonstrate that combining evolutionary search with reasoning-capable LLMs substantially improves automated prompt engineering for financial reasoning tasks.
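The GA loop the abstract describes (LLM-driven initialization, crossover, mutation, and fitness evaluation) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the `llm_mutate`, `llm_crossover`, and `llm_fitness` functions are hypothetical stubs standing in for calls to a thinking model (e.g. GPT-5 for the variation operators and GPT-5-nano for scoring, per the paper's setup), and the toy fitness here simply rewards reasoning cues so the example runs offline.

```python
import random

# Hypothetical stand-ins for the paper's thinking-LLM calls. A real system
# would replace each stub body with an API call to a reasoning model.

def llm_mutate(prompt: str, rng: random.Random) -> str:
    """Stub mutation: append a reasoning cue a thinking model might propose."""
    cues = ["Think step-by-step.", "Verify each calculation.", "Show your work."]
    return prompt + " " + rng.choice(cues)

def llm_crossover(a: str, b: str) -> str:
    """Stub crossover: splice the first half of one prompt with the second half of the other."""
    return a[: len(a) // 2] + b[len(b) // 2 :]

def llm_fitness(prompt: str) -> float:
    """Stub evaluator (the paper uses GPT-5-nano): here, count reasoning cues."""
    cues = ["step-by-step", "Verify", "Show your work"]
    return float(sum(cue in prompt for cue in cues))

def evolve(seed_prompts, generations=5, pop_size=6, seed=0):
    """Elitist GA over prompt strings guided by the (stubbed) LLM operators."""
    rng = random.Random(seed)
    population = list(seed_prompts)
    for _ in range(generations):
        population.sort(key=llm_fitness, reverse=True)
        parents = population[:2]  # keep the two fittest prompts (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(llm_mutate(llm_crossover(p1, p2), rng))
        population = parents + children
    return max(population, key=llm_fitness)

best = evolve(["Solve the problem.", "Answer the finance question."])
```

With real LLM-backed operators, the same loop applies: the evaluator model scores each candidate prompt's outputs on benchmark questions, and the top scorers seed the next generation.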
Submission Number: 120