[Short] Evaluating Frontier Agents on End-to-End Investment Banking Workflows

Published: 02 Mar 2026 | Last Modified: 02 Mar 2026 | ICLR 2026 Workshop DATA-FM | CC BY 4.0
Keywords: AI agents; benchmarking; tool use; investment banking; financial analysis; evaluation
TL;DR: We introduce a realistic benchmark of end-to-end investment banking workflows and show that today’s frontier AI agents, even with access to industry tools and data rooms, still fail to reliably complete these high-stakes tasks.
Abstract: AI agents are expected to automate professional work, yet a key question arises: how well do today's frontier models actually handle $\textit{end-to-end analytical workflows}$ in economically high-value settings? We examine this question through the lens of investment banking by evaluating the performance of AI agents on tasks routinely performed by junior bankers. To ensure ecological validity, we collaborated with 175 investment bankers to develop an evaluation suite that replicates core features of their professional environment. Agents are assigned VP (Vice President)- and MD (Managing Director)-level requests; granted access to realistic \emph{data rooms} and industry-standard tools (e.g., FactSet and SEC EDGAR); and required to produce multi-file deliverables, including financial models, slide decks, reports, and email summaries. Completing individual tasks required as much as 8 hours of banker time, highlighting the nontrivial labor investment and economic stakes for agents seeking to perform them autonomously. Across eight frontier models, we find that current AI systems struggle to reliably complete these workflows: even the best-performing model (Claude Opus 4.5) achieves only 33.8\% success. Our error analysis identifies key obstacles, such as maintaining internal consistency across deliverables and achieving client readiness, and points to routes to economic value when deploying agentic AI in high-stakes professional domains.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 144