Red-Teaming Financial AI Agents: Stress-Testing Governance Protections in LLMs Against Market Manipulation and Regulatory Evasion

AAAI 2026 Workshop AIGOV Submission 21 Authors

Published: 19 Oct 2025 (modified: 26 Nov 2025) · License: CC BY 4.0
Keywords: AI Governance, Financial AI, Jailbreaking, AI Safety, Red-Teaming, Large Language Models, Algorithmic Auditing
TL;DR: This paper exposes critical vulnerabilities of LLMs to financial jailbreaks and proposes a novel fine-tuning method, FCFT, that significantly enhances their governance robustness for safe deployment in markets.
Abstract: The integration of Large Language Models (LLMs) into finance as autonomous agents introduces significant governance risks, including market manipulation and regulatory evasion. Current safety fine-tuning, designed for general harmlessness, proves inadequate against domain-specific adversarial attacks. This paper introduces a comprehensive framework for auditing and enhancing financial governance robustness in LLMs. We develop the FinJailbreak benchmark to systematically probe vulnerabilities, revealing critical failures in state-of-the-art models. In response, we propose Financial Constitutional Fine-Tuning (FCFT), a novel defense mechanism that embeds financial principles directly into the model. Our results demonstrate that FCFT significantly outperforms existing alignment techniques, reducing vulnerabilities by over 55% and providing a concrete path toward "Governance by Design" for high-stakes financial AI.
Submission Number: 21