Red-Teaming Financial AI Agents: Stress-Testing Governance Protections in LLMs Against Market Manipulation and Regulatory Evasion
Keywords: AI Governance, Financial AI, Jailbreaking, AI Safety, Red-Teaming, Large Language Models, Algorithmic Auditing
TL;DR: This paper exposes critical vulnerabilities of LLMs to financial jailbreaks and proposes a novel fine-tuning method, FCFT, to significantly enhance their governance robustness for safe deployment in markets.
Abstract: The integration of Large Language Models (LLMs) into finance as autonomous agents introduces significant governance risks, including market manipulation and regulatory evasion. Current safety fine-tuning, designed for general harmlessness, proves inadequate against domain-specific adversarial attacks. This paper introduces a comprehensive framework for auditing and enhancing financial governance robustness in LLMs. We develop the FinJailbreak benchmark to systematically probe vulnerabilities, revealing critical failures in state-of-the-art models. In response, we propose Financial Constitutional Fine-Tuning (FCFT), a novel defense mechanism that embeds financial principles directly into the model. Our results demonstrate that FCFT significantly outperforms existing alignment techniques, reducing vulnerabilities by over 55\% and providing a concrete path toward "Governance by Design'' for high-stakes financial AI.
Submission Number: 21
Loading