CNFINBENCH: A BENCHMARK FOR SAFETY AND COMPLIANCE OF LARGE LANGUAGE MODELS IN FINANCE

18 Sept 2025 (modified: 24 Feb 2026) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0
Keywords: Safety and Compliance, Finance, LLM-judge ensemble, trustworthy AI
TL;DR: CNFinBench is a Chinese benchmark with a scalable judge ensemble that defines and evaluates financial LLM safety and compliance, revealing a gap between capability and compliance and enabling auditable alignment.
Abstract: Large language models are increasingly deployed across finance (for research, compliance support, risk analysis, and customer service), making rigorous safety evaluation essential. However, prior financial benchmarks largely emphasize textbook-style QA and numeric problem solving while under-testing real-world safety: they weakly assess regulatory compliance and investor-protection norms, seldom probe multi-turn adversarial tactics (e.g., jailbreaks, prompt injection, obfuscation), inconsistently bind answers to long filings, overlook tool/RAG risks, and rely on brittle or non-auditable judging. We introduce CNFinBench to close these gaps. CNFinBench organizes tasks under a Capability–Compliance–Safety triad, spanning evidence-grounded analysis of long financial reports, rule/tax reasoning, and finance-tailored red-team dialogues that conceal violations in realistic contexts. It enforces auditability via strict output formats for objective items (with dynamic option perturbation) and a scalable judge design (LLM-ensemble with human calibration) for free-form responses, and it evaluates tool-augmented workflows to surface RAG/agent injection and over-reach risks. Experiments on diverse models reveal a persistent capability–compliance gap: systems strong on structured tasks often falter on compliance auditing, risk disclosure, and evidence consistency; refusal alone is not a reliable proxy for safety without cited, verifiable reasoning. CNFinBench delivers reproducible metrics, attack templates, and scoring scripts to support admission control, regression testing, and alignment in high-stakes financial settings.
Primary Area: datasets and benchmarks
Submission Number: 11074