FinRAG-12B: A Production-Validated Recipe for Grounded Generation in Banking

Published: 18 Apr 2026, Last Modified: 18 Apr 2026 · ACL 2026 Industry Track Poster · CC BY 4.0
Keywords: Banking LLM, RAG-LLM, Finance
TL;DR: Lessons from Deploying RAG Systems in Production
Abstract: Large language models are increasingly used across many fields. However, their adoption in regulated, high-stakes domains such as banking faces resistance due to demands for high accuracy, regulatory compliance, and traceable, grounded responses. In this work, we present a complete recipe for training and deploying grounded banking LLMs. First, we describe a high-quality data generation pipeline combining LLM-as-a-Judge filtering, citation annotation, and two-stage curriculum learning, which achieves 73% citation quality with only 143M training tokens. Second, we train the model to respond with "I don't know" when the retrieved information is incomplete, and we find that a ratio of 22% unanswerable examples in the training data yields calibrated refusal behavior. Third, we provide an end-to-end methodology, from data curation through quantization to serving, validated at 40+ financial institutions with real-world business outcomes. Our 12B model outperforms GPT-4.1 on answer quality and citation grounding while responding 3--5x faster.
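The 22% unanswerable-example ratio described in the abstract can be illustrated with a minimal data-mixing sketch. The function name, data layout (question/answer pairs), and refusal string below are illustrative assumptions, not the paper's actual pipeline:

```python
import random

def mix_training_data(answerable, unanswerable, refusal_ratio=0.22, seed=0):
    """Compose a training set in which `refusal_ratio` of the examples are
    unanswerable, i.e. their target response is a refusal ("I don't know").

    `answerable` and `unanswerable` are lists of (question, answer) pairs;
    the unanswerable pool is assumed to already carry refusal targets.
    """
    rng = random.Random(seed)
    # Number of unanswerable examples needed so they make up
    # refusal_ratio of the final mixed dataset.
    n_refusals = round(len(answerable) * refusal_ratio / (1 - refusal_ratio))
    sampled = rng.sample(unanswerable, min(n_refusals, len(unanswerable)))
    mixed = answerable + sampled
    rng.shuffle(mixed)
    return mixed
```

With 78 answerable examples, the function samples 22 refusal examples, giving a 100-example set at exactly the 22% ratio the paper reports as yielding calibrated refusal behavior.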
Submission Type: Deployed
Submission Number: 306