Keywords: Banking LLM, RAG-LLM, Finance
TL;DR: Lessons from Deploying RAG Systems in Production
Abstract: Large language models are increasingly used across many fields.
However, their adoption in regulated, high-stakes
domains such as banking remains limited by demands for high accuracy,
regulatory compliance, and traceable, grounded responses.
In this work, we present a complete recipe for training and deploying
grounded banking LLMs. First, we describe a high-quality data generation
pipeline combining LLM-as-a-Judge filtering, citation annotation, and
two-stage curriculum learning that achieves 73% citation quality with only
143M tokens. Second, we train a model to respond with "I don't know" when information
is incomplete. We determine that a ratio of 22% unanswerable examples in training data yields calibrated refusal behavior.
Third, we provide an end-to-end methodology, from data curation to quantization to serving,
validated at 40+ financial institutions with real-world business outcomes. Our 12B model
outperforms GPT-4.1 on answer quality and citation grounding while responding 3--5x faster.
Submission Type: Deployed
Submission Number: 306