Beyond Imitation: A Resource-Adaptive Embedder that Outperforms its 14× Larger Teacher on Financial Retrieval

Published: 01 Jun 2026, Last Modified: 05 Jun 2026, AdaptFM Poster, CC BY 4.0
Keywords: efficient inference, knowledge distillation, embedding distillation, dense retrieval, financial retrieval, model compression, resource-adaptive foundation models, parameter-efficient adaptation, retrieval embeddings
Abstract: Resource-adaptive deployment of foundation models often relies on knowledge distillation to compress a large teacher into a smaller student that imitates its outputs. We argue that in domains where the teacher itself is unreliable, pure imitation is the wrong objective. We study financial retrieval, where embedders must distinguish texts that differ only in numeric content (“revenue grew 12.4%” vs. “1.24%”). On a constructed numeric gap test (**NumGap**), an 8B Qwen3 embedder ranks numeric perturbations as more similar to an anchor than topical distractors roughly 95% of the time. Standard alignment-based distillation inherits this weakness. We present **Caliber**, a distillation recipe that combines pure $\ell_2$ alignment with a margin-based hinge asking the student to discriminate numeric perturbations *more strongly* than the teacher does. After only one training epoch on 606K passages, Caliber (0.6B parameters) exceeds the zero-shot 8B teacher on FinanceBenchRetrieval by 14.3% relative nDCG@10 and improves NumGap-D by 20.6% relative over the LEAF-style alignment-only baseline. The recipe needs no relevance judgments and no hard negatives, and it produces a 14× smaller model that is also more numerically faithful, advancing both the compression and quality dimensions of resource-adaptive inference.
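To make the shape of such an objective concrete, the following is a minimal sketch of one way an $\ell_2$ alignment term could be combined with a margin-based hinge on numeric perturbations. It is not the authors' implementation: the abstract does not give the loss formula, so the function names (`l2_alignment`, `numeric_hinge`, `caliber_style_loss`), the use of cosine similarity, the `margin` and `weight` parameters, and the assumption that student and teacher embeddings share a dimension are all illustrative choices. Plain Python lists stand in for embedding vectors to keep the example dependency-free.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def l2_alignment(student_vec, teacher_vec):
    """Squared l2 distance between student and teacher embeddings."""
    return sum((s - t) ** 2 for s, t in zip(student_vec, teacher_vec))

def numeric_hinge(s_anchor, s_pert, t_anchor, t_pert, margin=0.1):
    """Hypothetical hinge: zero only when the student rates the numeric
    perturbation at least `margin` less similar to the anchor than the
    teacher does."""
    student_sim = cosine(s_anchor, s_pert)
    teacher_sim = cosine(t_anchor, t_pert)
    return max(0.0, student_sim - teacher_sim + margin)

def caliber_style_loss(s_anchor, s_pert, t_anchor, t_pert,
                       margin=0.1, weight=1.0):
    """Assumed combination: align the anchor embedding to the teacher,
    plus a weighted hinge on the numeric perturbation."""
    align = l2_alignment(s_anchor, t_anchor)
    hinge = numeric_hinge(s_anchor, s_pert, t_anchor, t_pert, margin)
    return align + weight * hinge
```

Under these assumptions, a student that copies the teacher exactly still pays a penalty equal to `weight * margin`, which is the sense in which the objective goes beyond imitation; the penalty vanishes once the student separates the perturbed text from the anchor by more than the teacher does.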
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 51