ConFIT: A Robust Knowledge-Guided Contrastive Framework for Financial Extraction

12 Sept 2025 (modified: 06 Dec 2025). Submitted to Agents4Science. License: CC BY 4.0

Keywords: ConFIT, Financial Extraction, Contrastive Learning, Knowledge-Guided, Hard Negative Generation, SPP, NLI, Perplexity Filtering, FinBERT, Llama-3, FiQA, SENTiVENT

Abstract: Financial text extraction struggles with multi-entity sentiment attribution and numerical sensitivity, and these weaknesses often cause problems in real-world deployment. In this work, we propose ConFIT (Contrastive Financial Information Tuning), a knowledge-guided contrastive learning framework that employs a Semantic-Preserving Perturbation (SPP) engine to generate high-quality, programmatically synthesized hard negatives. By integrating domain knowledge sources such as the Loughran-McDonald lexicon and Wikidata, and by filtering candidates with perplexity and Natural Language Inference (NLI) checks, ConFIT trains language models to differentiate subtle perturbations in financial statements. Evaluations on the FiQA and SENTiVENT datasets using FinBERT and Llama-3 8B show both promising improvements and unexpected pitfalls, highlighting challenges that warrant further research.
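
The abstract does not specify the perturbation rules or the training objective, so the following is only an illustrative sketch of the general idea: build a hard negative by changing a number in a sentence, then score an anchor against a positive and that negative with a contrastive loss. The helper names `perturb_number` and `info_nce`, the scaling rule, the InfoNCE objective, and the temperature value are all assumptions made for illustration, not details taken from the ConFIT paper. The lexicon, Wikidata, perplexity, and NLI components are not modeled here.

```python
import math
import re


def perturb_number(sentence: str, factor: float = 0.5) -> str:
    """Scale the first number in a sentence (hypothetical perturbation rule).

    The result keeps the wording but changes the numerical fact, which is
    one plausible way to build a numerically sensitive hard negative.
    """
    match = re.search(r"\d+(?:\.\d+)?", sentence)
    if match is None:
        return sentence  # nothing to perturb
    new_value = float(match.group()) * factor
    # ":g" drops a trailing ".0", so 6.0 is written as "6".
    return sentence[: match.start()] + f"{new_value:g}" + sentence[match.end():]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive, and a list of negatives.

    Assumed objective: negative log-softmax of the positive similarity
    among all candidate similarities, computed in a numerically stable way.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Toy usage with hand-written 2-d "embeddings" (a real system would take
# these from an encoder such as FinBERT).
original = "Revenue rose 12% in Q3"
hard_negative = perturb_number(original)
easy_loss = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
hard_loss = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setup the loss is near zero when the anchor matches the positive and large when it matches the negative instead, which is the pressure a contrastive objective puts on an encoder to separate an original statement from its perturbed counterpart.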

Submission Number: 116
