Keywords: ConFIT, Financial Extraction, Contrastive Learning, Knowledge-Guided, Hard Negative Generation, SPP, NLI, Perplexity Filtering, FinBERT, Llama-3, FiQA, SENTiVENT
Abstract: Financial text extraction is challenged by multi-entity sentiment attribution and sensitivity to numerical detail, which often cause problems in real-world deployment. We propose ConFIT (Contrastive Financial Information Tuning), a knowledge-guided contrastive learning framework whose Semantic-Preserving Perturbation (SPP) engine programmatically synthesizes high-quality hard negatives. ConFIT integrates domain knowledge sources such as the Loughran-McDonald lexicon and Wikidata, applies perplexity and Natural Language Inference (NLI) filtering, and trains language models to distinguish subtle perturbations in financial statements. Evaluations on the FiQA and SENTiVENT datasets with FinBERT and Llama-3 8B show both promising improvements and unexpected pitfalls, highlighting challenges that warrant further research.
Submission Number: 116