Bridging the Data Gap in Financial Sentiment: LLM-Driven Augmentation

Rohit Kumar; Chandan Nolbaria

Published: 22 Jun 2025, Last Modified: 17 Jul 2025

Venue: ACL-SRW 2025 Poster

License: CC BY 4.0

Keywords: Data Augmentation, Financial Sentiment Analysis, RAG, BERT, LLMs

TL;DR: We develop a framework that uses RAG to mitigate the temporal gap between older human-annotated financial sentiment datasets and the modern financial context, together with a fine-tuned classifier that checks the robustness of the augmented dataset.

Abstract: Static and outdated datasets hinder the accuracy of Financial Sentiment Analysis (FSA) in capturing rapidly evolving market sentiment. We tackle this by proposing a novel data augmentation technique using Retrieval Augmented Generation (RAG). Our method leverages a generative LLM to infuse established benchmarks with up-to-date contextual information from contemporary financial news. This RAG-based augmentation significantly modernizes the data’s alignment with current financial language. Furthermore, a robust BERT-BiGRU judge model verifies that the sentiment of the original annotations is faithfully preserved, ensuring the generation of high-quality, temporally relevant, and sentiment-consistent data suitable for advancing FSA model development.
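
The pipeline outlined in the abstract can be read as three steps: retrieve recent news, generate a rewritten sentence, then keep it only if a judge assigns the original label. The Python sketch below is illustrative only and is not the authors' implementation. The word-overlap retriever, the prompt wording, and the names `retrieve`, `build_prompt`, `augment`, `generate`, and `judge` are all assumptions; in the paper's setting `generate` would presumably wrap an LLM and `judge` the BERT-BiGRU classifier.

```python
from typing import Callable, List, Optional


def retrieve(query: str, news: List[str], k: int = 2) -> List[str]:
    """Return the k news snippets sharing the most words with the query.

    Hypothetical stand-in for a real retriever (e.g. dense embeddings).
    """
    q = set(query.lower().split())
    ranked = sorted(news, key=lambda doc: len(q & set(doc.lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(sentence: str, label: str, context: List[str]) -> str:
    """Assemble an illustrative rewriting prompt; the wording is an assumption."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        f"Recent financial news:\n{joined}\n\n"
        f"Rewrite the sentence below using current financial language, "
        f"keeping its sentiment ({label}) unchanged.\n"
        f"Sentence: {sentence}"
    )


def augment(
    sentence: str,
    label: str,
    news: List[str],
    generate: Callable[[str], str],  # hypothetical LLM wrapper: prompt -> text
    judge: Callable[[str], str],     # hypothetical classifier: text -> label
) -> Optional[str]:
    """Return the rewritten sentence, or None if the judge disagrees with the label."""
    context = retrieve(sentence, news)
    candidate = generate(build_prompt(sentence, label, context))
    return candidate if judge(candidate) == label else None
```

Passing `generate` and `judge` as plain callables keeps the filtering logic independent of any particular model, so the same loop can be exercised with simple stubs before real models are plugged in.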

Archival Status: Archival

ACL Copyright Transfer: pdf

Paper Length: Long Paper (up to 8 pages of content)

Submission Number: 353
