SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation

Published: 29 Apr 2025, Last Modified: 04 Mar 2025 | NAACL 2025 Findings | Everyone | CC BY 4.0
Abstract: The rapid growth of the financial sector and the increasing focus on Environmental, Social, and Governance (ESG) considerations have created a pressing need for advanced natural language processing (NLP) tools. Despite recent advancements, there is still a notable absence of open-source Large Language Models (LLMs) that are proficient across both the general finance and ESG domains, including tasks such as ESG report generation. To address this gap, we introduce SusGen-30k, a high-quality, category-balanced dataset comprising seven financial NLP tasks. In addition, we propose TCFD-Bench, a benchmark designed to improve the evaluation of sustainability report generation. Our data-centric approach led to the development of a suite of models, SusGen-GPT, trained on the curated dataset. These models were evaluated on six adapted tasks and two off-the-shelf tasks, where they achieved state-of-the-art performance, surpassing all other models except GPT-4. Remarkably, SusGen-GPT achieved an average score only 0.02 below that of GPT-4, despite using models with only 7-8B parameters, far fewer than the much larger GPT-4. This demonstrates the efficiency of our approach in delivering high performance with significantly fewer resources, addressing existing challenges and fostering further advancements in the financial and ESG research community.