$SusGen-GPT$: A Data-Centric LLM for Financial NLP and Sustainability Report Generation

ACL ARR 2024 June Submission5549 Authors

16 Jun 2024 (modified: 02 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: The rapid growth of the financial sector and the increasing emphasis on Environmental, Social, and Governance (ESG) considerations have highlighted the need for advanced natural language processing (NLP) tools. Despite significant advances, open-source Large Language Models (LLMs) that are proficient in both general finance and ESG tasks, such as ESG report generation, are still lacking. To address this gap, we propose $SusGen-30k$, a high-quality, category-balanced dataset that covers seven financial NLP tasks and ESG report generation. We also propose $TCFD-Bench$, a benchmark designed to improve the evaluation of sustainability report generation. Following a data-centric methodology, we develop a suite of models, referred to as $SusGen-GPT$. Trained on our curated dataset, these models achieve state-of-the-art performance, surpassing the results of significantly larger models. Our data-centric approach thus addresses the challenges above and aims to foster continued development in the financial and ESG research community.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: financial/business NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 5549