Using Pre-trained Language Model for Accurate ESG Prediction

KDD 2024 Workshop KiL Submission 1 Authors

23 Apr 2024 (modified: 29 Jun 2024) | Submitted to KiL 2024 | CC BY 4.0

Keywords: ESG, Pre-trained Language Model, Large Language Model

Abstract: Environmental, Social, and Governance (ESG) criteria are crucial in investment decision-making and have gained substantial momentum in recent years, accompanied by a growing body of ESG-centric research. Concurrently, Natural Language Processing (NLP) has emerged as a tool for analyzing ESG-related texts. Despite this growing interest, the field faces persistent challenges, notably the lack of models and datasets specifically tailored for ESG categorization. This study presents a novel approach leveraging Pre-trained Language Models (PLMs) and Large Language Models (LLMs) to tackle ESG text classification tasks. We introduce a pipeline for creating three specialized datasets for ESG analysis: a pre-training corpus, a labeled ESG dataset, and an ESG Supervised Fine-Tuning (SFT) dataset. By extending PLMs such as BERT, DistilRoBERTa, and RoBERTa through continued pre-training on ESG texts, our approach significantly outperforms traditional baselines. Most notably, we introduce ESGLlama and FinLlama, domain-specific models derived from Llama2, with FinLlama demonstrating exceptional efficacy on financial benchmarks and in ESG text comprehension. Final evaluations reveal that our models achieve significant advancements in ESG classification, outperforming established baselines. These results highlight the effectiveness of our methodology and underscore the potential for further work on enhancing ESG text analytics through advanced NLP techniques.

Submission Number: 1