Visible to everyone since 29 Jun 2024. License: CC BY 4.0
Environmental, Social, and Governance (ESG) factors are crucial in investment decision-making and have gained substantial momentum in recent years, accompanied by a growing body of ESG-centric research. Concurrently, Natural Language Processing (NLP) has emerged as a tool for analyzing ESG-related texts. Despite this growing interest, the field faces persistent challenges, notably the lack of models and datasets specifically tailored for ESG categorization. This study presents a novel approach leveraging Pretrained Language Models (PLMs) and Large Language Models (LLMs) to tackle ESG text classification tasks. We introduce a pipeline for creating three specialized datasets for ESG analysis: a pre-training corpus, a labeled ESG dataset, and an ESG Supervised Fine-Tuning (SFT) dataset. By extending PLMs such as BERT, DistilRoBERTa, and RoBERTa through continued pre-training on ESG texts, our approach significantly surpasses traditional baselines. Most notably, we introduce ESGLlama and FinLlama, domain-specific models derived from Llama2, with FinLlama demonstrating exceptional efficacy on financial benchmarks and in ESG text comprehension. Final evaluations show that our models outperform established baselines in ESG classification. These results highlight the effectiveness of our methodology and underscore the potential for further work on ESG text analytics with advanced NLP techniques.