Informing climate risk analysis using textual information - A research agenda

Published: 18 Jun 2024, Last Modified: 01 Jul 2024, ClimateNLP 2024, CC BY 4.0
Keywords: climate disclosure, transition strategies, RAG system, information retrieval, human evaluation, human annotation
TL;DR: We outline a research agenda that we believe can help address gaps and issues in the current, rapidly growing literature on NLP for climate finance.
Abstract: We present a research agenda focused on efficiently extracting, quality-assuring, and consolidating textual company sustainability information to address urgent climate change decision-making needs. Starting from the goal of creating integrated FAIR (Findable, Accessible, Interoperable, Reusable) climate-related data, we identify research needs pertaining to the technical aspects of information extraction as well as to the design of the integrated sustainability datasets that we seek to compile. Regarding extraction, we leverage technological advancements, particularly in large language models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines, to unlock the underutilized potential of unstructured textual information contained in corporate sustainability reports. In applying these techniques, we review key challenges, which include the retrieval and extraction of CO$_2$ emission values from PDF documents, especially from the unstructured tables and graphs they contain, and the validation of automatically extracted data through comparisons with human-annotated values. We also review how existing use cases and practices in climate risk analytics inform the choice of which textual information should be extracted and how it could be linked to existing structured data.
Archival Submission: arxival
Submission Number: 4