An NLP Benchmark Dataset for Assessing Corporate Climate Policy Engagement

Published: 26 Sept 2023, Last Modified: 12 Jan 2024, NeurIPS 2023 Datasets and Benchmarks Spotlight
Keywords: natural language processing, corporate climate policy engagement, climatebert, greenwashing
Abstract: As societal awareness of climate change grows, corporate climate policy engagement is attracting attention. We propose a dataset for estimating corporate climate policy engagement from various PDF-formatted documents. Our dataset comes from LobbyMap, a platform operated by the global think tank InfluenceMap, which provides engagement categories and stances for the documents. To convert the LobbyMap data into a structured dataset, we developed a pipeline using text extraction and OCR. Our contributions are: (i) building an NLP dataset of 10K documents on corporate climate policy engagement; (ii) analyzing the properties and challenges of the dataset; and (iii) providing experiments on the dataset using pre-trained language models. The results show that while Longformer outperforms baselines and other pre-trained models, there is still room for significant improvement. We hope our work begins to bridge research on NLP and climate change.
Submission Number: 161