AIMS.au: A Dataset for the Analysis of Modern Slavery Countermeasures in Corporate Statements

Adriana Eufrosina Bora; Pierre-Luc St-Charles; Mirko Bronzi; Arsene Fansi Tchango; Bruno Rousseau; Kerrie Mengersen

AIMS.au: A Dataset for the Analysis of Modern Slavery Countermeasures in Corporate Statements

Adriana Eufrosina Bora, Pierre-Luc St-Charles, Mirko Bronzi, Arsene Fansi Tchango, Bruno Rousseau, Kerrie Mengersen

Published: 22 Jan 2025, Last Modified: 30 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: natural language processing, modern slavery, corporate statements, benchmark, text extraction, large language models

TL;DR: We provide a new dataset of modern slavery statements annotated at the sentence level to train and evaluate language models for relevant text detection and extraction.

Abstract: Despite over a decade of legislative efforts to address modern slavery in the supply chains of large corporations, the effectiveness of government oversight remains hampered by the challenge of scrutinizing thousands of statements annually. While Large Language Models (LLMs) can be considered a well established solution for the automatic analysis and summarization of documents, recognizing concrete modern slavery countermeasures taken by companies and differentiating those from vague claims remains a challenging task. To help evaluate and fine-tune LLMs for the assessment of corporate statements, we introduce a dataset composed of 5,731 modern slavery statements taken from the Australian Modern Slavery Register and annotated at the sentence level. This paper details the construction steps for the dataset that include the careful design of annotation specifications, the selection and preprocessing of statements, and the creation of high-quality annotation subsets for effective model evaluations. To demonstrate our dataset's utility, we propose a machine learning methodology for the detection of sentences relevant to mandatory reporting requirements set by the Australian Modern Slavery Act. We then follow this methodology to benchmark modern language models under zero-shot and supervised learning settings.

Supplementary Material: pdf

Primary Area: datasets and benchmarks

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7235

Loading