INTERSECTIONRE: Mitigating Intersectional Bias in Relation Extraction Through Coverage-Driven Augmentation

ACL ARR 2025 February Submission4405 Authors

15 Feb 2025 (modified: 09 May 2025) · License: CC BY 4.0
Abstract: Relation Extraction (RE) models are crucial to many Natural Language Processing (NLP) applications, but they often inherit and amplify biases present in their training data. The underrepresentation of certain demographic groups can result in performance disparities, particularly under intersectional fairness, where biases compound across attributes such as gender and ancestry. To address this issue, we present IntersectionRE, a framework that improves the representation of underrepresented groups by generating synthetic training data. IntersectionRE identifies gaps in demographic coverage and optimizes data generation accordingly, ensuring the quality of augmented data through Large Language Models (LLMs), perplexity scoring, and factual consistency validation. Experimental results on the NYT-10 dataset demonstrate that our approach reduces intersectional disparities and improves F1 scores, particularly for historically underrepresented groups.
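The coverage-gap step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the attribute names, the toy corpus, and the share-based threshold rule are all illustrative assumptions. The idea is simply to count examples per intersectional group (e.g. gender × ancestry) and flag groups whose share falls below a chosen cutoff as candidates for augmentation:

```python
from collections import Counter

def coverage_gaps(records, min_share):
    """Flag intersectional groups whose share of the corpus falls
    below min_share (hypothetical thresholding rule, not the paper's)."""
    counts = Counter((r["gender"], r["ancestry"]) for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()
            if n / total < min_share}

# Toy corpus: 8 examples heavily skewed toward one intersection.
corpus = (
    [{"gender": "male", "ancestry": "european"}] * 6
    + [{"gender": "female", "ancestry": "african"}] * 2
)
print(coverage_gaps(corpus, min_share=0.4))
# → {('female', 'african'): 0.25}
```

A generation loop could then prioritize the flagged intersections until each group clears the threshold, with the LLM-generated sentences filtered by perplexity and factual-consistency checks as the abstract describes.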
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Representation Bias - Synthetic Data Generation - Relation Extraction
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 4405