Beyond Generation: Leveraging LLM Creativity to Overcome Label Bias in Classification

ACL ARR 2025 February Submission5794 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) exhibit impressive capabilities in In-Context Learning (ICL) but are prone to label bias—an undesirable tendency to favor certain answers. Existing calibration methods mitigate this bias by leveraging in-domain data, yet such data is often unavailable in real-world scenarios. To address this limitation, we propose SDC (Synthetic Data Calibration), a simple-yet-effective approach that generates synthetic in-domain data from a few in-context demonstrations and utilizes it for calibration. By approximating the benefits of real in-domain data, SDC effectively reduces label bias without requiring access to actual domain-specific inputs. We conduct experimental evaluations on 279 classification and multiple-choice tasks from the Super-NaturalInstructions benchmark. The results show that SDC significantly reduces label bias, achieving an average Bias Score reduction of 57.5% and outperforming all competitive baselines. Moreover, when combined with Leave-One-Out Calibration (LOOC), SDC further improves performance, underscoring its effectiveness and generalizability in enhancing the reliability of LLMs.
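The abstract describes calibrating with synthetic in-domain inputs; below is a minimal sketch (not the authors' released code) of one way such a calibration step could look: the LLM's label probabilities are averaged over synthetic inputs to estimate its label bias, and test-time predictions are rescaled by that estimate, in the style of contextual calibration. The function names and the divide-and-renormalize correction are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def estimate_label_bias(synthetic_probs: np.ndarray) -> np.ndarray:
    """Average the model's label distributions over synthetic in-domain inputs.

    synthetic_probs: shape (num_synthetic_examples, num_labels), each row the
    LLM's predicted label distribution for one synthetic input.
    """
    return synthetic_probs.mean(axis=0)

def calibrate(probs: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Rescale a predicted label distribution by the estimated bias and renormalize."""
    corrected = probs / bias
    return corrected / corrected.sum()

# Toy usage: the model systematically over-predicts label 0 on synthetic inputs.
synthetic_probs = np.array([[0.7, 0.2, 0.1],
                            [0.6, 0.3, 0.1],
                            [0.8, 0.1, 0.1]])
bias = estimate_label_bias(synthetic_probs)   # roughly [0.70, 0.20, 0.10]
test_probs = np.array([0.5, 0.4, 0.1])
print(calibrate(test_probs, bias))            # after correction, label 1 becomes the argmax
```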
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Label Bias, Synthetic Data
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 5794