Quantifying and Mitigating Selection Bias in LLMs: A Transferable LoRA Fine-Tuning and Efficient Majority Voting Approach

ACL ARR 2025 May Submission6601 Authors

20 May 2025 (modified: 03 Jul 2025), CC BY 4.0
Abstract: Multiple-choice question answering (MCQA) is a widely used method for evaluating the performance of Large Language Models (LLMs). However, LLMs often exhibit selection bias in MCQA tasks, where their choices are influenced by factors such as answer position or option symbols rather than the content. This bias undermines the reliability of MCQA as an evaluation framework. Most existing selection bias metrics require answer labels and measure divergences between prediction and answer distributions, but do not fully capture the consistency of a model's predictions across different orderings of answer choices. Existing selection bias mitigation strategies have notable limitations: majority voting, though effective, is computationally prohibitive; calibration-based methods require validation sets and often fail to generalize across datasets. To address these gaps, we propose three key contributions: (1) a new unsupervised, label-free Permutation Bias Metric (PBM) that directly quantifies inconsistencies in model predictions across answer permutations, providing a more precise measure of selection bias; (2) an efficient majority voting approach called Batch Question-Context KV caching (BaQCKV) that significantly reduces computational costs while preserving bias mitigation effectiveness; and (3) an unsupervised Low-Rank Adaptation (LoRA) fine-tuning strategy based on our proposed metric and BaQCKV that mitigates selection bias, providing a computationally efficient alternative that maintains model generalizability. Experiments across multiple MCQA benchmarks demonstrate that our approaches reduce bias, increasing consistency in accuracy while minimizing computational costs.
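The core idea of a label-free permutation bias metric can be illustrated with a minimal sketch. This is not the paper's exact PBM formulation; it simply measures the fraction of answer-order permutations on which a model's chosen *content* disagrees with its choice under the original ordering. The `predict` callable is a hypothetical model wrapper (anything returning the index of the selected option).

```python
from itertools import permutations

def permutation_bias(predict, question, options):
    """Label-free selection-bias estimate (illustrative, not the paper's PBM).

    predict(question, options) -> int is a hypothetical wrapper around an
    LLM that returns the index of the chosen option. We compare the chosen
    option *content* (not position) across all orderings of the options;
    a permutation-invariant model scores 0.0, a purely positional one
    scores close to 1.0.
    """
    # Content chosen under the original ordering serves as the reference.
    base = options[predict(question, options)]
    perms = list(permutations(options))
    disagreements = sum(
        1 for p in perms if p[predict(question, list(p))] != base
    )
    return disagreements / len(perms)
```

For example, a model that always picks the option containing a fixed answer string scores 0.0, while a model that always picks the first position disagrees with itself on most reorderings. In practice one would subsample permutations rather than enumerate all n! of them.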
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: bias evaluation, debiasing methods, multiple-choice QA, permutation invariance, label-free methods, efficient inference
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models
Languages Studied: English
Keywords: Ethics, Bias, and Fairness
Submission Number: 6601