Large Language Models Systematically Favor Popular Options: Evidence and Mitigation Across Multiple Choice Tasks
Keywords: large language models, multiple-choice question answering, popularity bias, debiasing
Abstract: Multiple-choice questions (MCQs) are widely used for benchmarking large language models (LLMs). We show that modern LLMs systematically favor popular distractors over less popular correct options. We introduce PopMCQ, a suite of six stress/control manipulations for MCQs that alter option popularity while keeping the gold label fixed. We apply these strategies to the PlausibleQA evaluation built from NQ, TriviaQA, MuSiQue, and QASC, and quantify bias via the Spearman rank correlation between correctness and relative popularity surplus. We then introduce PopDebias, an inference-time correction that removes a label-free popularity prior and requires no LLM fine-tuning (with an optional lightweight calibration step). Averaged across all datasets and strategies, PopDebias improves the accuracy of all 23 models evaluated. The finding also holds at the individual dataset level: the method boosts accuracy for at least 20 of the 23 models on every dataset we tested (NQ: 23/23, QASC: 22/23, MuSiQue: 22/23, TriviaQA: 20/23), demonstrating broad effectiveness.
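The abstract does not spell out the bias measure or the correction in code; below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the bias measure is the Spearman correlation between per-question correctness and the popularity surplus, and that a PopDebias-style correction subtracts option log-probabilities measured under a content-free prompt (the label-free prior) from the question-conditioned scores, with `lam` standing in for the optional calibration weight; all function and parameter names here are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def popularity_bias(correct, popularity_surplus):
    """Spearman rank correlation between per-question correctness (0/1)
    and the relative popularity surplus of the chosen option, in the
    spirit of the abstract's bias measure (assumed formulation)."""
    rho, _ = spearmanr(correct, popularity_surplus)
    return rho

def popdebias_predict(option_logprobs, prior_logprobs, lam=1.0):
    """Hypothetical PopDebias-style correction: subtract a label-free
    popularity prior (option log-probs under a content-free prompt,
    i.e. without the question) from the conditioned scores.

    option_logprobs: log p(option | question), shape (n_options,)
    prior_logprobs:  log p(option) under the content-free prompt
    lam:             optional calibration weight for the prior
    """
    scores = np.asarray(option_logprobs) - lam * np.asarray(prior_logprobs)
    return int(np.argmax(scores))  # index of the debiased prediction
```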
Primary Area: generative models
Submission Number: 20290