Abstract: Multiple-choice question (MCQ) answering is a common task for evaluating large language models (LLMs). LLMs' performance on MCQs is often affected by various biases. Previous research has extensively examined the impact of inherent option bias on MCQ predictions, where this bias refers to a preference for a specific option ID token introduced during the model's training. However, in an in-context learning scenario, few-shot prompting can also introduce a form of bias, known as context option bias. For example, when all demonstration answers are consistently option A, LLMs may predict A regardless of the question. Context option bias can significantly degrade LLMs' performance. To observe LLMs' behavior under context option bias, we use demonstrations with obvious bias to amplify its effect. The results indicate that certain attention heads in LLMs are particularly sensitive to context option bias. Motivated by this observation, we propose our approach, CoLo, to address this issue. CoLo first compares outputs from ordinary and biased demonstrations and localizes the attention heads sensitive to context option bias through sequential interventions. It then applies an attention scaling-based method to intervene on the identified attention heads during inference, thereby mitigating the impact of context option bias on the LLMs' predictions. Experimental results show that CoLo alleviates context option bias and improves LLMs' robustness on MCQ tasks.
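To make the attention scaling step concrete, below is a minimal, hypothetical PyTorch sketch of down-weighting a set of bias-sensitive heads at inference time. The function name, head indices, scaling factor `alpha`, and tensor shapes are illustrative assumptions, not the paper's actual CoLo implementation.

```python
# Illustrative sketch only: scaling selected attention heads at inference.
# Head indices, `alpha`, and shapes are hypothetical, not CoLo's code.
import torch


def scale_selected_heads(head_outputs: torch.Tensor,
                         head_ids: list[int],
                         alpha: float = 0.5) -> torch.Tensor:
    """Down-weight the contribution of bias-sensitive heads.

    head_outputs: per-head attention outputs,
                  shape (batch, num_heads, seq_len, head_dim).
    head_ids:     indices of heads identified as sensitive to
                  context option bias.
    alpha:        scaling factor in (0, 1]; smaller means stronger
                  suppression of the selected heads.
    """
    scaled = head_outputs.clone()
    scaled[:, head_ids, :, :] *= alpha  # shrink only the chosen heads
    return scaled


# Toy usage: 1 sequence, 8 heads, 16 tokens, 64-dim head outputs.
outputs = torch.randn(1, 8, 16, 64)
mitigated = scale_selected_heads(outputs, head_ids=[2, 5], alpha=0.3)
```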
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: model bias mitigation, knowledge tracing, probing, robustness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 692