Evaluating the Robustness of LLMs against Label-Variant MCQs in Logits Space

ACL ARR 2025 February Submission 7278 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: The widespread use of Large Language Models (LLMs) has made robustness a critical criterion in LLM evaluation. Multiple-choice questions (MCQs) constitute a significant form of LLM evaluation, which underscores the importance of studying the robustness of LLMs to MCQs. While there has been considerable research on LLM robustness, most studies have been conducted as black-box assessments in the textual space. In this paper, we further evaluate the robustness of LLMs to label variants in the logits space. Our experiments on 3 datasets and 10 models show that LLMs exhibit a significant selection bias towards different choice token sets, meaning that varying the choice labels can alter a model's confidence in answering questions. In particular, smaller models exhibit more pronounced selection bias. By comparing the base and instruct versions of different LLMs, we also find that post-training can significantly enhance model robustness to label variants. These results demonstrate that evaluation in the logits space can tell us more about LLMs.
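A minimal sketch of how such a logits-space probe over label variants might look is given below. It is an illustration only, not the authors' protocol: the model name (gpt2), the prompt template, the example question, the two label sets (letters vs. digits), and the helper choice_distribution are all assumptions introduced for the example. The point is that the answer distribution is read directly from the next-token logits over the label tokens, rather than from generated text, so shifts in confidence between label variants become directly measurable.

```python
# Hypothetical sketch of logits-space MCQ probing with label variants (not the paper's code).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "gpt2"  # placeholder; any causal LM with a compatible tokenizer works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

question = "Which planet is known as the Red Planet?"
options = ["Venus", "Mars", "Jupiter", "Saturn"]
label_variants = {"letters": ["A", "B", "C", "D"], "digits": ["1", "2", "3", "4"]}

def choice_distribution(labels):
    # Build the prompt with the given label set, ending right before the answer token.
    body = "\n".join(f"{l}. {o}" for l, o in zip(labels, options))
    prompt = f"{question}\n{body}\nAnswer:"
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]  # next-token logits at the final position
    # Collect the logit of each label token (leading space matches typical tokenization).
    label_ids = [tok.encode(" " + l, add_special_tokens=False)[0] for l in labels]
    return torch.softmax(logits[label_ids], dim=-1)

# Compare the model's confidence over the same options under different label sets.
for name, labels in label_variants.items():
    probs = choice_distribution(labels)
    print(name, {l: round(p.item(), 3) for l, p in zip(labels, probs)})
```

A robustness study in this style would run the same comparison over many questions and label sets and quantify how much the induced distribution shifts between variants.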
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: robustness, logits space, LLM, MCQs
Contribution Types: Model analysis & interpretability
Languages Studied: English, Chinese
Submission Number: 7278