Keywords: Consistency, large language model, interpretability
Abstract: Interpreting reasoning methods that operate within an LLM's context, such as chain-of-thought, crucially depends on whether the model transparently follows the intended reasoning process.
We focus on whether the beliefs held by a model remain consistent before and after its context is extended.
Previous research on consistency evaluation typically uses data with a single correct answer, which is problematic: a model that cannot arrive at the correct answer in the first place cannot be meaningfully assessed for consistency. Moreover, cases where inconsistency stems from multiple errors are difficult to evaluate.
We propose a new evaluation method that assesses the consistency of LLMs in a multiple-choice question-answering format designed so that any chosen option is correct, enabling evaluation of the proposed notion of belief consistency. The design also isolates error sources such as reasoning failures and biases.
We reveal that belief consistency does not improve through model-size scaling alone, whereas continual pre-training on code and mathematics text does improve it.
Furthermore, models trained on code and mathematics text exhibit a seemingly contradictory increase in logical failures, indicating that belief consistency and superficial consistency are not necessarily directly linked.
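For concreteness, here is a minimal sketch of how the before/after consistency rate described above might be computed; the `ask` helper and the item fields are hypothetical illustrations, not the paper's actual protocol or code.

```python
from typing import Callable, Sequence

def belief_consistency(
    ask: Callable[[str, Sequence[str], str], int],  # hypothetical helper: returns the chosen option index
    items: Sequence[dict],
) -> float:
    """Fraction of items where the model's chosen option is unchanged after
    the context is extended. Because every option is correct by construction,
    a switch reflects inconsistency rather than error correction."""
    unchanged = 0
    for item in items:
        # Ask the same multiple-choice question before and after extending the context.
        before = ask(item["question"], item["options"], item["base_context"])
        after = ask(item["question"], item["options"],
                    item["base_context"] + item["extension"])
        unchanged += int(before == after)
    return unchanged / len(items)
```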
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Explainability of NLP Models; Ethics, Bias, and Fairness; Generation; Interpretability and Analysis of Models for NLP; Language Modeling; Question Answering; Resources and Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 9471