Language Models' Internal Conflicts: Layer-wise Usable Information For Detecting Model (Un)answerability
Keywords: Uncertainty, Question Answering
Abstract: We propose a new framework for detecting unanswerable questions. Large language models often sound overly convincing when providing inaccurate answers. We explore how language models behave when required to answer questions without relevant information in the provided context, a situation that is likely to result in hallucination. We posit that unanswerable questions manifest as a deficiency of $\mathcal{V}$-usable information across the layers of a pre-trained language model $\mathcal{V}$. To capture this, we propose layer-wise usable information ($\mathcal{L}$I), which tracks how much information is usable by a language model as its representations are updated from layer to layer. We show empirically that information is not always monotonically gained or lost across layers; hence tracking all layers of a language model is far more informative than treating the final layer as the complete outcome of the computation.
Our method requires neither label annotations to fine-tune classifiers nor modifications to the model architecture, making it computationally feasible for general large language models.
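The abstract does not specify how $\mathcal{L}$I is computed, but a common way to operationalize pointwise $\mathcal{V}$-usable information is the log-probability gap the model assigns to the gold answer with versus without the context, evaluated here at every layer (e.g., via a logit-lens-style projection of intermediate hidden states). The sketch below is a hypothetical illustration of that idea with synthetic per-layer logits; the function and variable names are ours, not the paper's, and the actual method may differ.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layerwise_usable_info(logits_with_ctx, logits_null_ctx, gold_id):
    """Per-layer pointwise usable-information proxy:
    info_l = log p_l(gold | context) - log p_l(gold | empty context).
    Near-zero or negative values at many layers would suggest the
    context contributes no usable information (unanswerable question).

    logits_with_ctx, logits_null_ctx: arrays of shape (n_layers, vocab)
    gold_id: index of the gold answer token
    """
    info = []
    for lw, ln in zip(logits_with_ctx, logits_null_ctx):
        p_ctx = softmax(lw)[gold_id]
        p_null = softmax(ln)[gold_id]
        info.append(np.log(p_ctx) - np.log(p_null))
    return np.array(info)

# Toy example: 2 layers, vocabulary of 4 tokens, gold token = 3.
# With context, the final layer strongly favors the gold token.
with_ctx = np.array([[0.0, 0.0, 0.0, 0.5],
                     [0.0, 0.0, 0.0, 2.0]])
null_ctx = np.zeros((2, 4))  # uniform: no usable information
scores = layerwise_usable_info(with_ctx, null_ctx, gold_id=3)
```

In practice the per-layer logits would come from projecting each layer's hidden state through the unembedding matrix; crucially, the profile over all layers is inspected, not just the final entry, since information may rise and fall non-monotonically.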
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3650