TL;DR: This paper introduces a novel framework inspired by Inference to the Best Explanation (IBE) to evaluate LLM explanatory reasoning, considering linguistic and logical criteria like consistency, parsimony, coherence, and uncertainty.
Abstract: How do Large Language Models (LLMs) generate explanations? While LLMs are increasingly adopted in real-world applications, the principles and properties behind their explanatory process are still poorly understood. This paper proposes an interpretability and evaluation framework for LLMs' explanatory reasoning inspired by philosophical accounts of Inference to the Best Explanation (IBE). In particular, the framework aims to estimate the quality of natural language explanations through a combination of criteria computed on linguistic and logical features, including consistency, parsimony, coherence, and uncertainty. We conduct extensive experiments on Causal Question Answering (CQA), instantiating our framework to select among competing explanations generated by LLMs (i.e., ChatGPT and Llama 2). The results reveal that the proposed methodology can successfully identify the best explanation supporting the correct answer with up to 77% accuracy ($\approx 27\%$ above random), suggesting that LLMs indeed conform to features of IBE. At the same time, we find notable differences across LLMs, with ChatGPT significantly outperforming Llama 2. Finally, we analyze the degree to which different criteria can predict the correct answer, highlighting potential implications for external verification methods for LLM-generated output.
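To make the selection setup concrete, here is a minimal sketch of how competing (answer, explanation) candidates could be ranked by a weighted combination of criterion scores and the highest-scoring one selected. The proxy heuristics (`consistency`, `parsimony`, `coherence`, `certainty`), the `HEDGE_WORDS` list, and the weights are illustrative assumptions for this sketch, not the paper's actual metrics or implementation.

```python
# Illustrative sketch (not the paper's implementation): score each candidate
# explanation with simple proxy criteria loosely named after those in the
# abstract, then select the candidate with the highest weighted combination.

HEDGE_WORDS = {"maybe", "might", "possibly", "perhaps", "unclear", "unsure"}


def consistency(explanation: str, answer: str) -> float:
    """Proxy: fraction of answer tokens that also appear in the explanation."""
    exp_tokens = set(explanation.lower().split())
    ans_tokens = answer.lower().split()
    if not ans_tokens:
        return 0.0
    return sum(t in exp_tokens for t in ans_tokens) / len(ans_tokens)


def parsimony(explanation: str, max_tokens: int = 100) -> float:
    """Proxy: shorter explanations score higher (capped length penalty)."""
    length = len(explanation.split())
    return max(0.0, 1.0 - length / max_tokens)


def coherence(explanation: str) -> float:
    """Proxy: average lexical overlap between adjacent sentences."""
    sentences = [s.split() for s in explanation.split(".") if s.strip()]
    if len(sentences) < 2:
        return 1.0
    overlaps = []
    for a, b in zip(sentences, sentences[1:]):
        shared = len(set(a) & set(b))
        overlaps.append(shared / max(len(set(a) | set(b)), 1))
    return sum(overlaps) / len(overlaps)


def certainty(explanation: str) -> float:
    """Proxy: penalize hedging vocabulary (less uncertainty -> higher score)."""
    tokens = explanation.lower().split()
    if not tokens:
        return 0.0
    hedges = sum(t.strip(",.") in HEDGE_WORDS for t in tokens)
    return 1.0 - hedges / len(tokens)


def select_best(candidates, weights=(0.4, 0.2, 0.2, 0.2)) -> int:
    """Return the index of the (answer, explanation) pair whose weighted
    combination of criterion scores is highest."""
    def score(pair):
        answer, explanation = pair
        criteria = (
            consistency(explanation, answer),
            parsimony(explanation),
            coherence(explanation),
            certainty(explanation),
        )
        return sum(w * c for w, c in zip(weights, criteria))

    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


if __name__ == "__main__":
    candidates = [
        ("The glass broke",
         "The glass fell off the table, so the impact broke the glass."),
        ("The glass broke",
         "Maybe something happened, possibly the glass was perhaps fragile."),
    ]
    print(select_best(candidates))  # -> 0 (more consistent, less hedged)
```

In this toy setup the first candidate wins because its explanation mentions the answer's key terms and avoids hedging vocabulary; the actual framework computes its criteria over richer linguistic and logical features.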
Paper Type: long
Research Area: Semantics: Sentence-level Semantics, Textual Inference and Other areas
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English