Abstract: Large language models (LLMs) often generate convincing, fluent explanations. However, unlike humans, they often generate $\textit{inconsistent}$ explanations on different inputs. For example, an LLM may generate the explanation "$\textit{all birds can fly}$" when answering the question "$\textit{Can sparrows fly?}$" while answering "$\textit{no}$" to the related question "$\textit{Can penguins fly?}$". Explanations should be consistent across related examples so that they allow a human to simulate the LLM's decision process on multiple examples. We propose $\textbf{explanation-consistency finetuning}$ (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning involves finetuning LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across a variety of question-answering datasets in various domains, EC-finetuning yields a $\textbf{10.0\%}$ relative improvement in explanation consistency on four finetuning datasets and generalizes to seven out-of-distribution datasets not seen during finetuning ($\textbf{+4.5\%}$ relative). We will make our code available for reproducibility.
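The sketch below is a minimal illustration of the general idea behind EC-finetuning as described in the abstract, not the paper's actual pipeline: related questions are grouped with a single shared explanation, and supervised finetuning records are emitted so that the target outputs reuse that explanation consistently. All names, the toy data, and the JSONL record format are assumptions made for illustration only.

```python
import json

# Hypothetical toy data: groups of related questions that should share one
# consistent explanation. The paper constructs such data synthetically; here
# it is hard-coded purely for illustration.
related_groups = [
    {
        "explanation": "Most birds can fly, but flightless species such as "
                       "penguins and ostriches cannot.",
        "qa_pairs": [
            ("Can sparrows fly?", "Yes"),
            ("Can penguins fly?", "No"),
        ],
    },
]


def build_ec_finetuning_records(groups):
    """Turn each group into supervised finetuning examples whose target
    outputs reuse the same explanation across all related questions."""
    records = []
    for group in groups:
        for question, answer in group["qa_pairs"]:
            records.append({
                "prompt": f"Question: {question}\nAnswer with an explanation.",
                "completion": f"{answer}. Explanation: {group['explanation']}",
            })
    return records


if __name__ == "__main__":
    # Write JSONL that a standard supervised finetuning pipeline could consume.
    with open("ec_finetuning_data.jsonl", "w") as f:
        for record in build_ec_finetuning_records(related_groups):
            f.write(json.dumps(record) + "\n")
```

Because every record in a group carries the same explanation, a model finetuned on such data is encouraged to produce explanations that remain consistent across related inputs, which is the property the abstract measures.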
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English