The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models

Published: 07 Oct 2023, Last Modified: 01 Dec 2023
Venue: EMNLP 2023 Main
Submission Type: Regular Long Paper
Submission Track: Language Modeling and Analysis of Language Models
Submission Track 2: Theme Track: Large Language Models and the Future of NLP
Keywords: consistency, evaluation, large language models, retrieval-augmentation, causal analysis
Abstract: Large Language Models (LLMs) make natural interfaces to factual knowledge, but their usefulness is limited by their tendency to deliver inconsistent answers to semantically equivalent questions. For example, a model might supply the answer "Edinburgh" to "Anne Redpath passed away in X." and "London" to "Anne Redpath's life ended in X.". In this work, we identify potential causes of inconsistency and evaluate the effectiveness of two mitigation strategies: scaling up the model and augmenting the LM with a passage retrieval database. Our results on the LLaMA and Atlas models show that both strategies reduce inconsistency, but that retrieval augmentation is considerably more efficient. We further consider and disentangle the consistency contributions of different components of Atlas. For all LMs evaluated, we find that syntactic form and task artifacts impact consistency. Taken together, our results provide a better understanding of the factors affecting the factual consistency of language models.
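To make the abstract's notion of consistency concrete, here is a minimal sketch of one plausible way to measure it: query a model with several paraphrases of the same cloze-style question and report the fraction of paraphrase pairs that receive the same answer. The `generate` function is a hypothetical stand-in for any LM interface (e.g. a LLaMA or Atlas wrapper); it is not an API described in the paper, and the paper's actual metric may differ.

```python
# Hedged sketch: pairwise answer consistency across paraphrased queries.
# `generate` is a hypothetical callable mapping a prompt to a model answer.
from itertools import combinations
from typing import Callable, List


def pairwise_consistency(paraphrases: List[str],
                         generate: Callable[[str], str]) -> float:
    """Fraction of paraphrase pairs that receive the same normalized answer."""
    answers = [generate(p).strip().lower() for p in paraphrases]
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0  # a single phrasing is trivially self-consistent
    return sum(a == b for a, b in pairs) / len(pairs)


# Example paraphrases of the same fact, echoing the abstract's example:
queries = [
    "Anne Redpath passed away in X. What is X?",
    "Anne Redpath's life ended in X. What is X?",
]
# score = pairwise_consistency(queries, generate)  # 1.0 iff answers agree
```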
Submission Number: 3498