Do LLMs Believe in Themselves? A Benchmark for LLM Robustness against External Counterfactual Knowledge

ACL ARR 2024 June Submission3606 Authors

16 Jun 2024 (modified: 02 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) and AI chatbots have improved people's efficiency in many fields and demonstrated strong capabilities on a wide range of NLP tasks. However, when external knowledge supplied to a model contains misinformation, whether from careless mistakes or malicious web texts that users do not notice, models are more likely to trust the incorrect external information and generate inaccurate answers that mislead users. We therefore design two principles for how models should behave in such cases and construct a benchmark whose contexts contain counterfactual information derived from existing knowledge bases. We also propose two new metrics that measure the extent to which this misinformation misleads models. Evaluation results show that existing LLMs are susceptible to interference from unreliable external knowledge containing counterfactual information, and that simple intervention methods do little to alleviate this issue.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, automatic creation and evaluation of language resources, metrics
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 3606