Detecting Hallucination and Coverage Errors in Controlled Text Generation for Controversial Topics

Anonymous

17 Apr 2023 (modified: 18 Apr 2023) · ACL ARR 2023 April Blind Submission
Abstract: We propose a new strategy for handling controversial topics in LLM-based chatbots, based on Wikipedia's Neutral Point of View (NPOV) principle: acknowledge the absence of a single true answer and surface multiple perspectives. We frame this as a controlled text generation task in which perspectives are retrieved from a knowledge base and the LLM is tasked with generating a fluent response that faithfully reflects them. Our main contribution is a detailed study of common LLM failure modes in this controlled generation task, namely hallucination and coverage errors. We propose and evaluate three methods for detecting such errors, based on (1) word overlap, (2) salience, and (3) LLM-based classifiers. Our results demonstrate that classifiers, even when trained only on synthetic errors, achieve high performance, with ROC AUC scores of 95.3% for hallucination and 90.5% for coverage error detection on unambiguous error cases. We show that when no training data is available, our other methods still yield good results on hallucination (84.0%) and coverage error (85.2%) detection.
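The word-overlap baseline the abstract mentions lends itself to a short illustration. Below is a minimal sketch, assuming the system has the generated response and the retrieved perspective texts in hand; the function names, stopword list, and 0.5 threshold are illustrative assumptions, not the paper's implementation. The intuition: a response sentence with little word overlap against the sources suggests hallucinated content, while a perspective whose words are largely absent from the response suggests a coverage error.

```python
# Minimal sketch of a word-overlap detector for the two failure modes the
# abstract describes. All names, the stopword list, and the thresholds are
# illustrative assumptions; the paper's actual method may differ.
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
             "is", "are", "was", "that", "this", "it", "as", "for"}

def content_words(text: str) -> set[str]:
    """Lowercase word tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def overlap(target: str, sources: list[str]) -> float:
    """Fraction of the target's content words that appear in any source text."""
    words = content_words(target)
    if not words:
        return 1.0  # nothing to check; treat as fully supported
    pool = set().union(*(content_words(s) for s in sources))
    return len(words & pool) / len(words)

def flag_hallucinations(response: str, perspectives: list[str], thr: float = 0.5) -> list[str]:
    """Response sentences weakly grounded in the retrieved perspectives."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    return [s for s in sentences if overlap(s, perspectives) < thr]

def flag_coverage_errors(response: str, perspectives: list[str], thr: float = 0.5) -> list[str]:
    """Perspectives whose content words are largely absent from the response."""
    return [p for p in perspectives if overlap(p, [response]) < thr]
```

In practice the fixed 0.5 threshold would be tuned on held-out examples; the abstract's ROC AUC numbers sidestep threshold choice entirely by measuring ranking quality.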
Paper Type: long
Research Area: Generation