Abstract: Attribution is the process of identifying which parts of the source support a generated output.
While attribution can help users verify content and assess faithfulness, existing task definitions typically exclude unsupported or hallucinated content leaving them unattributed, overlooking the potential to increase faithfulness certainty, locate the error, and fix it easier.
In this paper, we propose a new definition for sentence-level error-tolerant attribution, which extends attribution to include incorrect or hallucinated content. We introduce a benchmark for this task and evaluate a range of models on it. Our results show that sentence-level error-tolerant attribution improves the quality of both automatic and manual faithfulness evaluations, reducing annotation time by 30% in long-document settings, and facilitates hallucination fixing. We also find that unfaithful outputs are often linked to sentences that appear later in the source or contain non-literal language, pointing to promising avenues for hallucination mitigation. Our approach offers a better user experience along with improved faithfulness evaluation, with better understanding of model behavior.
Paper Type: Long
Research Area: Summarization
Research Area Keywords: attribution, faithfulness, incorrect, hallucination
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5936
Loading