How Credible Is an Answer From Retrieval-Augmented LLMs? Investigation and Improvement With Multi-Hop QA
Abstract: Retrieval-augmented Large Language Models (RaLLMs) are reshaping knowledge acquisition, offering long-form, knowledge-grounded answers through advanced reasoning and generation capabilities. Despite the emergence of impactful systems like WebGPT and New Bing, the reliability of RaLLMs, especially in complex situations, remains under scrutiny. Our study addresses this concern by evaluating RaLLMs' question-answering performance with a novel benchmark built around two dimensions: Correctness, which measures the logical soundness of a response, and Groundedness, which checks whether the response is supported by the retrieved references. We introduce an automated, model-based evaluation pipeline for multi-hop question-answering tasks, revealing that RaLLMs are prone to generating inaccuracies when working from flawed or partial knowledge. To improve accuracy, we propose two reasoning strategies, 'Self-Reflection' and 'Self-Completion', which enable RaLLMs to identify and fill knowledge gaps, significantly improving answer quality without extensive model retraining.
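The abstract gives no implementation details, so the following is only a minimal sketch of how the two strategies might be layered on a retrieval-augmented QA loop. The callables llm and retrieve and all prompt wording are hypothetical placeholders, not APIs or prompts from the paper:

    # Hedged sketch: `llm(prompt) -> str` and `retrieve(query) -> list[str]`
    # are assumed placeholder callables, not components from the paper.

    def baseline_answer(llm, retrieve, question: str) -> str:
        """Plain RaLLM answer: retrieve evidence, then generate."""
        context = "\n".join(retrieve(question))
        return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

    def self_reflection(llm, retrieve, question: str) -> str:
        """Self-Reflection (as sketched here): the model critiques whether
        its draft answer is supported by the evidence, then revises."""
        context = "\n".join(retrieve(question))
        draft = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
        critique = llm(
            f"Context:\n{context}\n\nQuestion: {question}\n"
            f"Draft answer: {draft}\n"
            "List any claims in the draft not supported by the context."
        )
        return llm(
            f"Context:\n{context}\n\nQuestion: {question}\n"
            f"Draft answer: {draft}\nCritique: {critique}\n"
            "Rewrite the answer using only supported claims:"
        )

    def self_completion(llm, retrieve, question: str) -> str:
        """Self-Completion (as sketched here): the model names the missing
        hop, a second retrieval fills it, then it answers on the merged
        evidence."""
        passages = retrieve(question)
        gap = llm(
            "Context:\n" + "\n".join(passages) +
            f"\n\nQuestion: {question}\n"
            "What fact needed to answer is missing from the context? "
            "Reply with a search query for it, or 'NONE'."
        )
        if gap.strip() != "NONE":
            passages += retrieve(gap)  # second hop to fill the gap
        context = "\n".join(passages)
        return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

Both strategies in this sketch operate purely at inference time through extra prompting and retrieval calls, which is consistent with the abstract's claim that answer quality improves without extensive model retraining.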
Paper Type: long
Research Area: Question Answering
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English