MULTIMODAL FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models

Anonymous

16 Oct 2023, ACL ARR 2023 October Blind Submission, Readers: Everyone
Abstract: We introduce MULTIMODAL FAITHSCORE (Faithfulness in Atomic Fact Score), a fine-grained evaluation metric that measures how faithful a generated response is to an input image and the corresponding open-ended questions. MULTIMODAL FAITHSCORE first identifies the sub-sentences that should be verified, then extracts nuanced elements from those sub-sentences, and finally verifies the consistency of each element with the input image. Meta-evaluation demonstrates that our reference-free metric correlates highly with human judgments of faithfulness. We measure hallucinations in state-of-the-art LVLMs with MULTIMODAL FAITHSCORE. Based on both automatic evaluation and human judgments, we show that current systems are prone to unfaithful generations that are not grounded in the image, which leaves room for improvement.
Paper Type: long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.
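The three stages named in the abstract (sub-sentence identification, element extraction, consistency verification) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy string-based stand-ins for each stage, the representation of the image as a set of true statements, and in particular the aggregation rule (fraction of extracted elements judged consistent with the image), which the abstract does not specify.

```python
from typing import Callable, List, Set

# Hypothetical stage signatures; the abstract does not define these.
Identify = Callable[[str], List[str]]      # response -> sub-sentences to verify
Extract = Callable[[str], List[str]]       # sub-sentence -> elements
Verify = Callable[[str, Set[str]], bool]   # (element, image) -> consistent?


def faithfulness_score(response: str, image: Set[str],
                       identify: Identify, extract: Extract,
                       verify: Verify) -> float:
    """Assumed aggregation: share of elements consistent with the image."""
    sub_sentences = identify(response)
    elements = [e for s in sub_sentences for e in extract(s)]
    if not elements:
        # Assumption: a response with nothing to verify counts as faithful.
        return 1.0
    return sum(verify(e, image) for e in elements) / len(elements)


# Toy stand-ins so the sketch runs; a real system would use models here.
def toy_identify(response: str) -> List[str]:
    parts = [p.strip() for p in response.split(".")]
    # Skip empty fragments and subjective statements that need no checking.
    return [p for p in parts if p and not p.startswith("I think")]


def toy_extract(sub_sentence: str) -> List[str]:
    return [e.strip() for e in sub_sentence.split(" and ")]


def toy_verify(element: str, image: Set[str]) -> bool:
    # The "image" is mocked as a set of statements that hold in it.
    return element in image


image_facts = {"a dog is on the grass", "the dog is brown"}
answer = ("I think this is a nice photo. "
          "a dog is on the grass and the dog is purple.")
score = faithfulness_score(answer, image_facts,
                           toy_identify, toy_extract, toy_verify)
```

In this toy run, one of the two extracted elements matches the mocked image, so the assumed score is 0.5. Keeping the stages as separate callables means any one of them can be swapped out without touching the aggregation.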