Abstract: Multimodal summarization aims to generate a concise summary from the input text and image. However, existing methods often produce unfaithful output. To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks for different application scenarios: a reference-based factuality evaluation framework and a reference-free factuality evaluation framework. Notably, the reference-free framework does not require ground truth and is therefore applicable in a wider range of scenarios. To assess the effectiveness of the proposed frameworks, we compute the correlation between our frameworks and other metrics. The experimental results demonstrate the effectiveness of the proposed frameworks.
Paper Type: short
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: NLP engineering experiment
Languages Studied: English, Chinese