Multi-Narrative Semantic Intersection Task: Evaluation and Benchmark

Anonymous

17 Sept 2021 (modified: 05 May 2023) · ACL ARR 2021 September Blind Submission
Abstract: In this paper, we introduce an important yet relatively unexplored NLP task called Multi-Narrative Semantic Intersection (MNSI), which entails generating the semantic intersection of multiple alternative narratives. As no benchmark dataset is readily available for this task, we created one by crawling 2,925 alternative narrative pairs from the web and then manually creating 411 ground-truth semantic intersections with the help of human annotators. To evaluate this novel task, we first conducted a systematic study using the popular ROUGE metric from the text summarization literature and found that ROUGE is not suitable for our task. We then carried out further human annotation and validation to create 200 document-level and 1,518 sentence-level ground-truth labels, which helped us formulate a new precision-recall style evaluation metric, called SEM F1 (semantic F1), based on the presence, partial presence, and absence of information. Experimental results show that the proposed SEM F1 metric yields higher correlation with human judgment as well as higher inter-rater agreement than ROUGE, and we therefore recommend that the community use this metric for evaluating future research on this topic.
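The abstract does not spell out how SEM F1 is computed, so the following is a minimal illustrative sketch, assuming sentence embeddings with cosine similarity as the notion of semantic "presence"; the `sem_f1` function name, the encoder choice, and the greedy max-alignment between sentence lists are all assumptions for illustration, not the authors' exact formulation.

```python
# Hypothetical SEM F1-style sketch: a precision-recall score over sentence
# lists, using embedding cosine similarity as a soft measure of whether a
# piece of information is present, partially present, or absent.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice


def sem_f1(predicted_sents: list[str], reference_sents: list[str]) -> float:
    """Harmonic mean of soft precision and soft recall between sentence lists."""
    pred_emb = model.encode(predicted_sents, convert_to_tensor=True)
    ref_emb = model.encode(reference_sents, convert_to_tensor=True)
    sim = util.cos_sim(pred_emb, ref_emb)  # shape: (|pred|, |ref|)
    # Precision: how well each predicted sentence is supported by the reference.
    precision = sim.max(dim=1).values.mean().item()
    # Recall: how well each reference sentence is covered by the prediction.
    recall = sim.max(dim=0).values.mean().item()
    return 2 * precision * recall / (precision + recall + 1e-8)


print(sem_f1(["The suspect fled the scene by car."],
             ["A man drove away from the scene.", "Police arrived later."]))
```

Under this reading, a similarity near 1 marks information as present, intermediate values as partially present, and values near 0 as absent, which gives the metric its graded precision-recall character.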