Abstract: Knowledge Editing---efficiently modifying the knowledge stored in large language
models---has garnered significant attention. Current benchmarks primarily use multi-hop question
answering to assess and analyze newly injected or updated knowledge. We argue that
these benchmarks fall short of evaluating how effectively the updated model applies
this knowledge in real-life scenarios, where questions may require complex
reasoning involving one-to-many relations and/or multi-step logical
intersections (detailed in Section 1). To address this gap,
we introduce a new benchmark, CompKE: Complex Question Answering under Knowledge Editing, comprising
11,921 complex questions that reflect real-life scenarios. We also propose GDecom-CQA: Generic Decomposition-based
Complex Question Answering,
a novel approach tailored to complex question answering. We perform a comprehensive
evaluation of GDecom-CQA on CompKE and existing benchmarks to
showcase the effectiveness of the key contributions of this work.
Experimental results show that GDecom-CQA outperforms the best-performing
baseline models on CompKE, improving the Augmented-Accuracy metric by 38.5\% on average.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: multihop QA
Contribution Types: Data resources
Languages Studied: English
Submission Number: 264