CoMSum and SIBERT: A Dataset and Neural Model for Query-Based Multi-document Summarization

Sayali Kulkarni, Sheide Chammas, Wan Zhu, Fei Sha, Eugene Ie

2021 (modified: 10 Jul 2022) | ICDAR (2) 2021 | Readers: Everyone

Abstract: Document summarization compresses source document(s) into succinct and information-preserving text. A variant of this is query-based multi-document summarization (qMDS), which targets summaries to specific informational needs, contextualized to the query. However, progress in this area is hindered by the limited availability of large-scale datasets. In this work, we make two contributions. First, we propose an approach for automatically generating a dataset for both extractive and abstractive summaries and release a version publicly. Second, we design a neural model, SIBERT, for extractive summarization that exploits the hierarchical nature of the input. It also infuses queries to extract query-specific summaries. We evaluate this model on the CoMSum dataset, showing significant improvement in performance. This should provide a baseline and enable the use of CoMSum for future research on qMDS.
