Abstract: Timeline summarization (TLS) creates summaries of long-running events by combining dated summaries drawn from multiple news articles. However, the scarcity of available data has considerably hindered progress on the task. In this paper, we introduce the CNTLS dataset, an open resource for Chinese timeline summarization. CNTLS comprises 77 real-life topics, each containing 2524 documents, and achieves an average compression of nearly 60\% of the duration of all topics. We analyze the corpus using established metrics, focusing on the style of the summaries and the complexity of the summarization task. We assess the performance of various classic extractive TLS systems and demonstrate the applicability of large-model approaches to generative TLS on the CNTLS corpus, thereby providing benchmarks and fostering further research. To the best of our knowledge, CNTLS is the first Chinese timeline summarization dataset. The dataset and source code are released~\footnote{Code and data available at: \emph{{Accompanied ARR submission}}.}.
Paper Type: short
Research Area: Summarization
Contribution Types: Data resources
Languages Studied: Chinese, English