Abstract: We introduce \textsc{CivilSum}, a dataset of 23,350 legal case decisions from the Supreme Court of India and Indian High Courts, each paired with a human-written abstractive summary. Our analysis shows that, in contrast to other domains such as news articles, the most important content tends to appear at the end of the documents. We measure the effect of this \emph{tail bias} on summarization performance using strong baselines for long-document abstractive summarization, and the results highlight the importance of long-sequence modeling for the proposed task. \textsc{CivilSum} and related code are publicly available for research purposes.
Paper Type: short
Research Area: Resources and Evaluation
Contribution Types: Data resources
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.