ComSum: Commit Messages Summarization and Meaning Preservation

02 Jun 2021 (modified: 20 Oct 2024)
Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
Readers: Everyone
Keywords: data set, dataset, summarization, abstractive, commits, subjects, developers, programming, ComSum, cumsum, natural language processing
TL;DR: A commit dataset for summarization
Abstract: We present ComSum, a data set of 7 million commit messages for text summarization. When developers document commits (software code changes), they post both a message and its summary. We gather and filter these pairs to curate a data set for summarizing developers' work. Beyond its growing size, practicality, and challenging language domain, the data set benefits from the active field of empirical software engineering. Because commits follow a typology, we propose evaluating outputs not only by ROUGE but also by their meaning preservation.
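The abstract describes pairing each commit's message with its summary and scoring generated summaries with ROUGE. A minimal Python sketch of both steps, assuming the common git convention that the first line of a commit message is the subject (treated here as the summary) and the remaining lines are the body (the message). The filter thresholds, the merge-commit rule, and the names `split_commit` and `rouge1_f1` are hypothetical and are not the authors' actual pipeline; the scorer is a simplified ROUGE-1 F1 without stemming, not the official ROUGE toolkit.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical filter thresholds, chosen for illustration only.
MIN_SUMMARY_TOKENS = 3
MIN_MESSAGE_TOKENS = 5


def split_commit(raw: str) -> Optional[Tuple[str, str]]:
    """Split a raw commit message into (message_body, summary).

    Assumes the git convention: first line is the subject (summary),
    the rest is the body. Returns None if the pair should be filtered out.
    """
    lines = raw.strip().splitlines()
    if not lines:
        return None
    summary = lines[0].strip()
    body = "\n".join(lines[1:]).strip()
    # Hypothetical rule: auto-generated merge commits carry no useful summary.
    if summary.startswith("Merge "):
        return None
    if len(summary.split()) < MIN_SUMMARY_TOKENS:
        return None
    if len(body.split()) < MIN_MESSAGE_TOKENS:
        return None
    return body, summary


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; no stemming or stopword removal."""
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `split_commit("Fix null pointer in parser\n\nThe parser crashed when the input file was empty. Add a guard before dereferencing.")` yields the body paired with the summary `"Fix null pointer in parser"`, and `rouge1_f1("fix null pointer", "Fix null pointer in parser")` gives 0.75 (precision 1.0, recall 0.6). ROUGE only measures word overlap, which is why a separate meaning-preservation check (for instance, whether the generated summary keeps the commit's type) can complement it.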
URL: https://figshare.com/s/f338e93369138de0dea5