SOSum: A Dataset of Stack Overflow Post SummariesDownload PDFOpen Website

Published: 01 Jan 2022, Last Modified: 21 Jun 2023MSR 2022Readers: Everyone
Abstract: Stack Overflow (SO) is becoming an indispensable part of modern software development workflow. However, given the limited time, attention, and memory capacity of programmers, navigating SO posts and comparing different solutions is time-consuming and cumbersome. Recent research has proposed to summarize SO posts to concise text to help programmers quickly assess the relevance and quality of SO posts. Yet there is no large dataset of high-quality SO post summaries, hindering the development and evaluation of post summarization techniques. We present SOSum, a dataset of 2,278 popular SO answer posts with manually labeled summative sentences. Questions in SOSum cover 669 tags with a median view count of 253K and a median post score of 17. This dataset will foster research on sentence-level summarization of SO posts and has the potential to facilitate text summarization research on other types of textual software artifacts such as programming tutorials.
0 Replies

Loading