Towards summarizing program statements in source code search

Victor J. Marin, Iti Bansal, Carlos R. Rivero

2020 (modified: 24 Dec 2022), SAC 2020

Abstract: Programmers commonly use search engines to find pieces of source code. The programs these engines retrieve are typically semantically, but not necessarily syntactically, similar, so ranking methods are used to present relevant programs to users. However, because implementations vary, users still need to understand such programs. In this paper, we propose a method that groups statements into clusters from a set of programs retrieved by a source code search engine. Each cluster comprises a number of program statements that are pervasive and have similar but not identical semantics. Our hypothesis is that such clusters help users understand a set of semantically related programs at a glance. We use approximate graph alignment to find correspondences among statements in two program dependence graphs that are similar with respect to their control and data flows, as well as the operations they perform. We then build a graph from the pairwise comparisons of program dependence graphs, and cast the problem of clustering statements as finding communities of statements that consistently align. Our evaluation using programs from BigCloneBench shows that the clusters of statements discovered by our approach help discern implementation variations.
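
The final clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the pairwise statement alignments have already been computed (the approximate graph alignment over program dependence graphs is not reproduced here), and it substitutes plain connected components, via union-find, for the community detection the abstract mentions. The function name `cluster_statements`, the `min_programs` threshold, and the toy statements are all hypothetical.

```python
from collections import defaultdict


def cluster_statements(alignments, min_programs=2):
    """Group statements linked by pairwise alignments (illustrative only).

    alignments: iterable of ((prog_a, stmt_a), (prog_b, stmt_b)) pairs,
        assumed to come from some pairwise alignment of two programs.
    min_programs: hypothetical threshold; a cluster is kept only if its
        statements come from at least this many distinct programs.
    Returns a list of sets of (program, statement) nodes.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each alignment is an edge; merge the two statements' groups.
    for a, b in alignments:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the statements of each connected component.
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    # Keep only clusters that span enough distinct programs.
    return [
        group for group in groups.values()
        if len({program for program, _ in group}) >= min_programs
    ]


# Toy input: three programs that each initialize an accumulator,
# two of which also update it inside a loop.
toy_alignments = [
    (("p1", "s = 0"), ("p2", "total = 0")),
    (("p2", "total = 0"), ("p3", "acc = 0")),
    (("p1", "s += x"), ("p2", "total += v")),
]
pervasive = cluster_statements(toy_alignments, min_programs=3)
```

With `min_programs=3`, only the accumulator-initialization cluster survives, because it is the only one that appears in all three toy programs. Connected components merge any statements joined by a chain of alignments, so a real system would likely need a stricter notion of "consistently align", which is what the community-finding formulation in the abstract is meant to provide.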
