Code clustering workbench

K. M. Annervaz, Vikrant S. Kaulgud, Janardan Misra, Shubhashis Sengupta, Gary Titus, Azmat Munshi

Published: 2013, Last Modified: 21 May 2025SCAM 2013EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Source code clustering is an important technique used in software development and maintenance to understand the modular structure of code. An array of algorithms are available for clustering like simulated annealing based search. Source code have different kinds of features such as structural or textual features. The collection of these different types of source code features and computation of relevant feature metrics is a difficult task. Further, the clustering algorithms can run on metrics based on different types of source code features or their combinations. This flexibility makes it non-trivial to test effectiveness of clustering algorithms on a source code. In this paper, we present a highly configurable clustering workbench that allows the user to collect the various source code features and then to select the code features used for clustering, the clustering algorithm and its various parameters. Clustering quality metrics are computed. They allow comparison of algorithm output based on different combinations of code-features and algorithms. We also present the specific contribution made in multi-dimensional feature analysis and clustering. The tool hides the algorithm complexity from the user, thus allowing complete focus on understanding the 'effect' of the configuration choices. We have also applied this tool in real-life maintenance projects, where the users found it useful to tweak the clustering techniques for the source-code peculiarities.