Multilingual Investigation of Cross-Project Code Clones in Open-Source Software for Internet of Things Systems
Abstract: The prevalence and impact of code clones in software systems have been widely studied in the past few decades. However, the focus has primarily been on intra-project clones. Our work comprehensively investigates cross-project code clones in open-source software for Internet of Things (IoT) systems across multiple programming languages. This work addresses the prevalence of cross-project code clones in IoT systems and their impact on software maintainability. We collected 122 IoT system repositories in nine languages from GitHub and grouped them according to their primary functionality in IoT systems. We used MSCCD, a multilingual code clone detector to detect Type-3 code clones for each group. The results show that cross-project clones exist in more than 30% of the projects, particularly in communication-related functionalities. We tracked the historical evolution of these clones and classified them according to the revision history and changing trend of similarity. The results show that 95% cross-project clones are untouched. Moreover, clones with decreasing similarities were over 72%. Therefore, the same clone detector may no longer detect these clones. We also investigated whether these cross-project code clones lead to defect propagation by analyzing the commit message to determine the commits that fixed a defect. We identified nine defect propagation instances, of which seven remain unfixed. Our work contributes to understanding cross-project code clones, highlighting the importance of automated clone management tools for improving the quality and security of IoT system software to mitigate the risks associated with unresolved defects and inconsistencies in IoT software development.
Loading