Investigating the Generalizability of Deep Learning-based Clone Detectors

Eunjong Choi, Norihiro Fuke, Yuji Fujiwara, Norihiro Yoshida, Katsuro Inoue

Published: 2023, Last Modified: 20 Jul 2025, Venue: ICPC 2023, License: CC BY-SA 4.0

Abstract: Generalizability is a significant challenge for Deep Learning (DL) models: poor generalizability indicates that a model has overfitted to its training data and performs badly on new data. Although numerous DL-based clone detectors have emerged in recent years, their generalizability has not been thoroughly assessed. This study investigates the generalizability of three DL-based clone detectors (CCLearner, ASTNN, and CodeBERT) by comparing their detection accuracy across different combinations of training and testing clone benchmarks. The results show that none of the three clone detectors generalizes well to new data, and that for CCLearner and ASTNN there is a strong relationship between clone type and generalizability.
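
The cross-benchmark comparison described in the abstract can be pictured as a train/test matrix: train a detector on one benchmark, then score it on every benchmark, including the one it was trained on. The following is a minimal sketch of that idea, not the authors' evaluation code. The function names, the use of F1 as the score (the abstract only says "detection accuracy"), the toy benchmarks, and the deliberately overfitting `train_memorizer` detector are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]                 # two code fragments (hypothetical IDs here)
Example = Tuple[Pair, bool]            # (pair, is_clone)
Detector = Callable[[Pair], bool]      # returns True if the pair is judged a clone


def f1_score(detector: Detector, examples: List[Example]) -> float:
    """F1 of the detector on labeled pairs (assumed metric, not from the paper)."""
    tp = fp = fn = 0
    for pair, is_clone in examples:
        predicted = detector(pair)
        if predicted and is_clone:
            tp += 1
        elif predicted and not is_clone:
            fp += 1
        elif not predicted and is_clone:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_benchmark_f1(
    train: Callable[[List[Example]], Detector],
    benchmarks: Dict[str, Tuple[List[Example], List[Example]]],
) -> Dict[Tuple[str, str], float]:
    """Train on each benchmark's train split, score on every benchmark's test split."""
    results = {}
    for train_name, (train_set, _) in benchmarks.items():
        detector = train(train_set)
        for test_name, (_, test_set) in benchmarks.items():
            results[(train_name, test_name)] = f1_score(detector, test_set)
    return results


def train_memorizer(train_set: List[Example]) -> Detector:
    """Caricature of an overfitted model: it only recognizes clone pairs it has seen."""
    known = {pair for pair, is_clone in train_set if is_clone}
    return lambda pair: pair in known


# Toy benchmarks (made-up IDs, for illustration only): name -> (train split, test split).
benchmarks = {
    "A": ([(("a1", "a2"), True), (("a1", "a3"), False)],
          [(("a1", "a2"), True), (("a4", "a5"), False)]),
    "B": ([(("b1", "b2"), True)],
          [(("b1", "b2"), True), (("b3", "b4"), False)]),
}

results = cross_benchmark_f1(train_memorizer, benchmarks)
for (train_name, test_name), score in sorted(results.items()):
    print(f"train={train_name} test={test_name} F1={score:.2f}")
```

In this toy setup the diagonal entries (same benchmark for training and testing) score high and the off-diagonal entries score low, which is the pattern one would read as poor generalizability. Real detectors and benchmarks would replace `train_memorizer` and the toy data.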