Evaluating few-shot and contrastive learning methods for code clone detection

Mohamad Khajezade, Fatemeh H. Fard, Mohamed S. Shehata

Published: 2024, Last Modified: 06 Mar 2025. Empir. Softw. Eng. 2024. CC BY-SA 4.0

Abstract: Code Clone Detection (CCD) is a software engineering task used for plagiarism detection, code search, and code comprehension. Recently, deep learning-based models have achieved an F1-score (a metric used to assess classifiers) of approximately 95% on the CodeXGLUE benchmark. These models require large amounts of training data and are mainly fine-tuned on Java or C++ datasets. However, no previous study has evaluated the generalizability of these models when only a limited amount of annotated data is available.