SNCD: A fast and scalable distributed near-miss code clone detector for big code based on partial index

Published: 01 Jan 2025, Last Modified: 13 May 2025
Venue: Future Gener. Comput. Syst. 2025
License: CC BY-SA 4.0
Abstract: Highlights
• Token-based partial indexing with distributed and multi-threaded optimization.
• Near-miss clone detection, including Type-3 clones, for large repositories.
• Language-agnostic, with flexible code granularity at the function and file level.
• Clone detection in hundreds of millions of lines of code in 20 min or less.
• Recall and precision comparable to state-of-the-art approaches.
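To make the idea of a token-based partial index concrete, the following is a minimal single-process sketch of generic prefix filtering over token bags. It is not SNCD's algorithm, which the highlights do not specify; the function names (`tokenize`, `occurrences`, `detect_clones`), the regex tokenizer, and the threshold `theta` are all assumptions introduced for illustration. Each block indexes only a short prefix of its rarest tokens, and two blocks can reach the required overlap only if those prefixes share a token, so the full comparison runs on far fewer pairs.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    # Hypothetical language-agnostic tokenizer: identifiers, numbers, symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def occurrences(tokens):
    # Turn a token bag into a set by tagging the k-th occurrence of each token,
    # so multiset overlap becomes plain set intersection.
    seen = Counter()
    tagged = []
    for tok in tokens:
        seen[tok] += 1
        tagged.append((tok, seen[tok]))
    return tagged


def detect_clones(blocks, theta=0.7):
    """Return pairs of block ids whose token overlap is at least
    ceil(theta * size of the larger block). Illustrative sketch only."""
    bags = {bid: set(occurrences(tokenize(src))) for bid, src in blocks.items()}

    # Global order: rare elements first, ties broken deterministically.
    freq = Counter(elem for bag in bags.values() for elem in bag)

    def rank(elem):
        return (freq[elem], elem)

    index = defaultdict(set)  # partial index: prefix element -> block ids
    clones = set()
    for bid in sorted(bags):
        bag = bags[bid]
        if not bag:
            continue
        # Only this many leading elements need to be indexed and probed.
        prefix_len = len(bag) - math.ceil(theta * len(bag)) + 1
        prefix = sorted(bag, key=rank)[:prefix_len]

        # Candidates are earlier blocks sharing at least one prefix element.
        candidates = set()
        for elem in prefix:
            candidates |= index[elem]

        # Verify each candidate with the full overlap computation.
        for other in candidates:
            needed = math.ceil(theta * max(len(bag), len(bags[other])))
            if len(bag & bags[other]) >= needed:
                clones.add((other, bid))

        for elem in prefix:
            index[elem].add(bid)
    return clones
```

Calling `detect_clones` with a dict that maps block ids to source text returns a set of id pairs. Because blocks are treated as token bags, a pair that differs by a few inserted or deleted statements can still pass the threshold, which is the Type-3 case mentioned in the highlights. The distributed and multi-threaded aspects are deliberately left out of this sketch.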