Abstract: Code search is a critical software development activity that improves developer efficiency by retrieving programs from a codebase that satisfy the user's intent. Recently, many researchers have applied contrastive learning to model the semantic relationships between queries and code snippets, achieving impressive performance in code search. Despite these improvements, existing models ignore the following challenging scenarios. First, a good code search tool should retrieve all code snippets in a candidate pool that meet the given query, even when they are implemented in diverse manners, so that the retrieved programs can accommodate developers' different programming styles. Second, in the open-source community, some programs have similar implementations but provide different functionality. Code search engines need to distinguish the desired programs from these confusing code snippets, which look alike but cannot meet the query. To address these limitations, we propose SCodeSearcher, a soft contrastive learning method for code search that highlights challenging examples by assigning them high weights, according to their degree of difficulty, in the contrastive learning objective. We conduct extensive experiments on five representative code search datasets covering code retrieval and code question answering tasks. The experimental results show that SCodeSearcher, trained on a much smaller corpus (less than one-tenth the size), achieves performance comparable to existing methods optimized on the large-scale dataset, significantly saving computing resources and training time.
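To make the idea of weighting challenging examples concrete, the following is a minimal sketch of a soft, weighted InfoNCE-style contrastive loss. It is an illustrative assumption, not the paper's exact formulation: here a negative code snippet's weight grows with its similarity to the query (its "challenging degree"), so confusing snippets with similar implementations contribute more to the loss. All function and variable names are hypothetical.

```python
import numpy as np

def soft_contrastive_loss(query, pos, negs, tau=0.07):
    """Weighted InfoNCE-style loss: negatives that are more similar to
    the query (harder to distinguish) receive larger weights in the
    denominator, pushing the model to separate them more strongly.
    This is an illustrative sketch, not SCodeSearcher's exact objective."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    s_pos = cos(query, pos)
    s_negs = np.array([cos(query, n) for n in negs])

    # Hypothetical soft weighting: softmax over negative similarities,
    # rescaled so the average weight stays 1 (uniform InfoNCE recovers
    # when all negatives are equally similar).
    w = np.exp(s_negs) / np.exp(s_negs).sum() * len(negs)

    num = np.exp(s_pos / tau)
    den = num + (w * np.exp(s_negs / tau)).sum()
    return -np.log(num / den)

# Toy embeddings: a query, a matching snippet, one hard negative
# (similar implementation, different function), one easy negative.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
pos = q + 0.1 * rng.normal(size=8)
hard_neg = q + 0.5 * rng.normal(size=8)
easy_neg = rng.normal(size=8)
loss = soft_contrastive_loss(q, pos, [hard_neg, easy_neg])
```

In this sketch the hard negative, being closer to the query, receives a weight above 1 and the easy one below 1, which is the essence of emphasizing challenging examples in the contrastive objective.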