A New Search Paradigm for Natural Language Code Search

Anonymous

16 Nov 2021 (modified: 05 May 2023); ACL ARR 2021 November Blind Submission; Readers: Everyone
Abstract: Code search can speed up software development by retrieving code snippets that match a given query. The dominant code search paradigm uses neural networks to learn semantic matching between code snippets and queries. However, this paradigm suffers from a mismatch between training and evaluation: researchers train their models on pairs of code snippets and code descriptions (e.g., comments and documentation) but evaluate them on queries, which differ from code descriptions in writing style and application scenario. As a result, the gap between code snippets and queries is carried over and widened. To remedy this issue, we propose a simple but effective new search paradigm, Query2Desc, which relies entirely on natural language and conducts code search by performing semantic matching between code descriptions and queries. Experimental results on the CoSQA dataset show that the state-of-the-art model CodeBERT improves by 17.48\% in average MRR when applied to Query2Desc. Moreover, baseline models on Query2Desc return the correct result within the top-$10$ search results for at least 95\% of the queries in the CoSQA test set.
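The description-side matching summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: a bag-of-words cosine similarity stands in for the neural matcher (e.g., CodeBERT), and the toy corpus, queries, and function names (`rank_snippets`, `mean_reciprocal_rank`, `hit_rate_at_k`) are hypothetical. The sketch only shows the idea of scoring a query against code descriptions rather than against code, and how MRR and a top-$k$ hit rate would be computed over the resulting rankings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(query, corpus):
    """Rank snippet ids by query-to-description similarity.

    corpus maps snippet id -> (description, code). Only the description
    is scored; the code is what would be returned to the user.
    The bag-of-words scorer is an assumption standing in for a neural model.
    """
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(desc))), sid)
        for sid, (desc, _code) in corpus.items()
    ]
    # Highest score first; ties broken by snippet id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sid for _score, sid in scored]


def mean_reciprocal_rank(rankings, gold):
    """Average of 1/rank of the gold snippet; 0 if it is not in the ranking."""
    total = 0.0
    for ranking, g in zip(rankings, gold):
        if g in ranking:
            total += 1.0 / (ranking.index(g) + 1)
    return total / len(gold)


def hit_rate_at_k(rankings, gold, k):
    """Fraction of queries whose gold snippet appears in the top k results."""
    hits = sum(1 for ranking, g in zip(rankings, gold) if g in ranking[:k])
    return hits / len(gold)


# Hypothetical toy data, for illustration only.
corpus = {
    "s1": ("Read a text file and return its lines as a list.",
           "def read_lines(p): return open(p).read().splitlines()"),
    "s2": ("Sort a dictionary by its values.",
           "def sort_by_value(d): return sorted(d.items(), key=lambda kv: kv[1])"),
    "s3": ("Convert a string to a datetime object.",
           "def parse_dt(s): return datetime.fromisoformat(s)"),
}
queries = [
    "python read file lines into list",
    "sort dictionary by value",
    "string to datetime",
]
gold = ["s1", "s2", "s3"]

rankings = [rank_snippets(q, corpus) for q in queries]
print("MRR:", mean_reciprocal_rank(rankings, gold))
print("Hit@1:", hit_rate_at_k(rankings, gold, 1))
```

Under this reading, replacing `cosine` with a learned sentence-pair scorer would leave the ranking and evaluation code unchanged, since the matcher only ever sees natural-language text on both sides.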