Contrastive Learning of Natural Language and Code Representations for Semantic Code Search

Anonymous

16 Jun 2021 (modified: 05 May 2023) | ACL ARR 2021 Jun Blind Submission
Abstract: Retrieving semantically relevant code functions given a natural language (NL) or programming language (PL) query is a task of great practical value for building productivity-enhancing tools for software developers. Recent approaches to this task leverage transformer-based masked language models that are pre-trained on NL and PL and fine-tuned for code search with a contrastive learning objective. However, these approaches suffer from uninformative in-batch negative samples. We propose DyHardCode, a contrastive learning framework that leverages hard negative examples, mined globally from the entire training corpus, to improve the quality of code and natural language representations. We experiment with different hard negative mining strategies and explain the effectiveness of our method from the perspectives of optimization and adversarial learning. We show that DyHardCode yields improvements on multiple code search tasks. Our approach achieves an average (across 6 programming languages) mean reciprocal rank (MRR) of $0.750$, compared to the previous state-of-the-art result of $0.713$ MRR on the CodeSearchNet benchmark.
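
To make the objective concrete, below is a minimal PyTorch-style sketch of an InfoNCE-style contrastive loss that augments in-batch negatives with globally mined hard negatives. The function name, tensor shapes, and temperature value are illustrative assumptions, not the paper's exact DyHardCode implementation.

# Minimal sketch (assumed formulation): contrastive loss over NL/code pairs
# where each query sees the in-batch code snippets plus K mined hard negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(nl_emb, code_emb, hard_neg_emb, temperature=0.05):
    # nl_emb:       (B, d) natural-language query embeddings
    # code_emb:     (B, d) embeddings of the matching code functions
    # hard_neg_emb: (B, K, d) hard-negative code embeddings mined from the corpus
    nl_emb = F.normalize(nl_emb, dim=-1)
    code_emb = F.normalize(code_emb, dim=-1)
    hard_neg_emb = F.normalize(hard_neg_emb, dim=-1)

    # In-batch similarities: each query against every code snippet in the batch (B, B)
    in_batch = nl_emb @ code_emb.t()

    # Similarities of each query to its own mined hard negatives (B, K)
    hard = torch.einsum('bd,bkd->bk', nl_emb, hard_neg_emb)

    # Positive for query i is code_emb[i]; all other columns act as negatives
    logits = torch.cat([in_batch, hard], dim=1) / temperature
    labels = torch.arange(nl_emb.size(0), device=nl_emb.device)
    return F.cross_entropy(logits, labels)

In this sketch, the hard negatives simply extend the softmax denominator for each query; how the negatives are mined and refreshed during training is left to the specific mining strategy.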