Neural joint attention code search over structure embeddings for software Q&A sites

J. Syst. Softw. 2020 (modified: 16 Nov 2021)
Highlights

• Propose a joint attention framework for code search in software Q&A sites.
• Introduce structure embeddings to enhance the search for code fragments.
• Achieve encouraging performance on four different programming languages.
• Construct a high-quality evaluation corpus for code search in software Q&A sites.

Abstract

Code search is frequently needed in software Q&A sites for software development. Over the years, various code search engines and techniques have been explored to support user queries. Early approaches often apply text retrieval models to match textual code fragments against natural-language queries, but they fail to build sufficient semantic correlations. Some recent neural methods restructure bi-modal networks to measure semantic similarity; however, they ignore the structural information of source code and the joint attention signal available from natural-language queries. In addition, they mostly target specific code structures rather than the general code fragments found in software Q&A sites. In this paper, we propose NJACS, a novel two-way attention-based neural network for retrieving code fragments in software Q&A sites, which aligns the more structurally informative parts of source code with the natural-language query and focuses attention on them. Instead of directly learning bi-modal unified vector representations, NJACS first embeds queries and code separately using a bidirectional LSTM with pre-trained structure embeddings, then learns an aligned joint attention matrix for query-code mappings, and finally derives pooling-based projection vectors in the two directions to guide the attention-based representations. On benchmark search codebases collected from Stack Overflow, NJACS outperforms state-of-the-art baselines with 7.5% and 6% higher [email protected] and MRR, respectively. Moreover, the designed structure embeddings can be leveraged for other deep-learning-based software tasks.
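The matching pipeline described in the abstract (separate BiLSTM encoders over pre-trained structure embeddings, an aligned joint attention matrix, and pooling-based projection vectors in the two directions) can be illustrated with a short sketch. The PyTorch-style code below is a minimal, hypothetical rendering of that idea: the bilinear attention form, the max-pooling over the alignment matrix, the cosine-similarity ranking score, the class name JointAttentionMatcher, and all dimensions are illustrative assumptions, not the authors' exact NJACS design.

    # Minimal sketch of a two-way (joint) attention matcher in the spirit of the
    # abstract. All design details here -- bilinear attention, max-pooling over the
    # alignment matrix, embedding sizes, cosine-similarity scoring -- are assumptions
    # for illustration, not the published NJACS configuration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointAttentionMatcher(nn.Module):
        def __init__(self, query_vocab, code_vocab, emb_dim=100, hidden=200,
                     code_pretrained=None):
            super().__init__()
            # Query-side word embeddings and code-side structure embeddings.
            # `code_pretrained` stands in for pre-trained structure embeddings
            # (a hypothetical tensor of shape [code_vocab, emb_dim]).
            self.q_emb = nn.Embedding(query_vocab, emb_dim)
            self.c_emb = nn.Embedding(code_vocab, emb_dim)
            if code_pretrained is not None:
                self.c_emb.weight.data.copy_(code_pretrained)
            # Separate BiLSTM encoders for queries and code fragments.
            self.q_lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.c_lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            # Bilinear weight for the aligned joint attention matrix.
            self.U = nn.Parameter(torch.randn(2 * hidden, 2 * hidden) * 0.01)

        def forward(self, query_ids, code_ids):
            Q, _ = self.q_lstm(self.q_emb(query_ids))   # [B, Lq, 2H]
            C, _ = self.c_lstm(self.c_emb(code_ids))    # [B, Lc, 2H]
            # Joint attention matrix: A[b, i, j] aligns query token i with code token j.
            A = torch.tanh(Q @ self.U @ C.transpose(1, 2))      # [B, Lq, Lc]
            # Pooling-based projections in the two directions: max over code
            # positions attends the query tokens, and vice versa.
            q_weights = F.softmax(A.max(dim=2).values, dim=1)   # [B, Lq]
            c_weights = F.softmax(A.max(dim=1).values, dim=1)   # [B, Lc]
            # Attention-guided representations of the query and the code fragment.
            q_vec = (q_weights.unsqueeze(2) * Q).sum(dim=1)     # [B, 2H]
            c_vec = (c_weights.unsqueeze(2) * C).sum(dim=1)     # [B, 2H]
            # Ranking score: cosine similarity between the two representations.
            return F.cosine_similarity(q_vec, c_vec, dim=1)

In a retrieval setting, such a matcher would typically be trained with a margin-based ranking loss over (query, correct code, distractor code) triples and then used to rank candidate fragments by score at search time; this training setup is likewise an assumption for the sketch rather than a detail taken from the paper.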