Abstract: In recent years, code search on software Q&A sites has attracted increasing attention because of its value in software development. Most existing work treats code snippets as plain text fragments and ignores the structured information of code (i.e., its sequential information). Moreover, much existing work does not account for the interaction between code snippets and queries. In this paper, we propose HECS (Hierarchical Embedding for Code Search), a novel deep neural network that addresses these problems. Our method divides the embedding of code and queries into two hierarchies, in which latent information is captured by two modules: the Intra-language encoding module and the Cross-language encoding module. In particular, the Intra-language encoding module uses ON-LSTM (Ordered Neurons LSTM), a variant of LSTM (Long Short-Term Memory), to capture the keyword order structure of code, while the Cross-language encoding module computes interaction information between queries and code with an attention mechanism. In this way, the similarity between a query and its corresponding code snippets in the vector space can be captured more accurately, which allows HECS to distinguish positive from negative samples more reliably. We empirically evaluate HECS on a large-scale codebase collected from StackOverflow. The experimental results show that our approach achieves state-of-the-art performance.
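The abstract does not specify the exact form of the cross-language attention, the similarity function, or the training objective. As a minimal sketch, assuming token vectors have already been produced by an intra-language encoder (ON-LSTM in HECS, not reproduced here), the pure-Python code below shows one common way to let a query and a code snippet interact through attention, score the pair with cosine similarity, and train with a margin ranking loss over positive and negative samples. The function names, the attention-pooling form, the toy 2-d vectors, and the margin of 0.05 are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per token embedding


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(rows: Matrix) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def attention_pool(tokens: Matrix, guide: Vector) -> Vector:
    """Weight each token by its dot-product relevance to `guide`, then sum."""
    weights = softmax([dot(guide, t) for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def match_score(query: Matrix, code: Matrix) -> float:
    # Cross-language interaction (assumed form): each side's tokens are
    # pooled with attention guided by the other side's mean vector.
    q_vec = attention_pool(query, guide=mean_pool(code))
    c_vec = attention_pool(code, guide=mean_pool(query))
    return cosine(q_vec, c_vec)


def ranking_loss(query: Matrix, pos_code: Matrix, neg_code: Matrix,
                 margin: float = 0.05) -> float:
    # Hinge loss: zero once the positive snippet beats the negative by `margin`.
    return max(0.0, margin - match_score(query, pos_code)
               + match_score(query, neg_code))


# Hypothetical toy 2-d "encoder outputs" standing in for ON-LSTM hidden states.
query = [[1.0, 0.0], [0.8, 0.2]]
pos_code = [[0.9, 0.1], [1.0, 0.0]]
neg_code = [[0.0, 1.0], [0.1, 0.9]]

print(match_score(query, pos_code), match_score(query, neg_code))
print(ranking_loss(query, pos_code, neg_code))
```

Because the hinge loss is zero only when the positive snippet outscores the negative one by at least the margin, minimizing it over many (query, positive, negative) triples pushes matching query/code pairs together in the vector space and mismatched pairs apart, which is one concrete reading of a model that "distinguishes positive from negative samples".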