Source Code Summarization Using Attention-Based Keyword Memory Networks

BigComp 2020
Abstract: Recently, deep learning techniques have been developed for source code summarization. Most existing studies simply adopt natural language processing techniques, because source code summarization can be framed as a machine translation task from source code to natural-language descriptions. However, source code and its description differ greatly, not only in the language they are written in but also in the purpose they serve. There is a large semantic gap between source code in programming languages and its descriptions in natural language. To bridge this semantic gap, we propose a two-phase model that consists of a keyword predictor and a description generator. The keyword predictor captures the natural-language keywords semantically associated with the source code, and the description generator produces a description by referring to the keywords provided by the predictor. Using such keywords as scaffolding, we can effectively reduce the semantic gap and generate more accurate descriptions of source code. To evaluate the proposed method, we use datasets collected from GitHub and StackOverflow and perform various experiments on them. Our method outperforms baselines that include state-of-the-art methods, which indicates that keyword prediction is very helpful for generating accurate descriptions.
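The sketch below illustrates the two-phase idea described in the abstract as a minimal PyTorch pipeline: a keyword predictor scores natural-language keywords for a code snippet, and a decoder attends over the embeddings of those keywords (a keyword memory) while generating the description. All module names, layer choices, and hyperparameters here are illustrative assumptions, not the authors' actual architecture, which is detailed only in the full paper.

```python
# Hypothetical sketch of a two-phase keyword-then-description model.
# KeywordPredictor, DescriptionGenerator, and all dimensions are assumptions.
import torch
import torch.nn as nn


class KeywordPredictor(nn.Module):
    """Phase 1: predict natural-language keywords associated with a code snippet."""

    def __init__(self, code_vocab, keyword_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(code_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.classifier = nn.Linear(hid_dim, keyword_vocab)

    def forward(self, code_tokens):
        # code_tokens: (batch, code_len) token ids of the source code
        _, h = self.encoder(self.embed(code_tokens))
        # Multi-label scores over the keyword vocabulary
        return torch.sigmoid(self.classifier(h.squeeze(0)))


class DescriptionGenerator(nn.Module):
    """Phase 2: generate the description, attending over a keyword memory."""

    def __init__(self, word_vocab, keyword_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.code_embed = nn.Embedding(word_vocab, emb_dim)
        self.word_embed = nn.Embedding(word_vocab, emb_dim)
        self.kw_embed = nn.Embedding(keyword_vocab, hid_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRUCell(emb_dim + hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, word_vocab)

    def forward(self, code_tokens, keyword_ids, desc_tokens):
        # Encode the source code; use the final state to initialise the decoder.
        _, h = self.encoder(self.code_embed(code_tokens))
        h = h.squeeze(0)                                   # (batch, hid)
        memory = self.kw_embed(keyword_ids)                # (batch, n_kw, hid)
        logits = []
        for t in range(desc_tokens.size(1)):
            # Attention over the keyword memory, conditioned on the decoder state.
            scores = torch.bmm(memory, h.unsqueeze(2)).squeeze(2)      # (batch, n_kw)
            attn = torch.softmax(scores, dim=1)
            context = torch.bmm(attn.unsqueeze(1), memory).squeeze(1)  # (batch, hid)
            inp = torch.cat([self.word_embed(desc_tokens[:, t]), context], dim=1)
            h = self.decoder(inp, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                  # (batch, desc_len, word_vocab)
```

In this reading, the keywords selected in phase 1 (e.g. by thresholding or taking the top-k predictor scores) are fed to phase 2 as `keyword_ids`, so the decoder can ground each generated word in code-relevant natural-language terms rather than in the code tokens alone.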