LenANet: A Length-Controllable Attention Network for Source Code Summarization

Published: 2023, Last Modified: 23 Jan 2026 · ICONIP (11) 2023 · CC BY-SA 4.0
Abstract: Source code summarization aims at generating a brief description of a piece of source code. Existing approaches have made great breakthroughs with encoder-decoder models. They focus on learning the common features involved in translating source code into natural language summaries, and as a result they tend to generate generic summaries that are independent of context and lack detail. However, specific summaries that characterize the distinctive features of code snippets are widespread in real-world scenarios. Such summaries are rarely studied, because capturing the specific features of source code is difficult; moreover, learning only common features yields only generic, short summaries. In this paper, we present LenANet, which generates specific summaries by considering the desired length and extracting the relevant code sentences. First, we introduce a length offset vector that forces the model to generate summaries containing a specified amount of information, laying the groundwork for generating specific summaries. Further, since forcing the model to generate summaries of a certain length alone would introduce invalid or generic descriptions, we propose a context-aware code sentence extractor that extracts the specific features corresponding to the desired information. In addition, we present a novel sentence-level code tree to capture structural semantics and learn code sentence representations with a graph attention network, which is crucial for extracting specific features. Experiments on the CodeXGLUE datasets covering six programming languages demonstrate that LenANet significantly outperforms the baselines and has the potential to generate specific summaries. In particular, overall BLEU-4 improves by 0.53 over CodeT5 with length control.
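The abstract does not give implementation details of the length offset vector, but length control in encoder-decoder summarizers is commonly realized by adding a "remaining length" embedding to each decoder input. The toy sketch below illustrates that general idea only; all names, dimensions, and the countdown scheme are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8    # toy embedding size (illustrative, not from the paper)
MAX_LEN = 16   # maximum summary length budget

# Hypothetical embedding tables: one row per toy vocabulary token,
# and one "length offset" row per remaining-length value.
token_emb = rng.normal(size=(100, EMB_DIM))
length_emb = rng.normal(size=(MAX_LEN + 1, EMB_DIM))

def decoder_input(token_id: int, remaining_len: int) -> np.ndarray:
    """Combine a token embedding with a length offset vector for the
    remaining length budget, clamped to the valid table range."""
    remaining_len = max(0, min(remaining_len, MAX_LEN))
    return token_emb[token_id] + length_emb[remaining_len]

# At each decoding step the remaining-length index counts down, giving the
# decoder an explicit signal of how much summary budget is left.
target_len = 5
prefix_tokens = [7, 3, 42, 9, 11]
steps = [decoder_input(t, target_len - i) for i, t in enumerate(prefix_tokens)]
```

In this scheme, the same token receives a different input vector depending on how many tokens remain, which is what lets the decoder plan content density against the desired length.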