Improving Code Summarization With Tree Transformer Enhanced by Position-Related Syntax Complement

Published: 01 Jan 2024, Last Modified: 19 May 2025, IEEE Trans. Artif. Intell. 2024, CC BY-SA 4.0
Abstract: Code summarization aims to automatically generate natural language (NL) summaries for a given source code snippet, helping developers understand source code faster and easing software maintenance. Recent approaches that apply NL techniques to code summarization do not adequately capture the syntactic characteristics of programming languages (PLs), particularly the position-related syntax from which the semantics of the source code can be extracted. In this article, we present Syntax transforMer (SyMer), a transformer-based model enhanced with a position-related syntax complement (PSC) to better capture syntactic characteristics. PSC takes advantage of the unambiguous relations among code tokens in the abstract syntax tree (AST), as well as the attention gathered on crucial code tokens indicated by the syntactic structure. Experimental results demonstrate that SyMer outperforms state-of-the-art models by at least 2.4% in bilingual evaluation understudy (BLEU) and 1.0% in metric for evaluation of translation with explicit ORdering (METEOR) on the Java benchmark, and 4.8% in BLEU, 5.1% in METEOR, and 3.2% in recall-oriented understudy for gisting evaluation - longest common subsequence (ROUGE-L) on the Python benchmark.
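
To illustrate what "unambiguous relations among code tokens" in an AST can look like, the following is a minimal sketch using Python's standard `ast` module. It is not the PSC mechanism or any part of SyMer; the function name `ast_edges` and the sample snippet are hypothetical and chosen only for illustration. The sketch lists each parent-child edge of the tree together with the depth of the child, which is one simple kind of position-related structural information an AST exposes.

```python
import ast


def ast_edges(source):
    """List (child_depth, parent_type, child_type) for each edge of the AST.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    tree = ast.parse(source)
    edges = []
    stack = [(tree, 0)]  # (node, depth of that node); the root has depth 0
    while stack:
        node, depth = stack.pop()
        for child in ast.iter_child_nodes(node):
            edges.append((depth + 1, type(node).__name__, type(child).__name__))
            stack.append((child, depth + 1))
    return edges


# Sample input, assumed for the example.
snippet = "def add(a, b):\n    return a + b\n"
edges = ast_edges(snippet)

# Print each edge, indented by the depth of the child node.
for depth, parent, child in edges:
    print("  " * depth + f"{parent} -> {child}")
```

Each edge is determined by the grammar rather than by token order in the text, which is the sense in which tree relations are unambiguous; how SyMer actually encodes such relations is described in the full article, not here.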