MGF-ESE: An Enhanced Semantic Extractor with Multi-Granularity Feature Fusion for Code Summarization
Track: Semantics and knowledge
Keywords: source code summarization, semantic extraction, abstract syntax tree, control flow graph
Abstract: Code summarization aims to generate concise natural language descriptions of source code, helping developers to acquaint with software systems and reduce maintenance costs. Existing code summarization approaches widely employ attention mechanisms to assess the relevance between nodes in the Abstract Syntax Tree (AST), which generates context vectors that reflect the semantics of the source code. However, these approaches with AST fail to extract other granular features, such as token sequences and Control Flow Graphs (CFG), which suffer from severe semantic gaps when capturing data and control dependencies. To address this issue, we design an enhanced semantic extractor with multi-granularity feature fusion (MGF-ESE) to improve the model capability in comprehending and processing the overall semantics of the code. Specifically, to process the AST more effectively, we present a novel AST generation method with compresses the scale of nodes to enhance the semantic information of each node. Then, we present a disentangled attention mechanism based on relative positional embeddings for further encoding. Moreover, we extract the token sequences and CFG of source code to supplement the syntactic and structural information, and further fuse them with the AST separately through cross-attention modules. Finally, extensive experiments on two public datasets show that MGF-ESE outperforms the state-of-the-arts with higher-quality code summaries on key metrics, including BLEU, METEOR, and ROUGE.
Submission Number: 1010
Loading