Leveraging Comment Retrieval for Code Summarization

Shifu Hou, Lingwei Chen, Mingxuan Ju, Yanfang Ye

Published: 2023, Last Modified: 15 Oct 2025. Venue: ECIR (2) 2023. License: CC BY-SA 4.0

Abstract: Open-source code often suffers from mismatched or missing comments, which makes the code harder to understand and burdens software development and maintenance. In this paper, we design a novel code summarization model, CodeFiD, to address this challenge. Inspired by retrieval-augmented methods for open-domain question answering, CodeFiD first retrieves a set of relevant comments from code collections for a given code snippet, and then aggregates the representations of the code and these comments to produce a natural language sentence that summarizes the code's behavior. Unlike current code summarization work, which focuses on improving code representations, our model draws on external knowledge to improve summarization performance. Extensive experiments on public code collections demonstrate the effectiveness of CodeFiD, which outperforms state-of-the-art counterparts across all programming languages.
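
The retrieve-then-aggregate idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the retriever, the encoder, or the input format, so everything here is an assumption made for illustration: token-overlap (Jaccard) similarity stands in for the retriever, the function names `retrieve_comments` and `build_reader_inputs` are hypothetical, and the `code: ... comment: ...` template is invented. The sketch stops at preparing one input per retrieved comment; the neural model that would aggregate these inputs and generate the summary is not shown.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_comments(query_code, corpus, k=2):
    """Return the comments of the k corpus snippets most similar to query_code.

    corpus is a list of (code, comment) pairs. Token overlap is an assumed
    stand-in for whatever retriever the paper actually uses.
    """
    query_tokens = tokenize(query_code)
    ranked = sorted(
        corpus,
        key=lambda pair: jaccard(query_tokens, tokenize(pair[0])),
        reverse=True,
    )
    return [comment for _, comment in ranked[:k]]


def build_reader_inputs(query_code, comments):
    """Pair the query code with each retrieved comment, one input per comment.

    The template is hypothetical; a generator would encode these inputs and
    aggregate them when producing the summary.
    """
    return [f"code: {query_code} comment: {c}" for c in comments]


# Toy code collection with human-written comments (illustrative data only).
corpus = [
    ("def add(a, b): return a + b", "Add two numbers."),
    ("def read_file(path): return open(path).read()", "Read a file into a string."),
    ("def sub(a, b): return a - b", "Subtract b from a."),
]

query = "def add_numbers(a, b): return a + b"
retrieved = retrieve_comments(query, corpus, k=2)
inputs = build_reader_inputs(query, retrieved)

for line in inputs:
    print(line)
```

Keeping one input per retrieved comment, rather than concatenating everything into a single string, lets each comment be encoded alongside the code independently before the results are combined.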

External IDs: dblp:conf/ecir/HouCJY23