Code Summarization with Project-Specific Features

Published: 01 Jan 2024, Last Modified: 20 May 2025, ECML/PKDD (9) 2024, License: CC BY-SA 4.0
Abstract: Code summarization aims to automatically generate natural language descriptions for code snippets, which help developers understand and maintain code. Existing code summarization methods are mostly based on the encoder-decoder architecture, where the encoder learns latent features from a code snippet and the decoder generates the corresponding summary from those features. Such methods do not leverage project-specific information and tend to generate generic summaries. In practice, however, developers want generated summaries to be project-specific, i.e., consistent with the existing summaries in the same project in aspects such as sentence patterns and domain concepts. In this work, we investigate project-specific code summarization. We propose CSWPS, a two-stage method that can be seamlessly integrated into any existing encoder-decoder summarization model. In the first stage, CSWPS learns project-specific features from the existing summaries in each project using multi-task learning. In the second stage, CSWPS samples from the project-specific features conditioned on the input source code and project information, and extracts the features most relevant to the input code. These features guide the decoder to generate a project-specific summary for the input code. Incorporating CSWPS into existing code summarization models consistently improves their performance and achieves new state-of-the-art results. We also show empirically, via feature visualization and a human study, that the summaries generated with CSWPS are more project-specific. A replication package for this work is available at https://github.com/DaSESmartEdu/CSWPS.
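The abstract does not specify how the second stage selects features, so the following is only a hypothetical sketch of the general idea, not the CSWPS implementation (see the replication package for that). It assumes each project owns a small bank of learned feature vectors and that the input code has already been encoded into a vector of the same size. In place of the paper's sampling step, it uses simple dot-product scoring with top-k selection, then combines the selected features into one attention-weighted "guidance" vector that a decoder could consume. The names `select_project_features`, `feature_bank`, and `k` are illustrative assumptions, and the first-stage multi-task training is not modeled.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_project_features(
    code_vec: Vector,
    feature_bank: Dict[str, List[Vector]],
    project: str,
    k: int = 2,
) -> Vector:
    """Hypothetical stage-two step (assumption, not the paper's method):
    keep the k features of `project` that score highest against `code_vec`
    and return their softmax-weighted average as a guidance vector."""
    features = feature_bank[project]  # only this project's features are candidates
    scores = [dot(code_vec, f) for f in features]
    # Indices of the k highest-scoring features; sorted() is stable, so ties keep bank order.
    top = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return [
        sum(w * features[i][d] for w, i in zip(weights, top))
        for d in range(len(code_vec))
    ]


if __name__ == "__main__":
    # Toy 2-dimensional feature banks for two hypothetical projects.
    bank = {
        "projA": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
        "projB": [[-1.0, 0.0]],
    }
    guidance = select_project_features([1.0, 0.0], bank, "projA", k=2)
    print([round(x, 3) for x in guidance])  # prints [0.811, 0.189]
```

Restricting candidates to the input's own project is what makes the guidance project-specific in this sketch: the same code vector paired with `"projB"` can only draw on that project's features. A learned scoring function or a proper sampling distribution, as the abstract describes, would replace the fixed dot product in a real model.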