PRECOS: Project-specific Retrieval for Better Code Summarization

Tingwei Zhu, Zhong Li, Tian Zhang, Minxue Pan, Xuandong Li

Published: 2024, Last Modified: 19 May 2025. Venue: ICSME 2024. License: CC BY-SA 4.0

Abstract: Code summarization aims to facilitate code comprehension by automatically generating brief and informative summaries for source code. In software development, different projects often exhibit distinct characteristics. However, existing research frequently overlooks such project-specific knowledge, which may result in sub-optimal summarization performance. In this paper, we propose Precos, a retrieval-based method that leverages the historical examples within the project (i.e., the internal corpus) to generate better code summaries. First, we construct the internal corpus as a datastore and extend the datastore by retrieving, from a large-scale external corpus, the examples most relevant to the current project based on the internal corpus. Then, during generation, we retrieve the nearest neighbors from the datastore at each decoding step to interpolate the vanilla target-token distribution. For the retrieved neighbors, we introduce a novel locality-aware distance calibration mechanism, which calibrates the retrieval distance based on the locality of the nearest neighbors, thereby providing more accurate predictions. Experimental results demonstrate that Precos achieves a substantial improvement of up to 8.5 BLEU points compared to the model before project-specific enhancement, and can generate better code summaries than other comparison methods while maintaining satisfactory results in additional storage, time overhead, and prediction speed. Our source code is available at https://github.com/ztw33/Precos.
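
The decoding-time step described in the abstract (retrieve nearest neighbors, then interpolate the model's target-token distribution) can be illustrated with a generic nearest-neighbor interpolation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `knn_interpolate`, the weight `lam`, the `temperature` parameter, and the softmax over negative distances are all hypothetical choices made for illustration. The locality-aware distance calibration is deliberately left out, because the abstract does not specify how it is computed.

```python
import math


def knn_interpolate(model_probs, neighbors, lam=0.5, temperature=1.0):
    """Mix a model's next-token distribution with a retrieval-based one.

    model_probs: list of probabilities over the vocabulary (sums to 1).
    neighbors:   list of (token_id, distance) pairs retrieved from a datastore.
    lam:         interpolation weight given to the retrieved distribution
                 (assumed hyperparameter, not taken from the paper).
    """
    if not neighbors:
        # Nothing retrieved: fall back to the model's own distribution.
        return list(model_probs)

    # Softmax over negative distances: closer neighbors get more weight.
    scores = [-dist / temperature for _, dist in neighbors]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)

    # Aggregate neighbor weights by the target token each neighbor stores.
    knn_probs = [0.0] * len(model_probs)
    for (token_id, _), w in zip(neighbors, weights):
        knn_probs[token_id] += w / total

    # Linear interpolation of the two distributions.
    return [lam * k + (1.0 - lam) * p for p, k in zip(model_probs, knn_probs)]


# Toy example: a 3-token vocabulary and three equally distant neighbors.
model_probs = [0.7, 0.2, 0.1]
neighbors = [(1, 0.0), (1, 0.0), (2, 0.0)]
mixed = knn_interpolate(model_probs, neighbors, lam=0.5)
```

In this toy case the model alone prefers token 0, while two of the three retrieved neighbors store token 1, so the interpolated distribution shifts toward token 1. A project-specific datastore would play the role of `neighbors` here, which is how retrieved in-project examples can influence the generated summary.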