Leveraging In-and-Cross Project Pseudo-Summaries for Project-Specific Code Summarization

Yupeng Wu, Tianxiang Hu, Ninglin Liao, Rui Xie, Minghui Zhang, Dongdong Du, Shujun Lin

Published: 2024, Last Modified: 13 Nov 2024, IJCNN 2024, CC BY-SA 4.0

Abstract: Code summarization is pivotal in software development, helping developers grasp the semantics of source code. However, existing research predominantly focuses on the general code summarization capabilities of models and neglects project-specific summary characteristics. Moreover, given the scarcity of project-internal code summary corpora, improving a model's performance on a specific project presents a significant challenge. To tackle this issue, we introduce In-and-Cross project pseudo-summaries to improve Project-Specific Code Summarization. Specifically, we employ models trained on other projects to generate cross-project pseudo-summaries and, through contrastive learning, learn how they differ from target-project summaries. Simultaneously, we use in-project pseudo-summaries generated by the current model and exploit them through semi-supervised learning to further improve performance. Experimental results show that the proposed method effectively improves summarization performance in practical scenarios and also enhances the coordination of the model.
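The abstract names two ingredients, a contrastive signal from cross-project pseudo-summaries and semi-supervised use of in-project pseudo-summaries, but does not specify the loss or the selection rule. The sketch below illustrates both ideas under loudly stated assumptions: an InfoNCE-style contrastive loss in which the target-project reference summary is the positive and cross-project pseudo-summaries are negatives, and a simple confidence threshold for keeping in-project pseudo-summaries. These are common generic choices, not the authors' confirmed method; the names `contrastive_loss` and `select_pseudo_summaries`, the scores, and the threshold are all hypothetical.

```python
import math
from typing import List, Tuple


def contrastive_loss(pos_score: float, neg_scores: List[float],
                     temperature: float = 1.0) -> float:
    """Hypothetical InfoNCE-style loss (assumption, not the paper's formula).

    pos_score: model score for the target-project reference summary.
    neg_scores: model scores for cross-project pseudo-summaries (negatives).
    The loss is small when the positive outscores every negative.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]  # equals -log softmax(logits)[0]


def select_pseudo_summaries(
    candidates: List[Tuple[str, str, float]], threshold: float
) -> List[Tuple[str, str]]:
    """Hypothetical filter for in-project pseudo-summaries (assumption).

    candidates: (code, pseudo_summary, model_confidence) triples produced by
    the current model on unlabeled in-project code. Pairs whose confidence
    meets the threshold are kept as extra training data.
    """
    return [(code, summ) for code, summ, conf in candidates if conf >= threshold]


if __name__ == "__main__":
    # Reference summary scored well above two cross-project pseudo-summaries.
    print(contrastive_loss(5.0, [1.0, 0.5]))
    pool = [("def add(a, b): ...", "Adds two numbers.", 0.92),
            ("def f(x): ...", "Does something.", 0.41)]
    print(select_pseudo_summaries(pool, threshold=0.8))
```

In a real training loop the scores would come from the summarization model (for example, sequence log-likelihoods), and the selected pseudo-summary pairs would be mixed with the scarce labeled in-project data; how the paper actually weights or schedules these terms is not stated in the abstract.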