Code Question Answering via Task-Adaptive Sequence-to-Sequence Pre-training

Tingrui Yu, Xiaodong Gu, Beijun Shen

Published: 2022, Last Modified: 27 Jun 2023, APSEC 2022, Readers: Everyone

Abstract: The development of a question answering (QA) system for code can greatly facilitate program understanding for developers. Recently, pre-trained language models (PLMs) have shown promising results in the code QA task. However, directly applying PLMs to code QA often yields suboptimal performance due to the large discrepancy between pre-training and the downstream QA task. While code PLMs are pre-trained on large-scale unlabeled code corpora, annotated QA pairs for fine-tuning are often scarce. Existing code PLMs simply reuse the code representation part and require training the QA part from scratch, which causes the model to overfit the QA data. In this paper, we propose CodeMaster, a novel pre-training based approach for automatically answering code questions via task adaptation. CodeMaster employs CodeT5, a popular PLM for source code. To mitigate the gap between pre-training and QA, CodeMaster continually pre-trains CodeT5 on multiple self-supervised learning tasks such as partial comment completion and noun-phrase prediction. Experimental results on the CodeQA benchmark show that CodeMaster achieves state-of-the-art performance, highlighting the effectiveness of our approach.
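
To make the idea of partial comment completion more concrete, the following is a minimal sketch of how one self-supervised training pair might be built from a code snippet and its comment. The abstract does not specify the actual input format, split strategy, or prompt layout used by CodeMaster, so the function name `make_partial_comment_example`, the `keep_ratio` parameter, and the `code: ... comment: ...` layout are all hypothetical choices made for illustration only.

```python
from typing import Tuple


def make_partial_comment_example(
    code: str, comment: str, keep_ratio: float = 0.5
) -> Tuple[str, str]:
    """Build one (source, target) pair for a sequence-to-sequence model.

    The source holds the code plus the visible start of the comment;
    the target is the hidden remainder the model should generate.
    The layout is an assumption, not the format used in the paper.
    """
    tokens = comment.split()
    if len(tokens) < 2:
        raise ValueError("comment needs at least two tokens to be split")
    # Keep at least one token visible and at least one token hidden.
    cut = min(max(1, int(len(tokens) * keep_ratio)), len(tokens) - 1)
    prefix = " ".join(tokens[:cut])
    suffix = " ".join(tokens[cut:])
    source = f"code: {code} comment: {prefix}"
    return source, suffix


source, target = make_partial_comment_example(
    "def add(a, b): return a + b",
    "Return the sum of two numbers",
)
```

Pairs of this shape resemble the question-to-answer mapping of the downstream task (code plus partial text in, free-form text out), which is one plausible reading of why such a task could narrow the gap between pre-training and QA.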
