Abstract: Recently, machine learning has dominated software engineering research, much of which can be attributed to the success of large language models in handling source code-related tasks. Yet, despite significant advancements in pre-trained language models in this area, their potential in a multi-task learning (MTL) setting remains largely unexplored. To this end, this paper offers a comparative analysis between task-specific and multi-task approaches, focusing on two main tasks: Natural Language Code Search and Unit Test Case Generation. We propose a methodology based on prompt MTL and perform an extensive evaluation that contrasts the performance of MTL models against their single-task counterparts, using three pre-trained models across seven datasets. Delving deeper, we conduct an additional exploratory analysis to uncover the underlying causes of the observed results, investigating the specificities that govern the application of MTL in this context. Our empirical results sketch a nuanced landscape. MTL does not improve results over its single-task counterparts. Nevertheless, in some scenarios, for particular models or when data is scarce, MTL and single-task learning (STL) achieve quite similar results. The most important finding, however, is that the nature of the pre-training tasks significantly affects the fine-tuning capabilities of MTL, opening avenues for more guided research on how to pre-train and fine-tune these models.
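To make the prompt-MTL setup concrete, the sketch below illustrates one common way such a methodology can be realized (this is an illustrative assumption, not the authors' implementation): examples from both tasks are mixed into a single training stream, each prefixed with a natural-language task prompt, and a pre-trained encoder-decoder model is fine-tuned on the unified text-to-text format. The model name, prompt wording, and toy examples are hypothetical.

```python
# Minimal sketch of prompt-based multi-task fine-tuning (illustrative only).
# Code search is cast as relevance prediction ("true"/"false") and test case
# generation as sequence generation, both under one text-to-text model.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Salesforce/codet5-base"  # assumed pre-trained code model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Toy multi-task pool: each example is (prompted input, target output).
examples = [
    ("Generate a unit test for: def add(a, b): return a + b",
     "def test_add():\n    assert add(1, 2) == 3"),
    ("Code search, is this code relevant to the query? "
     "query: reverse a string | code: def rev(s): return s[::-1]",
     "true"),
]

def collate(batch):
    sources, targets = zip(*batch)
    enc = tokenizer(list(sources), padding=True, truncation=True,
                    max_length=256, return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, truncation=True,
                       max_length=128, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # mask padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(examples, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for batch in loader:  # single pass over the toy pool; real training runs many epochs
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {loss.item():.4f}")
```

In this formulation the task prompt is the only signal distinguishing the two tasks, which is what lets a single set of model weights be shared across them; a single-task baseline would instead fine-tune one copy of the model per task on its own data.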