Abstract: The pursuit of efficiency in code review has intensified, prompting a wave of research on automating code review comment generation. However, the existing body of research is fragmented, characterized by disparate approaches to task formulation, factor selection, and dataset processing. This variability often leads to an emphasis on refining model architectures, overshadowing the critical roles of factor selection and representation. To bridge these gaps, we have assembled a comprehensive dataset that includes not only the primary factors identified in previous studies but also additional pertinent data. Using this dataset, we assessed the impact of various factors and their representations on two leading computational paradigms: fine-tuning pre-trained models and prompting large language models. Our investigation also examines the potential benefits and drawbacks of incorporating abstract syntax trees to represent code change structures. Our results reveal that: (1) the impact of factors varies between computational paradigms, and their representations can interact in complex ways; (2) integrating a code structure graph can enrich the representation of code content, yet may impair the understanding capabilities of language models; and (3) strategically combining factors can elevate basic models to outperform those specifically pre-trained for the task. These insights are pivotal for steering future research in code review automation.