Keywords: automated code review, large language models, real-world deployment
TL;DR: We propose Coder-R3, an approach to fine-tuning LLMs for comprehensive code review tasks in practical enterprise scenarios.
Abstract: Large language models (LLMs) have emerged as powerful tools for software engineering tasks, demonstrating particular promise in code review activities. However, existing research on code review LLMs typically decomposes the review process into discrete subtasks, collecting data and fine-tuning separate models for each individual component. This fragmented approach overlooks the synergistic relationships between different tasks, necessitates multiple models with complex multi-stage invocations, and consequently exhibits limited practical applicability in real-world deployment scenarios. In this work, we advance beyond previous code review research by proposing a unified and comprehensive modeling of the code review problem. We focus on the complete code review process, which consecutively Recognizes, Reviews, and Repairs defective code fragments, and propose $Coder\text{-}R^3$, an approach that enables a single LLM to handle all code review-related subtasks uniformly. Additionally, we establish a practically feasible closed-loop iterative process for industrial scenarios, encompassing data construction, model evaluation, and operational feedback integration. We rigorously evaluate the effectiveness of various strategies, including input context selection, output format, and training methodologies. $Coder\text{-}R^3$ achieves state-of-the-art performance on the CodeReviewer benchmark and demonstrates superior effectiveness in real enterprise scenarios. Our work provides valuable insights for enterprises seeking to leverage large language models to enhance code review efficiency.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17793