Keywords: LLMs, automatic code review, fine-tuning, reinforcement learning
TL;DR: This paper introduces the Code Quality Score (CQS) system to automatically evaluate code quality and generate code reviews.
Abstract: Maintaining code quality in large-scale software systems presents significant challenges, particularly in settings where many engineers work concurrently on a codebase. This paper introduces the Code Quality Score (CQS) system, which automatically detects issues related to code quality and maintainability and provides actionable insights. At its core, the CQS system uses two Llama 3 models, each fine-tuned with supervised fine-tuning (SFT) and/or offline reinforcement learning (RL). One model detects common code quality issues related to coding best practices, while the other provides high-quality critiques of LLM-generated code reviews. To maintain a good user experience, we also add a set of hand-crafted rules to the system to filter out incorrect responses and hallucinations. Offline evaluations, based on internal human labeling, show that the CQS system achieves high precision in identifying valid code quality issues. The system has already been rolled out to developers at Meta and has consistently achieved a 60\% week-over-week user-reported helpfulness rate, demonstrating its effectiveness in a real-world environment. In this paper, we present details of the CQS system along with lessons learned on curating developer feedback to create training data for LLM fine-tuning.
Submission Number: 30