Keywords: LLMs, automatic code review, fine-tuning, reinforcement learning
TL;DR: This paper introduces the Code Quality Score (CQS) system to automatically evaluate code quality and generate code reviews.
Abstract: Maintaining code quality in large-scale software systems presents significant challenges, particularly in settings where many engineers work concurrently on a codebase. This paper introduces the Code Quality Score (CQS) system, which automatically detects issues related to code quality and maintainability and provides actionable insights. At its core, the CQS system uses two Llama 3 models, each fine-tuned with supervised fine-tuning (SFT) and/or offline reinforcement learning (RL). One model detects common code quality issues related to coding best practices, while the other provides high-quality critiques of LLM-generated code reviews. To maintain a good user experience, we also add a set of hand-crafted rules to the system to filter out incorrect responses and hallucinations. Offline evaluations, based on internal human labeling, show that the CQS system achieves high precision in identifying valid code quality issues. The system has already been rolled out to developers at Meta and has consistently achieved a 60\% week-over-week user-reported helpfulness rate, demonstrating its effectiveness in a real-world environment. In this paper, we present details of the CQS system along with lessons learned on curating developer feedback to create training data for LLM fine-tuning.
Submission Number: 30