On the Performance of Large Language Models for Code Change Intent Classification

Published: 01 Jan 2025 · Last Modified: 06 Sep 2025 · SANER 2025 · CC BY-SA 4.0
Abstract: Modern Code Review (MCR) is an essential practice in software engineering, supporting early defect detection, enhancing code quality, and fostering knowledge sharing. To manage code review tasks effectively, developers need to understand the intent behind code changes, such as a bug fix, a test, a refactoring, or a new feature. Traditional methods for categorizing code changes in MCR rely on rule-based heuristics with predefined keywords. However, these methods ignore the context of the code changes, leading to limited generalizability, particularly for sparsely documented changes. This paper addresses these limitations by investigating the potential of Large Language Models (LLMs) for code change intent classification. We introduce the LLM Change Classifier (LLMCC), an LLM-based approach that classifies code changes based on their underlying intent. We evaluate the effectiveness of LLMCC through an empirical study on three open-source projects: Android, OpenStack, and Qt. The performance of LLMCC was benchmarked against traditional heuristic methods, conventional machine learning algorithms (including Decision Trees and Random Forests), and state-of-the-art transformer models (including BERT and RoBERTa). Results show that LLMCC significantly improves code change intent classification accuracy, achieving up to a 33% improvement in F1 score over heuristic-based methods. LLMCC also outperformed both the traditional machine learning and the transformer models, achieving an average 77% improvement in Matthews Correlation Coefficient (MCC). These findings underscore the potential of LLMCC to streamline code change intent classification.
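The abstract does not describe LLMCC's implementation, but a minimal sketch of the general technique it names, prompt-based intent classification of a code change with an LLM, might look like the following. The label set comes from the intents listed above; the prompt wording, the model name, and the use of the OpenAI chat API are assumptions for illustration, not the authors' method.

```python
# Illustrative sketch only: a minimal prompt-based code change intent
# classifier in the spirit of what the abstract describes. The prompt
# wording, model choice, and API are assumptions, not the paper's method.
from openai import OpenAI

# Intent labels taken from the abstract.
LABELS = ["bug fix", "test", "refactoring", "new feature"]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_change(commit_message: str, diff: str) -> str:
    """Ask the model to pick exactly one intent label for a code change."""
    prompt = (
        "Classify the intent of the following code change as exactly one of: "
        + ", ".join(LABELS)
        + ".\n\n"
        f"Commit message:\n{commit_message}\n\n"
        f"Diff:\n{diff}\n\n"
        "Answer with the label only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper's model is not stated here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for classification
    )
    answer = response.choices[0].message.content.strip().lower()
    # Fall back to the first label if the model replies with something unexpected.
    return next((label for label in LABELS if label in answer), LABELS[0])


if __name__ == "__main__":
    print(classify_change(
        "Fix NPE in review parser",
        "- if x.foo():\n+ if x and x.foo():",
    ))
```

Unlike keyword heuristics, a classifier of this shape sees both the commit message and the diff, which is what lets it handle sparsely documented changes.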