Keywords: Emotion Detection; Commit Messages; Software Engineering NLP; Domain Adaptation; Large Language Models; Data Augmentation
TL;DR: This study introduces a 2,000-message GitHub commit dataset; CommiTune (LLaMA augmentation + CodeBERT) boosts technical emotion detection from Macro-F1 0.13–0.21 to ≈0.82.
Abstract: Detecting developer emotion in technical commit messages is critical for gauging signals of burnout or bug introduction, yet it exposes a significant failure point for large language models, whose emotion taxonomies are ill-suited to software engineering contexts. To address this, this study introduces a dataset of 2,000 GitHub commit messages human-labeled with a four-label scheme tailored to the domain: Satisfaction, Frustration, Caution, and Neutral. A diagnostic zero-shot evaluation of five pretrained models yields near-chance Macro-F1 (0.13–0.21) and systematic biases. While fine-tuning a code-aware encoder (CodeBERT) establishes a strong baseline (Macro-F1 ≈ 0.59), this study proposes CommiTune, a simple hybrid method that first fine-tunes a LLaMA model on the manually labeled dataset, uses it to augment the data, and then fine-tunes CodeBERT on the expanded set, achieving Macro-F1 ≈ 0.82 (Accuracy ≈ 0.81) on a held-out test split. This demonstrates that hybrid augmentation can effectively close the representation gap in technical emotion detection. The results establish reproducible training and validation schemes for software engineering NLP. The code, prompts, and label mappings will be released upon acceptance.
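To make the headline metric concrete, the following is a minimal pure-Python sketch of macro-averaged F1 over the paper's four-label scheme (Satisfaction, Frustration, Caution, Neutral); the function name and toy data are illustrative, not the authors' evaluation code.

```python
LABELS = ["Satisfaction", "Frustration", "Caution", "Neutral"]

def macro_f1(y_true, y_pred, labels=LABELS):
    """Macro-averaged F1: compute per-class F1, then average with equal weight.

    Because every class counts equally, a model that collapses onto one label
    (a systematic bias like the zero-shot models show) scores near chance.
    """
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(labels)

# Toy illustration (not the paper's data): a degenerate predictor that always
# outputs "Neutral" gets zero F1 on the other three classes.
gold = ["Satisfaction", "Frustration", "Caution", "Neutral"]
always_neutral = ["Neutral"] * 4
print(macro_f1(gold, gold))            # perfect predictions -> 1.0
print(macro_f1(gold, always_neutral))  # collapsed predictions -> 0.1
```

This mirrors why Macro-F1, rather than accuracy, is the appropriate headline number for an imbalanced four-class task: accuracy rewards predicting the majority class, while Macro-F1 penalizes it.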
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 25649