Emotion Recognition on StackOverflow Posts Using BERT

Donald Bleyl, Elham Khorasani Buxton

Published: 2022, Last Modified: 12 May 2023, Big Data 2022, Readers: Everyone

Abstract: Social programming websites like GitHub and StackOverflow have become an increasingly important part of software development, and their publicly available datasets provide a rich source of data for exploring challenging NLP problems. One such problem is emotion recognition. This work applies deep NLP methods to detect emotions in StackOverflow content. Several BERT models were trained and fine-tuned on a small, sparse, hand-labeled, and highly imbalanced dataset of StackOverflow comments. Text augmentation techniques were used to balance the data, and the model’s vocabulary was extended with common domain-specific terms and emoticons. Unsupervised post-training on a large unlabeled StackOverflow dataset was applied to learn representations for the added vocabulary before fine-tuning on the labeled data. The final model was benchmarked against prior studies on the same dataset.
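
As a rough illustration of balancing an imbalanced labeled set through text augmentation, the sketch below oversamples minority classes with lightly perturbed copies until every class matches the largest one. The abstract does not say which augmentation techniques were used, so the random token swap here is only a stand-in, and the function names, labels, and toy comments are all hypothetical.

```python
import random
from collections import Counter


def random_swap(text, rng):
    """Return a copy of `text` with two randomly chosen tokens swapped.

    Assumption: a simple stand-in augmentation, not the paper's method.
    """
    tokens = text.split()
    if len(tokens) < 2:
        return text
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)


def balance_with_augmentation(samples, seed=0):
    """Add augmented copies of minority-class texts until all classes
    have as many examples as the largest class."""
    rng = random.Random(seed)

    # Group the texts by their label.
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)

    target = max(len(texts) for texts in by_label.values())

    # Keep every original sample, then top up each minority class.
    balanced = list(samples)
    for label, texts in by_label.items():
        for _ in range(target - len(texts)):
            source = rng.choice(texts)
            balanced.append((random_swap(source, rng), label))
    return balanced


# Toy, made-up comments with hypothetical emotion labels.
samples = [
    ("this answer does not compile", "neutral"),
    ("see the docs for the flag", "neutral"),
    ("you need to import the module first", "neutral"),
    ("use a list comprehension instead", "neutral"),
    ("thanks so much this finally works :)", "joy"),
    ("why would anyone write code like this", "anger"),
]

balanced = balance_with_augmentation(samples)
counts = Counter(label for _, label in balanced)
print(counts)
```

In practice the augmented copies would be added only to the training split, so that evaluation still reflects the original class distribution.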
