A Robust Ensemble Machine Learning Model with Advanced Voting Techniques for Comment Classification

Ariful Islam Shiplu; Md. Mostafizer Rahman; Yutaka Watanobe

A Robust Ensemble Machine Learning Model with Advanced Voting Techniques for Comment Classification

Ariful Islam Shiplu, Md. Mostafizer Rahman, Yutaka Watanobe

Published: 01 Jan 2023, Last Modified: 19 Feb 2025BDA (Astronomy, Science, and Engineering) 2023EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: In the modern era, we find ourselves immersed in an ever-expanding flow of data where data is increasing exponentially. Data is generated from different platforms like Education, Business, E-commerce, and predominantly, social media platforms such as Twitter, YouTube, Facebook, and Instagram. Amidst this proliferation of content, user comments have emerged as a crucial element, serving as a platform for expressions of opinions, commendations, and critiques. However, within the abundance of user feedback lies a persistent issue: the presence of undesirable comments that elicit negative emotional responses and prove to be tedious and irrelevant. Effectively identifying and removing such comments poses a major challenge. This research addresses the imperative need for a robust comment classification model. To tackle this issue, a comprehensive investigation is conducted, employing a variety of machine learning models, including Decision Trees, Random Forests (RF), Naive Bayes, K-Nearest Neighbors, Gradient Boosting, AdaBoost, Logistic Regression, and Support Vector Machines (SVM) for comment classification. Furthermore, fundamental voting techniques such as Hard-Voting, Averaging, and Soft-Voting are incorporated with machine learning models to improve the classification performance. The objective is to discern the characteristics of text comments, classifying them, with the aim of achieving superior accuracy compared to prior research. In this paper, we propose a robust ensemble model, RF+AdaBoost+SVM+Soft-Voting, specifically designed for comment classification. The results obtained indicate that the proposed ensemble model achieved an impressive accuracy of approximately 98% for comment classification on YouTube dataset.

Loading