Improving Hate Speech Detection: The Impact of Semantic Representations and Preprocessing Techniques

Necva Bölücü, Aysegül Özerdem

Published: 2023, Last Modified: 03 Nov 2025, Venue: SIU 2023, License: CC BY-SA 4.0

Abstract: Social media is an important tool for measuring the pulse of a society. However, when it is used to produce hate speech targeting an individual or group, it can lead to social problems, which makes the detection of hate speech crucial. This study, submitted to the hate speech detection shared task at SIU 2023 NST, investigates how much semantic representations obtained through the OpenAI API help in detecting hate speech effectively. The preprocessing steps are normalization of the dataset, an emoji dictionary, and the SMOTE technique to address class imbalance in the dataset. To demonstrate the importance of these steps, two basic machine learning techniques, SVM and cosine similarity, are used. The experimental results show that semantic representations combined with machine learning models offer a successful solution to the problem. In particular, the preprocessing step applied to the imbalanced dataset contributes substantially to the results.
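The abstract names two steps that are easy to illustrate: oversampling the minority class with SMOTE and classifying by cosine similarity. The following is a minimal sketch under stated assumptions, not the authors' implementation. The two-dimensional toy vectors stand in for the real semantic representations, the labels `hate` and `none` and the helper names (`smote_like`, `centroid`, `predict`) are hypothetical, and the nearest-centroid rule is only one possible way to use cosine similarity. The oversampler follows the basic SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict(vec, centroids):
    """Return the label whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy stand-ins for embeddings (assumption: real vectors would come from
# an embedding API and have far more dimensions).
majority = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
            [0.8, 0.1], [0.95, 0.15], [0.9, 0.05]]   # label "none"
minority = [[0.1, 1.0], [0.2, 0.9]]                  # label "hate"

# Oversample the minority class until both classes have the same size.
synthetic = smote_like(minority, len(majority) - len(minority))
balanced_minority = minority + synthetic

centroids = {"none": centroid(majority), "hate": centroid(balanced_minority)}
print(predict([0.15, 0.8], centroids))
print(predict([0.9, 0.1], centroids))
```

In practice, one would more likely rely on maintained libraries, for example `SMOTE` from imbalanced-learn and `SVC` from scikit-learn, rather than hand-written helpers; the sketch stays within the standard library only to keep it self-contained.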

External IDs: dblp:conf/siu/BolucuO23