HateThaiSent: Sentiment-Aided Hate Speech Detection in Thai Language

Krishanu Maity, A. S. Poornash, Shaubhik Bhattacharya, Salisa Phosit, Sawarod Kongsamlit, Sriparna Saha, Kitsuchart Pasupa

Published: 01 Jan 2024, Last Modified: 20 May 2025. IEEE Trans. Comput. Soc. Syst. 2024. License: CC BY-SA 4.0

Abstract: Social media platforms are a double-edged sword: they enable the dissemination of information, but they also provide an avenue for spreading online abuse and harassment, such as hate speech. While significant research effort has been devoted to detecting online hate speech in English, little attention has been paid to the Thai language. In this study, we created a benchmark dataset, called HateThaiSent, which labels each post with both hate speech and sentiment information. To detect hate speech, we created a multitask model that uses a dual-channel deep learning approach with an added capsule network: one channel uses pretrained FastText embeddings, while the other uses embeddings from the BERT language model. We aimed to answer two research questions: (Q1) Does incorporating sentiment information improve the performance of hate speech detection (HD) in the Thai language? (Q2) What is the comparative effectiveness of two different approaches for sentiment-aware HD in the Thai language: feature engineering versus multitasking? Our proposed approach outperformed the baselines and state-of-the-art models on the HateThaiSent dataset, with overall accuracy/macro-$F_{1}$ values of 89.67%/89.79% for hate speech detection and 80.92%/80.97% for sentiment detection. We concluded that multitasking is more effective than feature engineering in enhancing the performance of the main task (HD).
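
The abstract reports results as accuracy/macro-$F_{1}$ pairs. As a reminder of how the two metrics differ, the following is a minimal sketch using the standard definitions; it is not the authors' evaluation code. The function names `macro_f1` and `accuracy` and the toy labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro-F1).

    Assumes non-empty, equal-length label sequences.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall for this class.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with hypothetical binary labels `y_true = [1, 1, 0, 0]` and `y_pred = [1, 0, 0, 0]`, the per-class F1 scores are 0.8 (class 0) and 2/3 (class 1), so macro-$F_{1}$ is their plain average while accuracy is 3/4. Because every class counts equally, macro-$F_{1}$ is less sensitive to class imbalance than accuracy.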