Transfer Learning and Lexicon-Based Approaches for Implicit Hate Speech Detection: A Comparative Study of Human and GPT-4 Annotation

Published: 01 Jan 2024, Last Modified: 29 Jul 2025. ICSC 2024. License: CC BY-SA 4.0
Abstract: Detecting harmful speech is the subject of significant research effort in both academia and industry. While good progress has been made on detecting explicit hate speech, detecting implicit hate remains difficult, as it requires a deep understanding of the text's allusions and the social context in which it was uttered. In this paper we study the effectiveness of several approaches to implicit hate speech detection, including lexicon-based approaches, transfer learning, and the use of recent large language models such as GPT-4. By combining a lexicon-based approach with targeted topics, we performed transfer learning experiments using knowledge from seven public harmful speech datasets. Various combinations of the proposed approaches improved the macro-F1 score by 0.6-2.3% over the baselines. We observed that while GPT-4 annotations show good agreement with human labels, conflicts often arise when interpreting sarcasm, context-dependent text shortening, and speech that targets individuals. Warning: due to the nature of the research subject, this paper contains explicit and potentially offensive language.
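As a rough illustration of the lexicon-based component mentioned in the abstract, the sketch below extracts simple lexicon-hit features that a downstream classifier could consume. The cue terms, function names, and feature set are illustrative assumptions, not the authors' actual lexicon or method.

```python
import re

# Hypothetical implicit-hate cue terms; the paper's real lexicon is not shown here.
LEXICON = {"vermin", "infest", "invade"}

def lexicon_features(text: str) -> dict:
    """Count lexicon hits in a text and return simple features
    that a downstream classifier could use alongside topic signals."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [t for t in tokens if t in LEXICON]
    return {
        "hit_count": len(hits),
        "hit_ratio": len(hits) / max(len(tokens), 1),
        "has_hit": bool(hits),
    }

print(lexicon_features("They invade our towns like vermin"))
```

In a transfer-learning setup such as the one the abstract describes, features like these could be concatenated with learned representations before fine-tuning on the target dataset.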