Keywords: word-level hate speech detection, word-in-context, lexical sense modeling, annotator subjectivity, contrastive learning, reinforcement learning
Abstract: Word-level hate speech detection requires modeling both the contextual meaning of a word and the perspectives of its annotators, yet current methods often overlook definitional sense and annotator subjectivity. We propose Aware-Hate, a framework that integrates dictionary definitions and annotator profiles. Training proceeds in two stages: supervised learning first establishes classification capability, and reinforcement learning (RL)-based alignment then refines the predictions. Experiments show that Aware-Hate outperforms fine-tuned LLMs, and ablations verify that jointly modeling lexical sense and annotator subjectivity improves detection.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: hate-speech detection, language/cultural bias analysis, sociolinguistics, NLP tools for social analysis
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4155