[MASK]ED - Language Modeling for Explainable Classification and Disentangling of Socially Unacceptable Discourse.
Abstract: Analyzing Socially Unacceptable Discourse (SUD) online is a critical challenge for regulators and platforms amidst growing concerns over harmful content. While Pre-trained Masked Language Models (PMLMs) have proven effective for many NLP tasks, their performance often degrades in multi-label SUD classification due to overlapping linguistic cues across categories. In this work, we propose an artifact-guided pre-training strategy that injects statistically salient linguistic features, referred to as artifacts, into the masked language modeling objective. These context-sensitive tokens drive an importance-weighted masking scheme during pre-training, improving generalization across discourse types. We further use the artifact signals to inform a lightweight dataset curation procedure that flags noisy or ambiguous instances, supporting targeted relabeling and filtering and enabling more explainable and consistent annotation with minimal changes to the original data. Our approach yields consistent improvements across 10 datasets widely used in SUD classification benchmarks. Disclaimer: This article contains some extracts of unacceptable and upsetting language.
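To make the importance-weighted masking idea concrete, here is a minimal sketch. This is an illustrative reconstruction, not the authors' released code: the log-odds-ratio saliency heuristic, the `mask_rate` default, and all function names are assumptions standing in for whatever statistical saliency measure the paper actually uses.

```python
# Sketch of artifact-guided, importance-weighted masking for MLM pre-training.
# All names and the saliency heuristic below are illustrative assumptions.
import math
import random
from collections import Counter

def artifact_scores(class_corpora):
    """Score each token by how strongly it is associated with one class,
    using a simple smoothed log-odds-ratio heuristic (an assumption, not
    necessarily the paper's saliency measure)."""
    per_class = {c: Counter(tok for doc in docs for tok in doc)
                 for c, docs in class_corpora.items()}
    totals = Counter()
    for counts in per_class.values():
        totals.update(counts)
    n = sum(totals.values())
    vocab = len(totals)
    scores = {}
    for tok, freq in totals.items():
        best = 0.0
        for counts in per_class.values():
            in_total = sum(counts.values())
            p_in = (counts[tok] + 1) / (in_total + vocab)        # in-class rate
            p_out = (freq - counts[tok] + 1) / (n - in_total + vocab)  # out-of-class rate
            best = max(best, abs(math.log(p_in / p_out)))
        scores[tok] = best
    return scores

def importance_weighted_mask(tokens, scores, mask_rate=0.15):
    """Select ~mask_rate of the positions to mask, sampling positions in
    proportion to artifact saliency instead of uniformly at random."""
    weights = [scores.get(t, 0.0) + 1e-3 for t in tokens]  # floor keeps every token maskable
    k = max(1, round(mask_rate * len(tokens)))
    masked = set()
    for _ in range(k):
        # weighted sampling of positions without replacement
        choices = [i for i in range(len(tokens)) if i not in masked]
        w = [weights[i] for i in choices]
        masked.add(random.choices(choices, weights=w, k=1)[0])
    return [("[MASK]" if i in masked else t) for i, t in enumerate(tokens)]

if __name__ == "__main__":
    # Toy corpora keyed by discourse label; real inputs would be tokenized posts.
    corpora = {
        "offensive": [["you", "people", "are", "awful"], ["get", "out"]],
        "neutral": [["you", "are", "welcome", "here"], ["come", "in"]],
    }
    scores = artifact_scores(corpora)
    print(importance_weighted_mask(["you", "people", "are", "awful"], scores, mask_rate=0.3))
```

Under this sketch, class-discriminative tokens are masked more often than uniform 15% masking would mask them, so the model must predict exactly the artifacts that otherwise act as shortcuts.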
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: hate speech detection, pre-training, bias/toxicity, human-AI interaction/cooperation, human-in-the-loop, data shortcuts/artifacts, topic modeling
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1817