From Intrinsic Toxicity to Reception-Based Toxicity: A Contextual Framework for Prediction and Evaluation

ACL ARR 2026 January Submission 6806 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, Readers: Everyone, License: CC BY 4.0

Keywords: toxicity detection, abusive language, dialect bias, construct validity, domain adaptation, annotation guidelines, large language models

Abstract: Most toxicity detection models treat toxicity as an intrinsic property of text, overlooking the role of context in shaping its impact. In this position paper, drawing on insights from psychology, neuroscience, and computational social science, we reconceptualise toxicity as a socially emergent signal of stress. We formalise this perspective in the Contextual Stress Framework (CSF), which defines toxicity as a stress-inducing norm violation within a given context and introduces this notion as an additional dimension for toxicity detection. As one possible realisation of CSF, we introduce PONOS (Proportion Of Negative Observed Sentiments), a metric that quantifies toxicity through collective social reception rather than lexical features. We validate this approach on a novel dataset and show that, when used alongside existing models, PONOS improves contextual sensitivity and adaptability.

Paper Type: Long

Research Area: Ethics, Bias, and Fairness

Research Area Keywords: hate-speech detection, model bias/fairness evaluation, language/cultural bias analysis, evaluation methodologies, NLP datasets, corpus creation, benchmarking, sociolinguistics

Contribution Types: Position papers, Theory

Languages Studied: English

Submission Number: 6806
