Abstract: Toxic language is difficult to define, as it is not monolithic and perceptions of toxicity vary widely. The challenge of detecting toxic language is compounded by the highly contextual and subjective nature of its interpretation, which can degrade the reliability of datasets and negatively affect detection model performance. To address this gap, this paper introduces a toxicity inspector framework that incorporates a human-in-the-loop pipeline with the aim of enhancing the reliability of toxicity benchmark datasets by centering the evaluator's attention through an iterative feedback cycle. The centerpiece of this framework is the iterative feedback process, which is guided by two types of metrics (hard and soft) that give evaluators and dataset creators the insight needed to balance the tradeoff between performance gains and toxicity avoidance.