Abstract: Toxic language is difficult to define, as it is not monolithic and perceptions of toxicity vary widely. The challenge of detecting toxic language is compounded by the highly contextual and subjective nature of its interpretation, which can degrade the reliability of datasets and negatively affect detection model performance. To address this gap, this paper introduces a toxicity inspector framework that incorporates a human-in-the-loop pipeline with the aim of enhancing the reliability of toxicity benchmark datasets by centering the evaluator's attention through an iterative feedback cycle. The centerpiece of this framework is the iterative feedback process, which is guided by two types of metrics (hard and soft) that give evaluators and dataset creators the insight needed to balance the tradeoff between performance gains and toxicity avoidance.