ToxiSight: Insights Towards Detected Chat Toxicity

Published: 21 Sept 2024, Last Modified: 06 Oct 2024, Venue: BlackboxNLP 2024, License: CC BY 4.0
Track: Extended abstract
Keywords: Explainable AI, Toxicity, Hate-Speech, HCI
TL;DR: ToxiSight is a comprehensive dashboard for explaining and understanding toxicity detection models.
Abstract: We present a comprehensive explainability dashboard for in-game chat toxicity detection. The dashboard integrates several existing explainable AI (XAI) techniques, including token importance analysis, model output visualization, and attribution to the training dataset. It also surfaces the closest positive and negative examples, which supports a deeper understanding of the training data and its potential correction. In addition, the dashboard includes word sense analysis, which is particularly useful for new moderators, and offers free-text explanations for both positive and negative predictions. Together, these views enhance the interpretability and transparency of toxicity detection models.
Submission Number: 54