Can Interpretability Layouts Influence Human Perception of Offensive Sentences?

Published: 01 Jan 2024, Last Modified: 13 Oct 2024 · EXTRAAMAS 2024 · CC BY-SA 4.0
Abstract: This paper presents a user study assessing whether three machine learning (ML) interpretability layouts influence participants’ views when evaluating sentences containing hate speech, focusing on the “Misogyny” and “Racism” classes. Given divergent conclusions in the literature, we provide statistical and qualitative analyses of questionnaire responses, using a Generalized Additive Model to estimate participants’ ratings and incorporating both within-subject and between-subject designs. While the statistical analysis indicates that none of the interpretability layouts significantly influences participants’ views, the qualitative analysis demonstrates two advantages of ML interpretability: 1) prompting participants to provide corrective feedback when their views diverge from the model’s predictions, and 2) offering insights for evaluating a model’s behavior beyond traditional performance metrics.