Abstract: The automatic detection of disinformation presents a significant challenge in the field of natural language processing. This task addresses a multifaceted societal and communication issue that requires approaches extending beyond the identification of general linguistic patterns through data-driven algorithms. In this research work, we hypothesise that text classification methods are unable to capture the nuances of disinformation and often ground their decisions in superfluous
features. Hence, we apply a post-hoc explainability method (SHAP, SHapley Additive exPlanations)
to identify spurious elements with high impact on the classification models. Our findings show that
non-informative elements (e.g., URLs and emoticons) should be removed and named entities (e.g.,
Rwanda) should be pseudo-anonymized before training, to avoid biasing the models and to increase their
generalization capabilities. We evaluate this methodology with an internal dataset and an external dataset
before and after applying extended data preprocessing and named entity replacement. The results show that our proposal improves the performance of a disinformation classification method on external test data by 65.78% on average, without a significant decrease in internal test performance.
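The preprocessing described above (removing non-informative elements such as URLs and emoticons, and pseudo-anonymizing named entities) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the regular expressions, the `[COUNTRY]` placeholder label, and the dictionary-based entity lookup are assumptions for demonstration; a full implementation would rely on a NER model to detect entities.

```python
import re

# Non-informative elements flagged by SHAP in the abstract: URLs and emoticons.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOTICON_RE = re.compile(r"[:;=8][-^']?[)(DPpOo/\\|]")

def clean(text, entities=None):
    """Remove URLs/emoticons and pseudo-anonymize the given named entities.

    `entities` maps entity surface forms to placeholder labels, e.g.
    {"Rwanda": "[COUNTRY]"}. In practice these would come from a NER
    model rather than a hand-written dictionary (assumption).
    """
    text = URL_RE.sub("", text)
    text = EMOTICON_RE.sub("", text)
    for surface, label in (entities or {}).items():
        text = text.replace(surface, label)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

print(clean("Rwanda crisis :) read more at https://example.com",
            {"Rwanda": "[COUNTRY]"}))
# → [COUNTRY] crisis read more at
```

Replacing the concrete entity with a generic label keeps the syntactic slot a classifier can learn from while removing the spurious lexical cue (e.g., the token "Rwanda" itself) that the abstract identifies as a source of bias.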