Abstract: Neural Networks (NNs) are powerful decision-making tools, but their lack of explainability limits their use in
high-stakes domains such as healthcare and criminal justice. The recent SpArX framework sparsifies NNs and
maps them to (weighted) Quantitative Bipolar Argumentation Frameworks (QBAFs) to provide an argumentative
understanding of their mechanics. QBAFs can be explained by various quantitative argumentative explanation
methods such as Argument Attribution Explanations (AAEs), Relation Attribution Explanations (RAEs), and
Contestability Explanations (CEs), which assign numerical scores to arguments or relations to quantify their
influence on the dialectical strength of an argument to be explained. However, it remains unexplored how
sparsification of NNs impacts the explanations derived from the corresponding (weighted) QBAFs. In this paper,
we explore two directions of impact. First, we empirically investigate how varying the sparsification levels of
NNs affects the preservation of these explanations: using four datasets (Iris, Diabetes, Cancer, and COMPAS), we
find that AAEs are generally well preserved, whereas RAEs are not. Second, for CEs, we find that sparsification can
improve computational efficiency in several cases. Overall, this study offers a preliminary investigation into the
potential synergy between sparsification and explanation methods, opening up new avenues for future research.