Explain, Explain, Explain: Uncertainty–Explanation Alignment for EEG Models Under Artifact and Explainer Perturbations

Published: 10 May 2026, Last Modified: 10 May 2026 | XTAI-2026 Oral | Readers: Everyone | License: CC BY 4.0
Keywords: Explainable AI, transparency auditing, uncertainty quantification, EEG, brain-computer interfaces, perturbation-based explanations, model calibration, attribution faithfulness, deep ensembles, selective explanation
TL;DR: We test whether model uncertainty tracks explanation quality under perturbations in EEG models, and show that withholding explanations when uncertainty is high reduces sharp but misleading attributions.
Abstract: Perturbation-based attribution methods are widely used to make AI predictions transparent, but they provide no standard check that an explanation was generated under conditions where the model remained reliable. We frame this as a transparency audit problem and argue that an explanation is trustworthy only when predictive calibration is maintained under the same perturbations used by the explainer. We instantiate this idea on EEG-based brain-computer interfaces, where artefacts such as channel dropout, ocular bursts, muscle noise, line noise, and bandpass mismatch provide controlled and interpretable perturbations for studying explanation failure under distribution shift. Our audit introduces three diagnostics: uncertainty-faithfulness alignment, measured by the Spearman correlation ρ_align between predictive entropy and deletion AUC; a monotonicity score M; and a misalignment rate that captures trials where the model is uncertain but the attribution remains sharply concentrated. We also propose abstain-to-explain, a control that withholds attribution maps when predictive entropy exceeds a calibrated threshold. Experiments on the BCI Competition IV 2a dataset with EEGNet and ShallowConvNet, combined with single softmax, temperature scaling, MC dropout, and deep ensembles, show that deep ensembles produce the most audit-compliant explanations and that abstaining removes many sharp but misleading attributions. The protocol is designed as an accountability mechanism for perturbation-based explanation pipelines and generalises beyond EEG to broader AI settings.
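A minimal sketch of how two of the diagnostics and the abstain rule described in the abstract could be computed, offered as an illustration rather than the authors' implementation. The abstract does not give exact formulas, so several choices here are assumptions: entropy is Shannon entropy in nats, attribution "sharpness" is measured by a hypothetical `top_k_mass` with cutoff `sharp`, the misalignment rate is taken over all trials, and the threshold `tau` is simply passed in (a validation-set quantile would be one way to calibrate it). The monotonicity score M is omitted because the abstract does not define it, and the expected sign of ρ_align depends on how deletion AUC is oriented, which the abstract also leaves open.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def _average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def top_k_mass(attribution, k):
    """Assumed sharpness proxy: share of |attribution| in the k largest entries."""
    mags = sorted((abs(a) for a in attribution), reverse=True)
    total = sum(mags)
    return sum(mags[:k]) / total if total > 0 else 0.0

def audit(probs, deletion_auc, attributions, tau, k=3, sharp=0.8):
    """Per-trial audit: alignment, abstain decisions, misalignment rate."""
    entropy = [predictive_entropy(p) for p in probs]
    # Abstain rule: withhold the attribution map when entropy exceeds tau.
    withheld = [h > tau for h in entropy]
    # Misaligned trial: model is uncertain, yet the attribution is sharp.
    misaligned = [w and top_k_mass(a, k) >= sharp
                  for w, a in zip(withheld, attributions)]
    return {
        "rho_align": spearman_rho(entropy, deletion_auc),
        "withheld": withheld,
        "misalignment_rate": sum(misaligned) / len(misaligned),
    }

# Toy data (made up for illustration): 4 trials, 4 classes, 6 channels.
probs = [[0.97, 0.01, 0.01, 0.01], [0.7, 0.1, 0.1, 0.1],
         [0.4, 0.3, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25]]
deletion_auc = [0.1, 0.2, 0.3, 0.4]
attributions = [[0.5, 0.3, 0.1, 0.05, 0.03, 0.02], [0.4, 0.3, 0.1, 0.1, 0.05, 0.05],
                [1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [0.9, 0.05, 0.02, 0.01, 0.01, 0.01]]
result = audit(probs, deletion_auc, attributions, tau=1.0)
```

On this toy input the two highest-entropy trials are withheld, and only the last one counts as misaligned because its attribution is concentrated on a single channel while the prediction is uniform.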
Submission Number: 2