Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations
Keywords: Global Workspace Theory, Debiasing Methods, Explainable AI, Cognitive Science
TL;DR: Inspired by global workspace theory, we propose a novel debiasing framework, Debiasing Global Workspace, that learns debiased and interpretable representations of attributes without defining specific bias types.
Abstract: Deep Neural Networks (DNNs) often make predictions based on "spurious" attributes when trained on biased datasets, where most samples have features spuriously correlated with the target labels. This is problematic when the irrelevant features are easier for the model to learn than the truly relevant ones. Existing debiasing methods require predefined bias labels and incur additional computational cost from auxiliary networks. We propose an alternative approach inspired by cognitive science, called Debiasing Global Workspace (DGW). DGW consists of specialized modules and a shared workspace, which increases modularity and improves debiasing performance. Our method also makes the decision-making process more transparent through attention masks. We validate DGW across various biased datasets, demonstrating its effectiveness in improving debiasing performance.
Submission Number: 14
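Below is a minimal sketch of the shared-workspace idea described in the abstract: outputs of specialized modules compete via attention to write into a small set of workspace slots, which are then broadcast back, and the resulting attention weights can serve as interpretable masks. This is an illustrative assumption, not the authors' implementation; the class name, slot count, and dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class SharedWorkspace(nn.Module):
    """Hypothetical global-workspace-style bottleneck (illustrative only)."""

    def __init__(self, dim=64, n_slots=4, n_heads=4):
        super().__init__()
        # Learnable workspace slots shared across specialized modules.
        self.slots = nn.Parameter(torch.randn(n_slots, dim))
        self.write = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.read = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, specialists):
        # specialists: (batch, n_modules, dim) outputs of specialized modules.
        b = specialists.size(0)
        slots = self.slots.unsqueeze(0).expand(b, -1, -1)
        # Write phase: workspace slots attend over specialist outputs (competition for access).
        workspace, write_mask = self.write(slots, specialists, specialists)
        # Broadcast phase: each specialist reads the updated workspace back.
        out, _ = self.read(specialists, workspace, workspace)
        # write_mask can be inspected as an attention mask over modules.
        return out, write_mask


if __name__ == "__main__":
    x = torch.randn(8, 3, 64)               # e.g., intrinsic- vs. bias-attribute module outputs
    debiased, mask = SharedWorkspace()(x)
    print(debiased.shape, mask.shape)        # (8, 3, 64) and (8, 4, 3)
```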