Keywords: Core-periphery Structure, Self-Attention
Abstract: Designing more efficient, reliable, and explainable neural network architectures is a crucial topic in the artificial intelligence (AI) field. Numerous efforts have been devoted to exploring the best structures, or structural signatures, of well-performing artificial neural networks (ANNs). Previous studies have found, via post-hoc analysis, that the best-performing ANNs surprisingly resemble biological neural networks (BNNs), which indicates that ANNs and BNNs may share common principles for achieving optimal performance in either machine learning tasks or cognitive/behavioral processes. Inspired by this phenomenon, rather than relying on post-hoc schemes, we proactively instill organizational principles of BNNs to guide the redesign of ANNs by infusing an efficient information communication mechanism of BNNs into ANNs. Specifically, we quantified the typical Core-Periphery (CP) organization of human brain networks, infused the Core-Periphery principle into the redesign of the vision transformer (ViT), and proposed a novel CP-ViT architecture: the pairwise, densely interconnected self-attention architecture of ViT is replaced with a sparse Core-Periphery architecture. In CP-ViT, the attention operation between nodes (image patches) is defined by a sparse graph with a Core-Periphery structure (CP graph), in which the core nodes are redesigned and reorganized to play an integrative role and serve as a center for the periphery nodes to exchange information. We evaluated the proposed CP-ViT on multiple public datasets, including a medical image dataset (INbreast) and a natural image dataset (CIFAR-100). We show that there exist sweet spots of CP graphs that lead to CP-ViTs with significantly improved performance. In general, our work advances the state of the art in three aspects: 1) it provides novel insights for brain-inspired AI: we can instill the efficient information communication mechanism of BNNs into ANNs by infusing similar organizational principles of BNNs into ANNs; 2) the optimized CP-ViT significantly improves predictive performance while dramatically reducing computational cost, benefiting from the infused efficient information communication mechanism of BNNs; and 3) the core nodes in CP-ViT identify task-related, meaningful, and important image patches, which significantly enhances the interpretability of the trained deep model. (Code is ready for release.)
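To make the abstract's attention mechanism concrete, below is a minimal PyTorch sketch of self-attention restricted to the edges of a Core-Periphery graph. The function names, the convention that the first n_core patches are the core nodes, and the specific mask construction (core-core and core-periphery edges allowed, periphery-periphery edges removed) are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn.functional as F

def core_periphery_mask(n_patches: int, n_core: int) -> torch.Tensor:
    """Boolean adjacency over patches: a pair may attend to each other iff
    at least one of the two is a core node (plus self-attention)."""
    is_core = torch.zeros(n_patches, dtype=torch.bool)
    is_core[:n_core] = True  # assumption: first n_core patches act as core nodes
    mask = is_core.unsqueeze(0) | is_core.unsqueeze(1)  # edge if either endpoint is core
    mask |= torch.eye(n_patches, dtype=torch.bool)      # always keep self-attention
    return mask

def cp_attention(q, k, v, mask):
    """Standard scaled dot-product attention, restricted to CP-graph edges."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))   # block non-edges
    return F.softmax(scores, dim=-1) @ v

# Usage: 64 patches, 16 of them core, a single attention head of width 32.
q = k = v = torch.randn(1, 64, 32)
out = cp_attention(q, k, v, core_periphery_mask(64, 16))

Under this sketch, periphery patches exchange information only through core patches, which is how the core nodes come to serve as integrative hubs while the overall attention graph stays sparse.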
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning