Keywords: Neural architecture search, Multimodal learning, Sparse modeling, Shapley value, Explainability
Abstract: Despite their excellent performance on various multimodal learning tasks, deep neural networks (DNNs) are often characterized as "black boxes". Some techniques aid in designing explainable DNNs. For instance, sparse modeling enforces sparsity while preserving key features, and the Shapley value from game theory quantifies the true contribution of each component; both are recognized for their strong explainability. However, manually designing explainable multimodal DNNs, including unimodal backbones and multimodal feature fusion models, requires substantial expertise and time. This paper proposes a novel multimodal neural architecture search (NAS) method, termed Shapley-Enhanced Multimodal Neural Architecture Search via Sparse Modeling (SM-ShapNAS), for automating the design of appropriate and explainable multimodal DNNs. SM-ShapNAS incorporates sparse attention and sparse convolutional operations within a predefined search space, and uses the Shapley value, approximated via a group policy, to evaluate the true contribution of each operation in the fusion cells. By combining sparse modeling and the Shapley value, the proposed SM-ShapNAS automatically generates efficient and explainable multimodal DNNs. Experimental results on three multimodal datasets demonstrate that SM-ShapNAS achieves competitive performance compared to state-of-the-art multimodal NAS methods, particularly in noisy environments.
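The abstract's idea of scoring each candidate operation by its Shapley value can be illustrated with a minimal Monte Carlo (permutation-sampling) sketch. Everything here is a hypothetical illustration: the value function `v` is a toy stand-in for the validation accuracy of a fusion cell restricted to a subset of operations, and the paper's actual group-policy approximation is not reproduced.

```python
import random

def shapley_estimate(players, v, num_samples=2000, seed=0):
    """Estimate each player's Shapley value by sampling permutations.

    For each sampled permutation, a player's marginal contribution is
    v(coalition + player) - v(coalition); averaging over permutations
    approximates the Shapley value.
    """
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(num_samples):
        perm = list(players)
        rng.shuffle(perm)
        coalition = frozenset()
        for p in perm:
            with_p = coalition | {p}
            phi[p] += v(with_p) - v(coalition)  # marginal contribution
            coalition = with_p
    return {p: total / num_samples for p, total in phi.items()}

# Toy additive value function: each operation contributes a fixed amount
# (no interactions), so the Shapley values recover the weights exactly.
weights = {"attention": 0.6, "conv": 0.3, "identity": 0.1}

def v(coalition):
    return sum(weights[p] for p in coalition)

phi = shapley_estimate(list(weights), v)
```

In a real NAS setting, `v` would be expensive (a validation pass per coalition), which is why approximations such as the paper's group policy are needed.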
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6481