Keywords: Vision GNN, Deep Learning Architectures, Computer Vision, Neural Architecture Search
TL;DR: We propose SearchViG, a novel framework that automatically designs heterogeneous Vision GNN architectures by performing a Graph Construction Search (GCS) to find the optimal graph topology for each stage.
Abstract: Vision Graph Neural Networks (ViGs) are often limited by their reliance on a fixed, homogeneous graph construction rule applied across all network stages. To address this limitation, we introduce SearchViG, a novel framework that automatically designs optimal heterogeneous architectures by performing a Graph Construction Search (GCS) to produce the optimal graph topology for each stage within our designated search space. Our search is guided by a zero-shot, theoretically-grounded proxy: the spectral gap of the graph's adjacency matrix, which quantifies the graph's Ramanujan-like expansion properties and is provably linked to superior information flow. SearchViG discovers new heterogeneous architectures that assign different graph topologies, numbers of neighbors, and hop distances between neighbors based on feature resolution. Our resulting models establish a new state-of-the-art Pareto frontier for Vision GNNs. For instance, our SearchViG-M achieves 83.3\% top-1 accuracy, outperforming both Vision GNN-B (ViG-B) and Vision Hypergraph Neural Network-B (ViHGNN-B) while using over 70\% fewer parameters and 80\% fewer GMACs. This efficiency extends to downstream tasks, where our lightweight SearchViG-S obtains 43.4 mIoU, 43.5 $AP^{box}$, and 39.9 $AP^{mask}$, surpassing the much larger Pyramid Vision Transformer-Large (PVT-Large) across all metrics while using 80\% fewer parameters. Code is available at https://github.com/SLDGroup/SearchViG.
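To make the zero-shot proxy concrete, below is a minimal sketch (not the authors' released code) of how one might score a candidate graph construction by the spectral gap of its adjacency matrix, as described in the abstract; the function names, the plain k-NN construction, and the choice of candidate k values are illustrative assumptions only.

```python
# Hedged sketch: rank candidate graph constructions by adjacency spectral gap.
# A larger gap between the two leading eigenvalues indicates stronger
# (Ramanujan-like) expansion, which the abstract ties to better information flow.
import torch

def knn_adjacency(features: torch.Tensor, k: int) -> torch.Tensor:
    """Build a symmetric k-NN adjacency matrix over N node features (N x D)."""
    dists = torch.cdist(features, features)                 # pairwise distances
    idx = dists.topk(k + 1, largest=False).indices[:, 1:]   # drop the self-neighbor
    n = features.shape[0]
    adj = torch.zeros(n, n)
    adj.scatter_(1, idx, 1.0)
    return torch.maximum(adj, adj.T)                         # symmetrize

def spectral_gap(adj: torch.Tensor) -> float:
    """Gap between the two largest eigenvalues of a symmetric adjacency matrix."""
    eigvals = torch.linalg.eigvalsh(adj)                     # ascending order
    return (eigvals[-1] - eigvals[-2]).item()

# Usage: compare candidate neighbor counts without any training (zero-shot).
feats = torch.randn(196, 64)                                 # e.g., 14x14 patch tokens
scores = {k: spectral_gap(knn_adjacency(feats, k)) for k in (6, 9, 12)}
print(scores)
```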
Submission Type: Full paper proceedings track submission (max 9 main pages).
Publication Agreement: pdf
Software: https://github.com/SLDGroup/SearchViG
Poster: png
Poster Preview: png
Submission Number: 30