Decoder-Free Supervoxel GNN for Accurate Brain-Tumor Localization in Multi-modal MRI

Andrea Protani, Marc Molina Van den bosch, Lorenzo Giusti, Heloisa Barbosa Da Silva, Paolo Cacace, Albert Sund Aillet, Friedhelm Hummel, Luigi Serio

Published: 19 Jan 2026, Last Modified: 28 Jan 2026 | MICCAI 2025 | Everyone | CC BY 4.0

Abstract: Modern vision backbones for 3D medical imaging typically process dense voxel grids through parameter-heavy encoder-decoder structures, a design that allocates a significant portion of its parameters to spatial reconstruction rather than feature learning. Our approach introduces SVGFormer, a decoder-free pipeline built upon a content-aware grouping stage that partitions the volume into a semantic graph of supervoxels. Its hierarchical encoder learns rich node representations by combining a patch-level Transformer with a supervoxel-level Graph Attention Network, jointly modeling fine-grained intra-region features and broader inter-regional dependencies. This design concentrates all learnable capacity on feature encoding and provides inherent, dual-scale explainability from the patch to the region level. To validate the framework’s flexibility, we trained two specialized models on the BraTS dataset: one for node-level classification and one for tumor proportion regression. Both models performed strongly, with the classification model reaching an F1-score of 0.875 and the regression model an MAE of 0.028, confirming the encoder’s ability to learn discriminative and localized features. Our results establish that a graph-based, encoder-only paradigm offers an accurate and inherently interpretable alternative for 3D medical image representation.
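To make the two node-level tasks concrete, the following is a minimal sketch of how per-supervoxel targets and the two reported metrics (F1-score and MAE) could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `threshold` parameter used to turn a tumor proportion into a binary node label, and the regular 2x2x2 grid standing in for the content-aware grouping stage are all hypothetical.

```python
import numpy as np

def supervoxel_targets(supervoxels, tumor_mask, threshold=0.0):
    """Per-supervoxel tumor proportion and a binary node label.

    supervoxels: int array with one supervoxel id (0..K-1) per voxel.
    tumor_mask:  boolean or 0/1 array of the same shape.
    threshold:   hypothetical cut-off; a node is labeled tumor when its
                 proportion exceeds it (assumption, not from the paper).
    """
    ids = supervoxels.ravel()
    mask = tumor_mask.ravel().astype(np.float64)
    n_nodes = int(ids.max()) + 1
    voxels_per_node = np.bincount(ids, minlength=n_nodes)
    tumor_per_node = np.bincount(ids, weights=mask, minlength=n_nodes)
    # Guard against empty ids so the division is always defined.
    proportion = tumor_per_node / np.maximum(voxels_per_node, 1)
    labels = (proportion > threshold).astype(np.int64)
    return proportion, labels

def mean_absolute_error(y_true, y_pred):
    """MAE between true and predicted tumor proportions."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

# Toy 4x4x4 volume split into 8 supervoxels of 2x2x2 voxels each.
# A regular grid stands in for a content-aware grouping (assumption).
z, y, x = np.indices((4, 4, 4))
supervoxels = (z // 2) * 4 + (y // 2) * 2 + (x // 2)

tumor_mask = np.zeros((4, 4, 4), dtype=bool)
tumor_mask[0:2, 0:2, 0:2] = True   # supervoxel 0 is entirely tumor
tumor_mask[0, 0, 2] = True         # a single tumor voxel in supervoxel 1

proportion, labels = supervoxel_targets(supervoxels, tumor_mask)
```

In this toy volume, `proportion` holds the regression targets (one value in [0, 1] per node) and `labels` holds the classification targets. A model's per-node predictions would then be scored with `mean_absolute_error` and `f1_score` respectively.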