Mask-Aware Transformers Enable Robust Learning from Incomplete Volumetric Medical Imaging

Published: 2025 · Last Modified: 24 Feb 2026 · ICIAP 2025 (Workshops 1) · CC BY-SA 4.0
Abstract: Incomplete or sparsely sampled volumetric scans are pervasive across medical imaging modalities: patient motion, shortened acquisition protocols, and hardware constraints frequently result in missing slices that undermine downstream analysis. Conventional deep learning pipelines either discard these studies or rely on voxel-wise interpolation, potentially introducing artefactual signals. We present the Mask-Aware Vision Transformer (MAViT), a modality-agnostic architecture that learns directly from incomplete 3D volumes without synthetic reconstruction. MAViT leverages a binary slice-availability mask to identify corrupted patches and selectively suppress their contribution within each self-attention block, effectively guiding feature aggregation while mitigating the impact of missing data. To benchmark robustness, we synthetically corrupt brain MRI volumes from the Alzheimer's Disease Neuroimaging Initiative at varying slice-drop rates. Despite being trained on heterogeneous missing-slice patterns, MAViT achieves state-of-the-art performance on Alzheimer's disease classification, surpassing 3D interpolation-based and 2D slice-wise baselines. These findings indicate that mask-aware modelling offers a valuable approach to learning from incomplete volumetric data, readily extending beyond brain MRI to other imaging modalities.
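The core mechanism described above, suppressing patches from missing slices inside self-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the single-head formulation, and the choice of masking via `-inf` logits before the softmax are all assumptions made here for illustration.

```python
import numpy as np

def mask_aware_attention(x, avail):
    """Single-head self-attention that zeroes out keys from missing slices.

    A hedged sketch of the mask-aware block: patches whose slice was not
    acquired (avail == 0) receive -inf attention logits as keys, so after
    the softmax they contribute nothing to any token's aggregated feature.

    x:     (n_tokens, d) patch embeddings
    avail: (n_tokens,) binary slice-availability mask (1 = acquired slice)
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)            # raw attention logits
    scores[:, avail == 0] = -np.inf          # missing patches emit no keys
    # numerically stable softmax over available keys only
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x                             # aggregate available patches
```

Because masking acts on the key axis, a missing patch's embedding cannot influence any other token's output, while that token can still form a query and receive features from available patches; this assumes at least one available patch per volume.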