Uncertainty-Aware Vision Transformers for Medical Image Analysis

03 Aug 2024 (modified: 01 Sept 2024) · MICCAI 2024 Workshop UNSURE Submission · CC BY 4.0
Keywords: Vision Transformers, Out-of-distribution detection
Abstract: Vision transformers (ViTs) have emerged as strong alternatives to conventional convolutional neural networks (CNNs), owing to their scalability, enhanced generalization, and superior performance in out-of-distribution (OOD) scenarios. Despite these strengths, ViTs are prone to significant overfitting when training data is scarce. This issue severely limits their reliability in critical applications, such as biomedical image analysis, where accurate uncertainty estimation is crucial. The challenge lies in the inherent lack of insight into the transformer network's confidence and uncertainty levels. To tackle this issue, we propose a novel stochastic vision transformer characterized by three components: 1) a stochastic elliptical Gaussian embedding that encodes uncertainty into the embeddings of image patches, 2) a Fréchet Inception Distance (FID)-based attention mechanism for the Gaussian embeddings, and 3) an FID-based regularization term that imposes distance and uncertainty awareness on the learning of stochastic representations. We demonstrate the effectiveness of our method through in-distribution calibration and OOD detection experiments on the ISIC2019 skin cancer dataset.
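To make the attention idea concrete, the following is a minimal, hypothetical sketch, not the authors' implementation: it assumes diagonal Gaussian patch embeddings, for which the Fréchet (2-Wasserstein) distance between two Gaussians has the closed form ||μ1 − μ2||² + ||σ1 − σ2||², and turns that distance into attention logits. The function names, the diagonal-covariance simplification, and the temperature parameter `tau` are illustrative assumptions.

```python
import torch

def frechet_distance_sq(mu1, sigma1, mu2, sigma2):
    # Squared Frechet (2-Wasserstein) distance between diagonal Gaussians:
    # d^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2
    return ((mu1 - mu2) ** 2).sum(dim=-1) + ((sigma1 - sigma2) ** 2).sum(dim=-1)

def distance_aware_attention(mu, sigma, tau=1.0):
    # mu, sigma: (batch, tokens, dim) means and stds of Gaussian patch embeddings.
    # Pairwise distances via broadcasting: (B, T, 1, D) vs (B, 1, T, D) -> (B, T, T).
    d2 = frechet_distance_sq(mu.unsqueeze(2), sigma.unsqueeze(2),
                             mu.unsqueeze(1), sigma.unsqueeze(1))
    # Logits decrease with distance, so dissimilar or highly uncertain
    # token distributions receive lower attention weight.
    attn = torch.softmax(-d2 / tau, dim=-1)
    return attn @ mu  # aggregate embedding means with distance-aware weights

# Usage sketch: 2 images, 16 patch tokens, 64-dim embeddings.
mu = torch.randn(2, 16, 64)
sigma = torch.rand(2, 16, 64) + 1e-3  # stds kept positive, e.g. via softplus
out = distance_aware_attention(mu, sigma)  # (2, 16, 64)
```

Because the distance grows with both mean separation and mismatch in uncertainty, this scoring couples semantic similarity and confidence, which is the intuition behind the distance- and uncertainty-aware design described in the abstract.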
Submission Number: 16