Keywords: Misclassification Detection, Multimodal Learning
TL;DR: We introduce MultiMisD, the first framework specifically designed for multimodal misclassification detection.
Abstract: The deployment of multimodal models in safety-critical applications, such as autonomous driving and medical diagnostics, requires more than high predictive accuracy; it also demands reliable mechanisms for detecting failures. In this work, we address the largely unexplored problem of misclassification detection in multimodal settings. We present MultiMisD, a novel framework specifically designed to identify such multimodal failures. Our approach is driven by a key observation: in most misclassification cases, the confidence of the multimodal prediction is significantly lower than that of at least one unimodal branch, a phenomenon we term confidence degradation. To mitigate this, we introduce an Adaptive Confidence Loss that penalizes such degradations during training. In addition, we propose Multimodal Feature Swapping, a novel outlier synthesis technique that generates challenging, failure-aware training examples. By training with these synthetic failures, MultiMisD learns to more effectively recognize and reject uncertain predictions, thereby improving overall reliability. Extensive experiments across four datasets, three modalities, and multiple evaluation settings demonstrate that MultiMisD achieves consistent and robust gains. The source code will be publicly released.
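For intuition, here is a minimal sketch of the confidence-degradation signal described in the abstract, assuming confidence is measured as the maximum softmax probability and using a simple hinge-style penalty. The function name and exact form are illustrative assumptions, not the paper's actual Adaptive Confidence Loss:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def confidence_degradation_penalty(mm_logits, uni_logits_list):
    """Hypothetical hinge penalty: positive whenever the multimodal
    confidence (max softmax probability) falls below the confidence of
    the best unimodal branch -- the "confidence degradation" pattern
    the abstract associates with misclassifications."""
    mm_conf = softmax(mm_logits).max(axis=-1)
    uni_confs = np.stack([softmax(u).max(axis=-1) for u in uni_logits_list])
    best_uni = uni_confs.max(axis=0)
    # Zero when the fused prediction is at least as confident as every
    # unimodal branch; grows with the size of the degradation otherwise.
    return np.maximum(0.0, best_uni - mm_conf).mean()
```

In this sketch, a confidently fused prediction (multimodal confidence above every unimodal branch) incurs zero penalty, while a fused prediction that is less confident than some branch is penalized in proportion to the gap.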
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5184