Abstract: Pretrained multimodal models offer strong representational priors and sample efficiency, but remain fragile when deployed in real-world settings. Two key challenges underlie this brittleness: (1) inputs are frequently incomplete due to missing or corrupted modalities, and (2) pretrained models may yield unreliable predictions due to distribution mismatch or insufficient adaptation. A common workaround for the first challenge is to reconstruct missing modalities; however, this alone not only fails to resolve the second challenge, but may exacerbate it--introducing additional uncertainty from reconstruction that compounds the inherent unreliability of the pretrained model. We propose \textbf{SURE} (Scalable Uncertainty and Reconstruction Estimation), a lightweight, plug-and-play module that enhances pretrained multimodal pipelines with deterministic latent-space reconstruction and principled uncertainty estimation. SURE decomposes prediction uncertainty into two sources: \textit{input-induced uncertainty}, traced from reconstruction via error propagation, and \textit{model mismatch uncertainty}, reflecting the limits of the frozen model. To support stable uncertainty learning, SURE employs a distribution-free Pearson correlation-based loss that aligns uncertainty scores with reconstruction and task errors. Evaluated on both a tractable linear-Gaussian toy problem and several real-world tasks, SURE improves prediction accuracy and uncertainty calibration, enabling robust, trust-aware inference under missing or unreliable input conditions.
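The Pearson correlation-based uncertainty loss mentioned in the abstract can be sketched as follows. This is a hypothetical illustration of the general idea (aligning predicted uncertainty scores with observed errors by maximizing their correlation), not the paper's exact objective; the function name, signature, and toy data are assumptions.

```python
import math

def pearson_correlation_loss(uncertainty, errors, eps=1e-8):
    """Distribution-free loss that rewards uncertainty scores which
    correlate with observed errors (reconstruction or task errors).
    Illustrative sketch only; not the paper's exact formulation."""
    n = len(errors)
    mu_u = sum(uncertainty) / n
    mu_e = sum(errors) / n
    # Sample covariance and variances (unnormalized; factors cancel).
    cov = sum((u - mu_u) * (e - mu_e) for u, e in zip(uncertainty, errors))
    var_u = sum((u - mu_u) ** 2 for u in uncertainty)
    var_e = sum((e - mu_e) ** 2 for e in errors)
    corr = cov / (math.sqrt(var_u * var_e) + eps)
    # Pearson correlation lies in [-1, 1]; minimizing (1 - corr)
    # pushes the correlation toward +1.
    return 1.0 - corr

# Toy usage: uncertainty that perfectly tracks errors gives a loss near 0.
errs = [0.1, 0.5, 0.2, 0.9]
unc = [0.1, 0.5, 0.2, 0.9]
loss = pearson_correlation_loss(unc, errs)
```

Because the loss depends only on the rank-like co-movement of scores and errors, it makes no distributional assumption (e.g. Gaussianity) about either quantity, which matches the "distribution-free" description in the abstract.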
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Andreas_Kirsch1
Submission Number: 5115