Abstract: Accurate trust assessment of large language models (LLMs), which can enable selective prediction and improve user confidence, is challenging for multimodal models due to their diverse input paradigms.
We propose $\textbf{F}$unctionally $\textbf{E}$quivalent $\textbf{S}$ampling for $\textbf{T}$rust $\textbf{A}$ssessment (FESTA), an input sampling technique for multimodal models that generates an uncertainty measure based on equivalent and complementary input sampling.
The sampling approach expands the input space to measure the consistency (through equivalent samples) and sensitivity (through complementary samples) of the model. These two uncertainty measures are combined to form the final FESTA estimate.
Our approach requires only black-box access to the model and is unsupervised.
Experiments are conducted with various off-the-shelf multimodal LLMs on visual and audio reasoning tasks. The proposed FESTA approach significantly improves the area under the receiver operating characteristic curve (AUROC) on these reasoning tasks, with relative improvements of 33.3\% for vision-LLMs and 29.6\% for audio-LLMs.
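As a rough illustration of the combination described in the abstract, the following is a minimal sketch. It assumes a black-box `model` callable, majority-vote agreement as the consistency score over equivalent samples, prediction-flip rate as the sensitivity score over complementary samples, and a simple product as the combination rule; all names and the combination choice are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch of a FESTA-style confidence estimate; function names,
# scoring choices, and the combination rule are assumptions for exposition.
from collections import Counter


def consistency(model, equivalent_inputs):
    """Fraction of functionally equivalent inputs whose prediction agrees
    with the majority answer (higher = more consistent)."""
    preds = [model(x) for x in equivalent_inputs]
    majority_count = Counter(preds).most_common(1)[0][1]
    return majority_count / len(preds)


def sensitivity(model, original_input, complementary_inputs):
    """Fraction of complementary (answer-changing) inputs for which the
    prediction actually changes (higher = more sensitive)."""
    base_pred = model(original_input)
    preds = [model(x) for x in complementary_inputs]
    changed = sum(p != base_pred for p in preds)
    return changed / len(preds)


def festa_confidence(model, original_input, equivalent_inputs, complementary_inputs):
    """Combine the two scores into a single confidence value; the product
    is one simple, assumed choice of combination."""
    return consistency(model, equivalent_inputs) * sensitivity(
        model, original_input, complementary_inputs
    )
```

A model that answers consistently on equivalent inputs and flips its answer on complementary inputs would receive a confidence near 1 under this sketch, while inconsistent or insensitive behavior would pull the score toward 0, which can then be used for selective prediction.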
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: calibration/uncertainty, counterfactual/contrastive explanations, probing, robustness
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 488