Exploring Geometric Concentration for Quantifying Uncertainty in Scientific Image Caption Generation
Keywords: Uncertainty Quantification, Large Language Models, Sampling Based Methods
Abstract: Uncertainty Quantification (UQ) methods for Large Language Models (LLMs) have primarily been evaluated on question-answering benchmarks, where outputs are short and structured and comparisons between generations are relatively well-defined. In contrast, many practical generative tasks involve open-ended, complex outputs, motivating evaluation of current state-of-the-art UQ methods beyond simple question-answering settings. In this work, we explore the challenging task of UQ for scientific image captioning. Using a subset of the ArxivCap dataset and two popular multimodal LLMs, we compare \emph{Directional Concentration Uncertainty} (DCU), a geometric UQ measure proposed by \citet{dcu_2026}, against semantic entropy (SE) \citep{kuhnetal23}, a leading method for UQ on structured question-answering. Our results indicate that DCU clearly outperforms SE, motivating further research into applications of DCU to other complex tasks.
Submission Number: 20