Keywords: deep learning, large language models
Abstract: Quantifying neural network model uncertainty is a difficult problem with far-reaching implications for our ability to improve model reliability. Unfortunately, uncertainty quantification is especially difficult in the context of LLMs: standard measurement methods either rely on single-token outputs or make structural assumptions that limit their utility in practical settings. We show that these methods fail to capture uncertainty in challenging settings such as difficulty quantification and hallucination detection, and we introduce TRUST (Temperature-Related Unambiguity via Similarity Tracking) scores as a way to generalize uncertainty methods to all text-in, text-out settings. TRUST scores compute uncertainty from the semantic similarity of multiple output rollouts of an LLM, require no white-box access to model internals, and strongly outperform standard methods in quantifying LLM uncertainty.
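Below is a minimal sketch of the kind of black-box, rollout-similarity scoring the abstract describes. The paper's exact scoring rule is not given here; the embedding model, the 1 minus mean-pairwise-cosine-similarity formula, and the function names are illustrative assumptions, not the authors' method.

```python
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed embedding backbone; any sentence-level semantic encoder would do.
_embedder = SentenceTransformer("all-MiniLM-L6-v2")


def rollout_uncertainty(completions: list[str]) -> float:
    """Score uncertainty from several temperature-sampled completions to one prompt.

    Higher semantic disagreement across rollouts -> higher uncertainty.
    """
    if len(completions) < 2:
        raise ValueError("Need at least two rollouts to measure agreement.")
    # Unit-normalized embeddings, so the dot product equals cosine similarity.
    embeddings = _embedder.encode(completions, normalize_embeddings=True)
    sims = [float(np.dot(embeddings[i], embeddings[j]))
            for i, j in combinations(range(len(completions)), 2)]
    return 1.0 - float(np.mean(sims))


if __name__ == "__main__":
    # Rollouts would come from sampling the same prompt repeatedly at temperature > 0.
    rollouts = [
        "The capital of Australia is Canberra.",
        "Canberra is Australia's capital city.",
        "Australia's capital is Sydney.",
    ]
    print(f"uncertainty score: {rollout_uncertainty(rollouts):.3f}")
```

No model internals are touched: only sampled text is needed, which matches the abstract's claim that the score can be computed without white-box access.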
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 36