Embedding Trust: Semantic Isotropy Predicts Nonfactuality in Long-Form Text Generation

ICLR 2026 Conference Submission 20134 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Language Modeling, Trustworthiness, Semantic Uncertainty, Long-Form Natural Language Generation
TL;DR: We introduce Semantic Isotropy, a geometry-inspired metric for assessing the trustworthiness of long-form language model outputs, and demonstrate its effectiveness and robustness across diverse models and evaluation settings, achieving new state-of-the-art performance.
Abstract: To deploy large language models (LLMs) in high-stakes application domains that require substantively accurate responses to open-ended prompts, we need reliable, computationally inexpensive methods to assess the trustworthiness of the long-form responses these models generate. Existing approaches often rely on claim-by-claim fact-checking, which is computationally expensive and brittle for long-form responses to open-ended prompts. In this work, we introduce semantic isotropy—the degree of uniformity of normalized text embeddings on the unit sphere—and use it to assess the trustworthiness of long-form LLM responses. To do so, we sample several long-form responses, embed them, and estimate their level of semantic isotropy as the angular dispersion of the embeddings on the unit sphere. We find that higher semantic isotropy—that is, greater embedding dispersion—reliably signals lower factual consistency across samples. Our approach requires no labeled data, no fine-tuning, and no hyperparameter selection, and works with open- or closed-weight embedding models. Across multiple domains, our method consistently outperforms existing approaches in predicting nonfactuality in long-form responses using only a handful of samples, offering a practical, low-cost way to integrate trust assessment into real-world LLM workflows.
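
The procedure described in the abstract (sample several responses, embed them, measure angular dispersion on the unit sphere) can be sketched in a few lines. Below is a minimal sketch assuming NumPy and any text-embedding model that returns one vector per response; the function name `semantic_isotropy` and the specific dispersion estimator (one minus the mean resultant length, a standard circular-statistics quantity) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def semantic_isotropy(embeddings: np.ndarray) -> float:
    """Angular dispersion of response embeddings on the unit sphere.

    embeddings: array of shape (n_responses, dim), one embedding per
    sampled long-form response. Returns a value in [0, 1]; higher means
    more isotropic (more dispersed) samples, which the paper associates
    with lower factual consistency.
    """
    # Project each embedding onto the unit sphere.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Mean resultant length: near 1 if all samples point the same way,
    # near 0 if they are spread uniformly over the sphere.
    resultant_length = np.linalg.norm(unit.mean(axis=0))
    return 1.0 - resultant_length

# Toy usage with synthetic embeddings standing in for a real model's output.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(1, 384))
consistent = anchor + 0.05 * rng.normal(size=(6, 384))  # samples agree
scattered = rng.normal(size=(6, 384))                    # samples disagree
print(semantic_isotropy(consistent))  # low dispersion
print(semantic_isotropy(scattered))   # high dispersion
```

In practice, the synthetic arrays would be replaced by embeddings of the sampled responses from the LLM under test, produced by an open- or closed-weight embedding model as the abstract notes.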
Primary Area: foundation or frontier models, including LLMs
Submission Number: 20134