Abstract: Large Language Models (LLMs) are known to
produce high-quality text and responses to
our queries. But how much can we trust this generated text? In this paper, we study the problem of
uncertainty quantification in LLMs. We propose
a novel Random-Set Large Language Model (RS-LLM) approach which predicts finite random sets
(belief functions) over the token space, rather than
probability vectors as in classical LLMs. To do so efficiently, we also present a methodology based on hierarchical clustering to extract
and use a budget of “focal” subsets of tokens upon
which the belief prediction is defined, rather than
using all possible collections of tokens, making
the method scalable yet effective. RS-LLMs encode the epistemic uncertainty induced in their
generation process by the size and diversity of their
training set via the size of the credal sets associated with the predicted belief functions. The proposed approach is evaluated on CoQA and OBQA
datasets using Llama2-7b, Mistral-7b and Phi-2
models, and is shown to outperform the standard models on both datasets in terms of answer correctness, while also showing potential in estimating the second-level uncertainty in its predictions and providing the capability to detect when the model is hallucinating.
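As a rough conceptual illustration (not the authors' implementation), the sketch below shows how belief masses assigned to a small budget of focal token subsets induce interval-valued (lower/upper) token probabilities, whose width can serve as a proxy for the epistemic uncertainty captured by the associated credal set; the vocabulary, focal subsets, and masses are purely hypothetical.

```python
# Conceptual sketch: a belief function over a few "focal" token subsets
# yields a probability interval [Bel, Pl] for each token (standard
# Dempster-Shafer theory). Values below are illustrative only.
from itertools import chain

# Hypothetical focal subsets of a tiny vocabulary, with belief masses
# that are non-negative and sum to 1.
focal_masses = {
    frozenset({"cat"}): 0.5,
    frozenset({"cat", "dog"}): 0.3,
    frozenset({"cat", "dog", "bird"}): 0.2,
}

vocab = set(chain.from_iterable(focal_masses))

for token in sorted(vocab):
    # Belief (lower probability): mass committed exactly to {token}.
    bel = focal_masses.get(frozenset({token}), 0.0)
    # Plausibility (upper probability): mass of every focal set containing the token.
    pl = sum(m for subset, m in focal_masses.items() if token in subset)
    # The interval [bel, pl] bounds the credal set of token probabilities;
    # its width is one proxy for epistemic uncertainty about that token.
    print(f"{token}: [{bel:.2f}, {pl:.2f}] width={pl - bel:.2f}")
```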