Keywords: Hallucination Detection, LLMs, Time Series Classification
TL;DR: We treat the [Seq_len, Top-k] log-prob matrix of an LLM generation as time series data and pass it to a GRU to detect whether the response contains hallucinations.
Abstract: Hallucinations remain a major obstacle for large language models (LLMs), especially in safety-critical domains. We present HALT (Hallucination Assessment via Log-probs as Time series), a lightweight hallucination detector that leverages only the top-20 token log-probabilities from LLM generations as a time series. HALT uses a gated recurrent unit model combined with entropy-based features to learn model-specific calibration bias, providing an extremely efficient alternative to large encoders. Unlike white-box approaches, HALT does not require access to hidden states or attention maps, relying only on output log-probabilities. Unlike black-box approaches, it operates on log-probs rather than surface-form text, which enables stronger domain generalization and compatibility with proprietary LLMs without requiring access to internal weights. To benchmark performance, we introduce HUB (Hallucination detection Unified Benchmark), which consolidates prior datasets into ten capabilities covering both reasoning tasks (Algorithmic, Commonsense, Mathematical, Symbolic, Code Generation) and general-purpose skills (Chat, Data-to-Text, Question Answering, Summarization, World Knowledge). While being 30× smaller, HALT outperforms Lettuce, a fine-tuned ModernBERT-base encoder, and achieves a 60× speedup on HUB. Together, HALT and HUB establish an effective framework for hallucination detection across diverse LLM capabilities.
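The abstract describes a GRU over the [Seq_len, Top-k] log-prob matrix augmented with entropy-based features. Below is a minimal sketch of that idea in PyTorch; the hidden size, per-step entropy feature, last-hidden-state pooling, and sigmoid classification head are assumptions for illustration, not the authors' exact HALT architecture.

```python
import torch
import torch.nn as nn

class HALTSketch(nn.Module):
    """GRU-based hallucination detector over top-k log-prob time series (illustrative)."""

    def __init__(self, top_k: int = 20, hidden_size: int = 64):
        super().__init__()
        # Input per generation step: top-k log-probs plus one entropy-based feature.
        self.gru = nn.GRU(top_k + 1, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, logprobs: torch.Tensor) -> torch.Tensor:
        # logprobs: [batch, seq_len, top_k] top-k token log-probabilities per step.
        probs = logprobs.exp()
        # Entropy over the top-k distribution at each step (an approximation of
        # the full-vocabulary entropy when only top-k values are available).
        entropy = -(probs * logprobs).sum(dim=-1, keepdim=True)  # [batch, seq_len, 1]
        features = torch.cat([logprobs, entropy], dim=-1)
        _, h_n = self.gru(features)                   # h_n: [1, batch, hidden_size]
        return torch.sigmoid(self.classifier(h_n[-1]))  # hallucination probability

# Usage: score one 128-token generation with top-20 log-probs (dummy data).
detector = HALTSketch()
score = detector(torch.randn(1, 128, 20).log_softmax(dim=-1))
```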
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21647