Keywords: Retrieval-Augmented Generation (RAG), Evidence Grounding, Calibration, Confidence-aware Decoding, Long-context Question Answering, Hallucination Mitigation, Long-horizon Consistency
Abstract: Long-context language models frequently fail in two high-impact regimes: (i) high-confidence hallucination under insufficient evidence, and (ii) long-horizon inconsistency across multi-turn dialogue and long-form generation. We propose STAR-Memory, a retrieval-augmented framework that makes reliability a first-class objective across memory selection, decoding, and training. STAR-Memory introduces Tri-Factor Memory Selection that jointly optimizes relevance, constraint adherence, and evidence support to construct an explicit grounding set. We further propose Gentle Guidance Decoding, a confidence-aware decoding rule that suppresses unsupported high-certainty continuations and triggers explicit uncertainty when evidence coverage is low. Finally, we unify evidence-consistency loss, over-confidence regularization, and long-horizon consistency reward into a single objective. Across long-context QA and factual verification benchmarks, STAR-Memory improves accuracy while reducing calibration error and long-horizon contradiction rate.
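The two mechanisms the abstract names, Tri-Factor Memory Selection and Gentle Guidance Decoding, can be illustrated with a minimal sketch. All names, weights, and thresholds below (`tri_factor_score`, `gentle_guidance`, the 0.4/0.3/0.3 weighting) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the abstract's two mechanisms; weights and
# thresholds are illustrative assumptions, not the paper's values.
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    relevance: float             # similarity to the query, in [0, 1]
    constraint_adherence: float  # agreement with stated constraints, in [0, 1]
    evidence_support: float      # fraction of claims backed by evidence, in [0, 1]

def tri_factor_score(m: MemoryItem, w=(0.4, 0.3, 0.3)) -> float:
    """Jointly weight the three factors used for memory selection."""
    return w[0] * m.relevance + w[1] * m.constraint_adherence + w[2] * m.evidence_support

def select_grounding_set(items, k=2):
    """Construct the explicit grounding set from the top-k scored memories."""
    return sorted(items, key=tri_factor_score, reverse=True)[:k]

def gentle_guidance(token_conf: float, evidence_coverage: float,
                    conf_thresh=0.9, cov_thresh=0.5) -> str:
    """Confidence-aware decoding rule:
    - high-confidence continuation with low evidence coverage -> suppress it
    - low coverage overall -> trigger an explicit uncertainty marker
    - otherwise decode normally."""
    if token_conf > conf_thresh and evidence_coverage < cov_thresh:
        return "suppress"
    if evidence_coverage < cov_thresh:
        return "flag_uncertain"
    return "emit"
```

The point of the sketch is the coupling: the same grounding set that selection builds supplies the `evidence_coverage` signal that the decoding rule consults, so unsupported high-certainty continuations are penalized rather than emitted.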
Paper Type: Short
Research Area: Phonology, Morphology and Word Segmentation
Research Area Keywords: Retrieval-Augmented Generation, Question Answering, Hallucination, Calibration, Uncertainty Estimation, Decoding
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 2261