Keywords: Retrieval-Augmented Generation (RAG), Evidence Grounding, Calibration, Confidence-aware Decoding, Long-context Question Answering, Hallucination Mitigation, Long-horizon Consistency
Abstract: Long-context language models frequently fail in two high-impact regimes: (i) high-confidence hallucination under insufficient evidence, and (ii) long-horizon inconsistency across multi-turn dialogue and long-form generation. We propose STAR-Memory, a retrieval-augmented framework that makes reliability a first-class objective across memory selection, decoding, and training. STAR-Memory introduces Tri-Factor Memory Selection that jointly optimizes relevance, constraint adherence, and evidence support to construct an explicit grounding set. We further propose Gentle Guidance Decoding, a confidence-aware decoding rule that suppresses unsupported high-certainty continuations and triggers explicit uncertainty when evidence coverage is low. Finally, we unify evidence-consistency loss, over-confidence regularization, and long-horizon consistency reward into a single objective. Across long-context QA and factual verification benchmarks, STAR-Memory improves accuracy while reducing calibration error and long-horizon contradiction rate.
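The two mechanisms the abstract names, Tri-Factor Memory Selection and Gentle Guidance Decoding, can be illustrated with a minimal sketch. All names, weights, and thresholds below (`tri_factor_score`, `gentle_guidance`, the 0.4/0.3/0.3 weighting) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the abstract's two mechanisms; weights and
# thresholds are illustrative assumptions, not the paper's values.
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    relevance: float             # similarity to the query, in [0, 1]
    constraint_adherence: float  # agreement with stated constraints, in [0, 1]
    evidence_support: float      # fraction of claims backed by evidence, in [0, 1]

def tri_factor_score(m: MemoryItem, w=(0.4, 0.3, 0.3)) -> float:
    """Jointly weight the three factors used for memory selection."""
    return w[0] * m.relevance + w[1] * m.constraint_adherence + w[2] * m.evidence_support

def select_grounding_set(items, k=2):
    """Construct the explicit grounding set from the top-k scored memories."""
    return sorted(items, key=tri_factor_score, reverse=True)[:k]

def gentle_guidance(token_conf: float, evidence_coverage: float,
                    conf_thresh=0.9, cov_thresh=0.5) -> str:
    """Confidence-aware decoding rule:
    - high-confidence continuation with low evidence coverage -> suppress it
    - low coverage overall -> trigger an explicit uncertainty marker
    - otherwise decode normally."""
    if token_conf > conf_thresh and evidence_coverage < cov_thresh:
        return "suppress"
    if evidence_coverage < cov_thresh:
        return "flag_uncertain"
    return "emit"
```

The point of the sketch is the coupling: the same grounding set that selection builds supplies the `evidence_coverage` signal that the decoding rule consults, so unsupported high-certainty continuations are penalized rather than emitted.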
Paper Type: Short
Research Area: Phonology, Morphology and Word Segmentation
Research Area Keywords: Retrieval-Augmented Generation, Question Answering, Hallucination, Calibration, Uncertainty Estimation, Decoding
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 2261