SIFT: Grounding LLM Reasoning in Contexts via Stickers

ACL ARR 2025 May Submission 4676 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: This paper identifies that misinterpreting the context can be a significant issue during the reasoning process of large language models, spanning from smaller models like Llama3.2-3B-Instruct to cutting-edge ones like DeepSeek-R1. To tackle this, we introduce a novel, post-training approach called **Stick to the Facts (SIFT)**. SIFT leverages increased inference-time compute to ground LLM reasoning in contexts. At the core of SIFT lies the *Sticker*, which is generated by the model itself to explicitly emphasize the key information within the context. Given the Sticker, SIFT generates two predictions—one from the Sticker alone and one from the query augmented with the Sticker. If they differ, the Sticker is sequentially refined via *forward* optimization (to better align the extracted facts with the query) and *inverse* generation (to conform with the model’s inherent tendencies) for more faithful reasoning outcomes. Studies across diverse models (from 3B to 100B+) and benchmarks (e.g., MATH, AIME) reveal consistent performance improvements. Notably, SIFT improves the pass@1 accuracy of DeepSeek-R1 on AIME2024 from 78.33% to **85.67%** and that on AIME2025 from 69.8% to **77.33%**. Code will be made public after acceptance.
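
The abstract describes a compact inference-time loop (Sticker generation, dual prediction, and refinement). Below is a minimal sketch of that loop, assuming only a generic `llm(prompt) -> str` text-generation callable; the function name `sift_answer`, the prompt wordings, and the agreement check are illustrative assumptions, not the authors' exact templates.

```python
# Minimal sketch of the SIFT loop as described in the abstract.
# Assumes a generic `llm(prompt: str) -> str` callable (any chat/completions API);
# all prompt texts below are illustrative placeholders.

def sift_answer(llm, query: str, max_rounds: int = 3) -> str:
    # 1. Sticker generation: the model restates the key facts of the context.
    sticker = llm(
        f"Extract the key facts needed to answer the question.\n\n"
        f"Question: {query}\n\nKey facts:"
    )

    pred_joint = ""
    for _ in range(max_rounds):
        # 2. Two predictions: from the Sticker alone, and from the query + Sticker.
        pred_sticker = llm(f"Answer using only these facts.\n\nFacts: {sticker}\n\nAnswer:")
        pred_joint = llm(f"Question: {query}\n\nKey facts: {sticker}\n\nAnswer:")

        # 3. If the two predictions agree, accept the consensus answer.
        if pred_sticker.strip() == pred_joint.strip():
            return pred_joint

        # 4a. Forward optimization: refine the Sticker to better align with the query.
        sticker = llm(
            f"Question: {query}\n\nCurrent key facts: {sticker}\n\n"
            "Revise the key facts so they cover everything the question needs:"
        )
        # 4b. Inverse generation: rewrite the Sticker from the model's own prediction,
        #     so the extracted facts conform to the model's inherent tendencies.
        sticker = llm(
            f"Question: {query}\n\nCandidate answer: {pred_joint}\n\n"
            f"Draft key facts: {sticker}\n\n"
            "Rewrite the key facts that this answer relies on:"
        )

    # Fall back to the latest joint prediction if no agreement is reached.
    return pred_joint
```

The accept-on-agreement check is what grounds the final answer in the extracted facts: an answer is only returned once the Sticker alone and the Sticker-augmented query lead the model to the same prediction.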
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: reasoning, factuality
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English
Submission Number: 4676