Think First, Then Select and Verify with Query–Key Alignment

ICLR 2026 Conference Submission20582 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Query–Key alignment, attention heads, white-box selection, white-box verification, chain-of-thought (CoT), self-consistency, permutation robustness
TL;DR: A brief CoT “think-first” phase sharpens QK alignment, letting us select and verify answers directly from model activations and outperform decoded-token choices on MMLU-Pro, GSM8K, MATH-500, and HLE.
Abstract: We demonstrate that a “think-first” phase via chain-of-thought (CoT) prompting systematically strengthens internal query–key (QK) alignment, improving the ability to select and verify answers directly from model activations rather than from decoded tokens. Building on robust multiple-choice evaluation with MMLU-Pro (10 options) and extending to free-form reasoning on MATH-500, GSM8K, and our variant of Humanity’s Last Exam (HLE), we evaluate three settings: (i) MCQA vs. MCQA+CoT with QK-based selection; (ii) GSM8K candidate generation with and without CoT, followed by QK-based selection among self-proposed answers; and (iii) QK-based verification of LLM solutions and conjectures. We analyze QK-score accuracy, permutation robustness, and diagnostics relating alignment strength to correctness. This design situates QK-score selection and verification alongside CoT and self-consistency baselines on canonical reasoning tasks, yielding a white-box, computation-efficient decision rule that aims to match or exceed decoded choices. We argue that these results offer a simple, reproducible path to more reliable reasoning, turning CoT from a purely generative aid into a deliberation-then-selection mechanism grounded in the model’s own representations.
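The core decision rule can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration, not the authors' implementation: for one attention head, a QK score is the scaled dot product between the query vector at a decision position and the key vector at each answer-option position, and the option with the highest score is selected. The function names and the toy vectors are assumptions for illustration only.

```python
import numpy as np

def qk_scores(query, option_keys):
    """Scaled dot-product alignment between one decision-position query
    vector and the key vector of each answer option (single head).

    query: shape (d_head,); option_keys: shape (n_options, d_head).
    Returns an array of n_options alignment scores."""
    d_head = query.shape[-1]
    return option_keys @ query / np.sqrt(d_head)

def select_option(query, option_keys):
    """White-box selection: pick the option whose key aligns best
    with the query, without decoding any tokens."""
    scores = qk_scores(query, option_keys)
    return int(np.argmax(scores)), scores

# Toy example: option 1's key points in the same direction as the query.
q = np.array([1.0, 0.0])
keys = np.array([[0.0, 1.0],   # option 0: orthogonal to the query
                 [2.0, 0.0]])  # option 1: aligned with the query
choice, scores = select_option(q, keys)
```

Because the score depends only on each option's own key vector, relabeling or reordering the options permutes the scores identically, which is the intuition behind the permutation-robustness diagnostic mentioned above.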
Primary Area: foundation or frontier models, including LLMs
Submission Number: 20582