Big Reasoning with Small Models: Instruction Retrieval at Inference Time

ACL ARR 2026 January Submission 6973 Authors

06 Jan 2026 (modified: 20 Mar 2026), CC BY 4.0
Keywords: small language models, inference-time intervention, instruction retrieval, retrieval-augmented reasoning, model efficiency, structured reasoning
Abstract: Small language models (SLMs) enable low-cost, private, on-device inference, but they often fail on problems that require specialized domain knowledge or multi-step reasoning. Existing approaches for improving reasoning either rely on scale (e.g., chain-of-thought prompting), require task-specific training that limits reuse and generality (e.g., distillation), or retrieve unstructured information that still leaves the SLM to determine an appropriate reasoning strategy. We propose instruction retrieval, an inference-time intervention that augments an SLM with structured, reusable reasoning procedures rather than raw passages. We construct an Instruction Corpus by clustering similar training questions and using a teacher model to generate generalizable guides that pair domain background with explicit step-by-step procedures. At inference, the SLM retrieves the instructions most relevant to a given query and executes the associated procedures without any additional fine-tuning. Across three challenging domains (medicine, law, and mathematics), instruction retrieval yields consistent gains for models with at least 3B parameters, improving accuracy by 9.4%, 7.9%, and 5.1%, respectively, with the strongest 14B model surpassing GPT-4o's zero-shot performance on knowledge-intensive tasks.
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, NLP in resource-constrained settings, efficient models, inference methods, prompting, knowledge-augmented methods, retrieval, security and privacy
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 6973
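The inference-time step described in the abstract, retrieving the most relevant guide from an Instruction Corpus and prepending it to the SLM's prompt, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the corpus entries, the bag-of-words cosine scorer (standing in for whatever retriever the paper uses), and the prompt template are all invented for the example.

```python
from collections import Counter
import math

# Hypothetical Instruction Corpus entries: each pairs a topic with a guide
# that combines domain background and a step-by-step procedure (assumed format).
INSTRUCTION_CORPUS = [
    {"topic": "drug interactions",
     "guide": "Background: pharmacokinetics. Steps: 1) list each drug's mechanism; 2) check shared pathways."},
    {"topic": "contract law",
     "guide": "Background: contract formation. Steps: 1) identify offer and acceptance; 2) check consideration."},
    {"topic": "modular arithmetic",
     "guide": "Background: residues. Steps: 1) reduce each term mod n; 2) combine and reduce again."},
]

def bow(text):
    """Bag-of-words term counts; a stand-in for a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    """Return the k guides whose topic+guide text is most similar to the query."""
    q = bow(query)
    ranked = sorted(corpus,
                    key=lambda e: cosine(q, bow(e["topic"] + " " + e["guide"])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, corpus):
    """Prepend the retrieved guide(s) to the question; the SLM then executes the steps."""
    guides = "\n".join(e["guide"] for e in retrieve(query, corpus))
    return f"Instructions:\n{guides}\n\nQuestion: {query}\nFollow the steps above."
```

The key contrast with passage-level RAG is that the retrieved unit is a procedure to execute, not raw text to condition on, so no fine-tuning of the SLM is needed.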