Abstract: Small language models (SLMs) typically falter on tasks requiring deep, multi-step reasoning. This paper introduces SMART (Small Reasons, Large Hints), a framework in which large language models (LLMs) provide targeted, selective guidance to augment SLM reasoning. Drawing on cognitive scaffolding, SMART uses a score-based mechanism to identify uncertain SLM reasoning steps, triggering LLM correction only when essential. By framing structured reasoning as an optimal policy search, this approach steers SLMs toward correct solutions without exhaustive sampling. On mathematical reasoning datasets, SMART enables SLMs to achieve up to 98.9% of LLM-level performance while reducing LLM token usage by up to 90.0%. Our work paves the way for the collaborative use of SLMs and LLMs to tackle complex reasoning tasks that are currently unsolvable by SLMs alone.
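The selective-guidance loop the abstract describes might be sketched as follows. This is a minimal illustration, not the paper's method: `slm_step`, `llm_hint`, the mean token log-probability score, and the threshold value are all hypothetical stand-ins, since the abstract does not specify the actual scoring mechanism.

```python
# Hypothetical stand-ins for the two models (mocked for illustration).
def slm_step(prompt):
    # Small model proposes a reasoning step plus per-token log-probs.
    return "step: 2 + 2 = 5", [-0.9, -1.4, -2.1]

def llm_hint(prompt, uncertain_step):
    # Large model supplies a corrected step only when asked.
    return "step: 2 + 2 = 4"

def smart_step(prompt, threshold=-1.0):
    """Keep the SLM's step when confident; escalate to the LLM otherwise.

    Uses mean token log-probability as an assumed uncertainty score;
    steps scoring below `threshold` trigger LLM correction.
    """
    step, logprobs = slm_step(prompt)
    score = sum(logprobs) / len(logprobs)
    if score < threshold:          # uncertain step -> ask the LLM
        return llm_hint(prompt, step), "llm"
    return step, "slm"             # confident step -> no LLM tokens spent
```

Because the LLM is called only for low-confidence steps, most tokens in a trace are generated by the SLM alone, which is the source of the token savings the abstract reports.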
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=glJWmumPpA
Changes Since Last Submission: We fixed formatting issues and updated the font to match the TMLR style.
Assigned Action Editor: ~Mark_Coates1
Submission Number: 5668