Keywords: SLM, LLM, Problem-Solving, Python Puzzles, Game of 24
Abstract: Small language models (SLMs), defined in this work as LLMs with fewer than 10 billion parameters, face critical challenges in problem-solving tasks, often achieving less than 10\% accuracy, which highlights the urgent need for effective solutions. While much of the existing research has focused on enhancing the performance of larger models like GPT, an important question remains: can techniques developed for large models be adapted effectively to smaller ones? Moreover, is it possible to improve these smaller models to the point where they rival, or even outperform, larger models such as GPT-4 on problem-solving tasks?
In this paper, we introduce Evaluation-Oriented Problem-Solving (EOP), a novel framework aimed at enhancing the problem-solving capabilities of small LLMs. Our approach significantly boosts the performance of these models, achieving 2\% higher accuracy on Python Puzzles than standard GPT-4 and a 27\% improvement over state-of-the-art prompting methods using GPT-4 on the Game of 24. Beyond these results, EOP also demonstrates notable accuracy improvements on other tasks. These findings suggest that, with the appropriate strategies, small LLMs can achieve substantial performance gains in problem-solving, challenging the prevailing notion that scaling model size is the primary path to improvement.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7918