On the Potential of Tool-Enhanced Small Language Models to Match Large Models in Finance

Gabriel Assis, Ayrton Surica, Pedro Kroll, Carina Munhoz, Darian Rabbani, Edson Bollis, Lucas Pellicer, Aline Paes

Published: 14 Nov 2025, Last Modified: 18 Mar 2026, 6th ACM International Conference on AI in Finance, CC BY-NC 4.0

Abstract: The financial domain requires rigorous precision and the ability to handle complex reasoning, areas where Large Language Models (LLMs) have shown encouraging potential. However, environmental impact and data privacy concerns are becoming increasingly central to financial decision-making, driven by Environmental, Social, and Governance (ESG) practices. In this context, smaller language models are valuable alternatives for local and efficient deployments. This work evaluates the performance of such smaller models and investigates whether their capabilities can be enhanced within a tool-enhanced framework to compete with their larger counterparts. We assess eight models on a challenging financial question-answering task, and our results indicate that smaller models still face challenges in combining robust financial reasoning with sustaining tool-enhanced implementations. However, among the models evaluated, distilled DeepSeek R1 models achieve competitive results independently of tools, whereas QwQ balances strong performance with effective tool use.