Abstract: The financial domain requires rigorous precision and the ability to handle complex reasoning, areas where Large Language Models (LLMs) have shown encouraging potential. However, environmental impact and data privacy concerns are becoming increasingly central to financial decision-making, driven by Environmental, Social, and Governance (ESG) practices. In this context, smaller language models are valuable alternatives for local and efficient deployments. This work evaluates the performance of such smaller models and investigates whether their capabilities can be enhanced within a tool-enhanced framework to compete with their larger counterparts. We assess eight models on a challenging financial question-answering task, and our results indicate that smaller models still struggle to combine robust financial reasoning with sustained tool use. However, among the models evaluated, distilled DeepSeek R1 models achieve competitive results independently of tools, whereas QwQ balances strong performance with effective tool use.