Tool use is provably more scalable than in-weight memory for Large Language Models
Keywords: Large language models, in-weight memory, tool use, pretraining, finetuning
TL;DR: We theoretically and empirically demonstrate the benefits of in-tool learning over in-weight memory for large language models.
Abstract: Tool-augmented language models, equipped with retrieval, memory, or external APIs, are reshaping AI. Yet,
their theoretical advantages remain underexplored. In this paper, we address this gap by demonstrating
the benefits of *in-tool learning* (external retrieval) over *in-weight learning* (memorization) for factual recall. We show that the number of facts a model can memorize solely in its weights is fundamentally limited by its parameter count.
In contrast, we prove that tool use enables unbounded factual recall via a simple and efficient circuit construction.
These results are validated in controlled experiments, where tool-using models consistently outperform memorizing ones.
We further show that for pretrained large language models, teaching tool-use and general rules is more effective than finetuning facts into memory. Our work provides both a theoretical and empirical foundation, establishing why tool-augmented workflows are not just practical, but provably more scalable. The code is available at https://github.com/ambroiseodt/itl.
Submission Number: 81