MemTR: Enhancing Tool-Calling Reliability via Uncertainty-Triggered FFN-Space Retracing

ACL ARR 2026 January Submission2693 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: tool calling, function calling, large language models, decoding-time intervention, constrained decoding, uncertainty estimation, entropy-based triggering, retrieval-augmented decoding, activation editing, plug-and-play methods
Abstract: Tool calling requires Large Language Models (LLMs) to generate structured decisions, including tool names and schema-constrained arguments, where small decoding mistakes can cause hard failures. Existing methods either rely on costly tool-use training data or only constrain syntax, leaving tool-selection and argument-value errors largely unsolved. We analyze tool-calling failures through a Where–When lens: (Where) failures correlate with persistent uncertainty in late Transformer layers; (When) uncertainty concentrates on content-bearing tokens (tool names and argument values) rather than schema tokens. Based on this, and motivated by evidence that Transformer Feed-Forward Networks (FFNs) function as key–value style memories that store and retrieve factual or associative mappings, we propose Memory Space Tool Retracing (MemTR), a weight-free decoding-time method that retrieves relevant tool evidence from the tool library and mixes it into the FFN output at the uncertain layer, treating FFNs as key–value memories. Across BFCL, ACEBench, and APIBank on Llama, Qwen, and xLAM, MemTR yields a 2%–9% relative reduction in tool-calling failure rate with 1%–2% runtime overhead, without fine-tuning. Our implementation will be released as open-source software after publication.
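The abstract describes two mechanisms: an entropy-based trigger on uncertain tokens, and a key–value retrieval over the tool library whose result is mixed into the FFN output at the uncertain layer. A minimal sketch of that interaction is below; the function names, the entropy threshold, and the mixing weight `alpha` are illustrative assumptions, not the paper's actual design or hyperparameters.

```python
import numpy as np

def token_entropy(logits: np.ndarray) -> float:
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def retrieve_tool_evidence(query: np.ndarray,
                           tool_keys: np.ndarray,
                           tool_values: np.ndarray) -> np.ndarray:
    """Key-value style retrieval: softmax-weighted mix of tool value vectors."""
    scores = tool_keys @ query                 # (n_tools,) similarity scores
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return tool_values.T @ w                   # weighted evidence vector

def memtr_ffn_output(ffn_out: np.ndarray,
                     logits: np.ndarray,
                     tool_keys: np.ndarray,
                     tool_values: np.ndarray,
                     entropy_threshold: float = 2.0,   # assumed value
                     alpha: float = 0.3) -> np.ndarray:  # assumed mixing weight
    """Intervene only when the next-token distribution is uncertain."""
    if token_entropy(logits) <= entropy_threshold:
        return ffn_out  # confident token: leave the forward pass untouched
    evidence = retrieve_tool_evidence(ffn_out, tool_keys, tool_values)
    return (1.0 - alpha) * ffn_out + alpha * evidence
```

Because the intervention fires only on high-entropy (content-bearing) tokens, most decoding steps skip the retrieval entirely, which is consistent with the small (1%–2%) runtime overhead the abstract reports.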
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Language Modeling, Dialogue and Interactive Systems
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 2693