Keywords: Tool Generalization, Tool Augmentation, Generalizable Tool Learning
Abstract: Recent tool-augmentation and tool-data-generation methods rely heavily on dominant, high-usage tools. This results in a severe long-tail imbalance, where a small fraction of frequently used APIs overshadows a vast number of rarely used ones. This imbalance critically limits a model's ability to generalize beyond memorized, high-frequency tool calls and severely hampers compositional reasoning across tools. To address this fundamental challenge, we introduce reverse walk, a novel data synthesis framework designed to democratize tool learning. We first construct an API dependency graph that captures the semantic relationships between tools based on their task descriptions. Departing from conventional forward generation, we perform a reverse random walk that starts specifically from tail nodes (low-frequency tools) to generate multi-hop tool trajectories that naturally incorporate rare but meaningful tool combinations. This strategy compels models to learn the underlying semantic and logical dependencies rather than merely overfitting to frequency-based co-occurrence patterns. Our experimental results on challenging agentic benchmarks such as $\tau^2$-bench and BFCL demonstrate that training on our generated data significantly improves both rare-tool generalization and compositional reasoning without degrading performance on frequent tools. Our findings highlight the critical importance of long-tail-aware data design for building robust and generalizable tool-using language models.
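The abstract's core idea, a backward walk over an API dependency graph that begins at a low-frequency tool, can be illustrated with a minimal sketch. Note this is an assumption-laden toy, not the paper's implementation: the graph representation (edge list of `(producer, consumer)` pairs), the tail definition (bottom quantile by call frequency), and all function and parameter names are invented here for illustration.

```python
import random
from collections import defaultdict

def reverse_walk(dep_edges, call_freq, hops=3, tail_quantile=0.2, seed=0):
    """Toy sketch of a reverse random walk from a tail tool.

    dep_edges: list of (src, dst) pairs, meaning dst consumes src's output.
    call_freq: dict mapping tool name -> observed call frequency.
    Returns a multi-hop tool trajectory in execution order, ending at
    the rarely used tool it was seeded from.
    """
    rng = random.Random(seed)
    # Reverse adjacency: for each tool, the tools whose output it consumes.
    preds = defaultdict(list)
    for src, dst in dep_edges:
        preds[dst].append(src)
    # Pick the start node from the low-frequency tail of the distribution.
    tools = sorted(call_freq, key=call_freq.get)
    tail = tools[: max(1, int(len(tools) * tail_quantile))]
    start = rng.choice([t for t in tail if preds[t]] or tail)
    # Walk backwards along dependency edges for up to `hops` tools.
    path = [start]
    cur = start
    for _ in range(hops - 1):
        if not preds[cur]:
            break
        cur = rng.choice(preds[cur])
        path.append(cur)
    # Reverse so the trajectory reads in execution order.
    return list(reversed(path))
```

Walking backwards guarantees the rare tool appears in every sampled trajectory, whereas a forward walk from a random node would reach tail tools only in proportion to their (low) connectivity.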
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Tool Generalization, Tool Augmentation, Generalizable Tool Learning
Contribution Types: Data resources
Languages Studied: English
Submission Number: 466