Representing Agentic Tools in Knowledge Graphs for Structure-Aware Tool Discovery Under Tool Overload
Track: Research
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Student Paper: No
Generative AI Compliance And Declaration: I confirm that this submission complies with GenAIK-NORA Generative AI Policy. The manuscript includes a mandatory declaration section explicitly stating whether Generative AI was used and, if applicable, describing the extent of its usage.
Keywords: knowledge graphs, tool discovery, MCP, agentic AI, ontology engineering, tool use, LLM agents
TL;DR: Representing Agentic Tools in Knowledge Graphs for Structure-Aware Tool Discovery Under Tool Overload
Abstract: Large language model (LLM) agents increasingly rely on external tools, yet most tool ecosystems still expose
those tools as unstructured textual descriptions or JSON schemas. As tool inventories grow, this becomes a
retrieval problem where the agent must surface a small relevant set under context and tool-budget constraints.
We study knowledge-graph-based tool representation for agentic systems through a lightweight ontology for
Model Context Protocol (MCP) tools. The ontology models tools, servers, capabilities, and parameters, and treats
required versus optional inputs as first-class relations. Using real MCP tool schemas extracted from publicly
available servers, we build an RDF knowledge graph. We instantiate this knowledge graph on MCP-Atlas, a
benchmark for tool-use competency built around real MCP servers, and compare a KG-augmented discovery
workflow against a text-only baseline across multiple frontier models and two exposure regimes: the benchmark’s
smaller task-level tool menus and an overload setting with an all-tools registry of approximately 269 tools over
258 executed tasks. Early empirical results depend on the exposure regime. In smaller curated tool
settings, direct text-only exposure remains stronger for all tested models. However, under overload where the
unstructured baseline is constrained by a maximum tool budget, KG-based filtering improves GPT-5 from 0.478
to 0.542 mean coverage. For Claude 4.6 Sonnet in the all-tools condition, the KG retains roughly 89% of the text
baseline’s coverage while reducing the candidate set from about 269 tools to 4.6 tools on average. Qualitative error
analysis indicates that the KG helps primarily by reducing tool overload, name ambiguity, and backend confusion,
while its main weakness is incomplete recall caused by missing or imperfect capability assignments. Our central
conclusion is that knowledge graphs are valuable as a structure-aware compression layer for large, noisy tool
registries. How best to combine knowledge-graph tool representations with strong textual tool descriptions
remains an open research question.
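
As an illustration of the idea in the abstract, the following minimal sketch stores tools, servers, capabilities, and required versus optional parameters as subject-predicate-object triples, then filters a registry down to a short candidate list under a tool budget. It is not the ontology or discovery workflow used in the paper: the predicate names, the tool and server names, the `discover` function, and the rule of filtering on required parameters are all hypothetical, and plain Python tuples stand in for an RDF store.

```python
# Hypothetical predicate names; the paper's actual ontology vocabulary is not shown here.
HOSTED_ON = "hostedOn"
HAS_CAPABILITY = "hasCapability"
REQUIRES = "requiresParameter"          # required input, modelled as its own relation
ACCEPTS = "acceptsOptionalParameter"    # optional input, modelled as its own relation

# Illustrative (subject, predicate, object) triples; all tool and server names are made up.
TRIPLES = [
    ("github.search_issues", HOSTED_ON, "github-server"),
    ("github.search_issues", HAS_CAPABILITY, "issue_search"),
    ("github.search_issues", REQUIRES, "query"),
    ("github.search_issues", ACCEPTS, "per_page"),
    ("jira.search_issues", HOSTED_ON, "jira-server"),
    ("jira.search_issues", HAS_CAPABILITY, "issue_search"),
    ("jira.search_issues", REQUIRES, "jql"),
    ("weather.get_forecast", HOSTED_ON, "weather-server"),
    ("weather.get_forecast", HAS_CAPABILITY, "weather_lookup"),
    ("weather.get_forecast", REQUIRES, "city"),
]


def objects(subject, predicate):
    """Return the sorted objects of all triples matching (subject, predicate, ?)."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def discover(capability, available_inputs, budget):
    """Return at most `budget` tools that offer `capability` and whose
    required parameters are all present in `available_inputs`."""
    candidates = sorted(
        s for s, p, o in TRIPLES if p == HAS_CAPABILITY and o == capability
    )
    usable = [
        tool for tool in candidates
        if set(objects(tool, REQUIRES)) <= set(available_inputs)
    ]
    return usable[:budget]


shortlist = discover("issue_search", {"query"}, budget=5)
print(shortlist)
```

In this toy registry the two `search_issues` tools share a name suffix and a capability but sit on different servers and require different inputs, so the structural filter keeps only the one whose required parameter is available. The same sketch also shows the recall weakness noted above: a tool with a missing capability triple would never reach the shortlist.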
Submission Number: 19