Representing Agentic Tools in Knowledge Graphs for Structure-Aware Tool Discovery Under Tool Overload

Isaiah Onando Mulang'; Johannes Thaller; Tushar Trivedi; Lars Heling; Felix Sasaki

Published: 10 Jun 2026, Last Modified: 10 Jun 2026

Venue: IJCAI-ECAI 2026 Joint Workshop on GENAIK and NORA

Readers: Everyone

License: CC BY 4.0

Track: Research

Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.

Student Paper: No

Generative AI Compliance And Declaration: I confirm that this submission complies with GenAIK-NORA Generative AI Policy. The manuscript includes a mandatory declaration section explicitly stating whether Generative AI was used and, if applicable, describing the extent of its usage.

Keywords: knowledge graphs, tool discovery, MCP, agentic AI, ontology engineering, tool use, LLM agents

TL;DR: Representing Agentic Tools in Knowledge Graphs for Structure-Aware Tool Discovery Under Tool Overload

Abstract: Large language model (LLM) agents increasingly rely on external tools, yet most tool ecosystems still expose those tools as unstructured textual descriptions or JSON schemas. As tool inventories grow, tool selection becomes a retrieval problem: the agent must surface a small, relevant set of tools under context and tool-budget constraints. We study knowledge-graph-based tool representation for agentic systems through a lightweight ontology for Model Context Protocol (MCP) tools. The ontology models tools, servers, capabilities, and parameters, and treats required versus optional inputs as first-class relations. Using real MCP tool schemas extracted from publicly available servers, we build an RDF knowledge graph (KG). We instantiate this KG on MCP-Atlas, a benchmark for tool-use competency built around real MCP servers, and compare a KG-augmented discovery workflow against a text-only baseline across multiple frontier models and two exposure regimes: the benchmark's smaller task-level tool menus and an overload setting with an all-tools registry of approximately 269 tools over 258 executed tasks. The early empirical results point to three findings. First, in smaller curated tool settings, direct text-only exposure remains stronger for all tested models. Second, under overload, where the unstructured baseline is constrained by a maximum tool budget, KG-based filtering improves GPT-5 from 0.478 to 0.542 mean coverage. Third, for Claude 4.6 Sonnet in the all-tools condition, the KG retains roughly 89% of the text baseline's coverage while reducing the candidate set from about 270 tools to 4.6 tools on average. Qualitative error analysis indicates that the KG helps primarily by reducing tool overload, name ambiguity, and backend confusion, while its main weakness is incomplete recall caused by missing or imperfect capability assignments.
Our central conclusion is that knowledge graphs are valuable as a structure-aware compression layer for large, noisy tool registries. This opens a broader research question: how best to combine knowledge-graph representations of tools with strong textual tool descriptions.
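
Illustrative sketch (not the authors' implementation): the snippet below shows one way the representation described in the abstract could look, with tools, servers, capabilities, and parameters as graph nodes and required versus optional inputs as distinct relations, followed by capability-based filtering under a tool budget. The `ex:` predicate names, the capability labels, and the two sample tools are hypothetical. Only the `name` and `inputSchema` fields follow the MCP tool schema, and plain tuples stand in for an RDF store.

```python
# Illustrative sketch only: predicate names (ex:...), capability labels, and
# the sample tools are hypothetical, not taken from the paper's ontology.
from typing import NamedTuple


class Triple(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object


def tool_to_triples(server: str, tool: dict, capabilities: list[str]) -> list[Triple]:
    """Map one MCP-style tool schema to subject-predicate-object triples."""
    tool_id = f"tool:{server}/{tool['name']}"
    triples = [
        Triple(tool_id, "rdf:type", "ex:Tool"),
        Triple(tool_id, "ex:providedBy", f"server:{server}"),
    ]
    triples += [Triple(tool_id, "ex:hasCapability", f"cap:{c}") for c in capabilities]
    schema = tool.get("inputSchema", {})
    required = set(schema.get("required", []))
    for param in schema.get("properties", {}):
        # Required and optional inputs become two different relations.
        pred = "ex:requiresParameter" if param in required else "ex:acceptsOptionalParameter"
        triples.append(Triple(tool_id, pred, f"param:{param}"))
    return triples


def discover(graph: list[Triple], capability: str, budget: int) -> list[str]:
    """Return at most `budget` tool ids that declare the requested capability."""
    hits = sorted(
        t.s for t in graph
        if t.p == "ex:hasCapability" and t.o == f"cap:{capability}"
    )
    return hits[:budget]


# Two hypothetical tools with hand-assigned capabilities.
registry = [
    ("github", {
        "name": "search_issues",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
            "required": ["query"],
        },
    }, ["issue_search"]),
    ("weather", {
        "name": "get_forecast",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }, ["weather_lookup"]),
]

graph = [t for server, tool, caps in registry for t in tool_to_triples(server, tool, caps)]
candidates = discover(graph, "issue_search", budget=5)
print(candidates)
```

In this toy setting the filter exposes only the tools whose capability edge matches the request, which is the compression effect the abstract reports. It also shows the stated weakness: a tool with a missing or wrong `ex:hasCapability` edge can never be retrieved, regardless of how good its textual description is.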

Submission Number: 19
