Keywords: Model Context Protocol (MCP), LLM, Tool Discovery, Token Usage Analysis, Latency Measurement
Abstract: The Model Context Protocol (MCP) aims to standardize the integration of Large Language Models (LLMs) with external tools, yet existing research primarily evaluates functional capabilities while treating the underlying protocol as an opaque black box. This oversight obscures critical inefficiencies in token flow and latency distribution across MCP's decoupled Host-Client-Server architecture. In this paper, we introduce ProMCP, an end-to-end profiling and instrumentation framework that decomposes the MCP workflow into a six-stage communication pipeline, enabling granular attribution of computational costs. We evaluate widely varying deployment topologies, from air-gapped local models to commercial off-the-shelf (OTS) clients, across 20 servers and 169 tools from MCP-Bench and MCP-Universe. Our analysis reveals a distinct inversion in performance bottlenecks: topologies with customized clients devote 56–72% of total tokens and 60–67% of latency to planning and schema injection, whereas OTS clients concentrate over 85% of latency in final answer synthesis. Crucially, actual tool execution constitutes a negligible fraction of the total cost across all configurations. These findings establish a quantitative baseline for protocol overhead and demonstrate that future optimization must target schema orchestration and transport efficiency rather than tool execution speed.
The code is available at: https://anonymous.4open.science/r/mcp-F16B.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Model Context Protocol, LLM Profiling, Benchmarking
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 8633