← All posts

Zero-Latency MCP Proxying: Optimizing Local LLM Tool Calls Under 20ms

Why standard remote Model Context Protocol setups introduce hundreds of milliseconds of lag—and how to bypass it with Cloudflare Workers.

2026-05-28

The Lag Problem: Why Local LLMs Hesitate

If you have plugged a remote Model Context Protocol (MCP) server into your local workspace—whether you are using Cursor, Claude Desktop, or Codex—you have probably noticed a subtle, frustrating pause.

You write a prompt, the agent decides it needs to fetch a tool, and then... nothing happens for half a second. Finally, the tool executes and the agent continues.

This hesitation isn't the LLM reasoning. It is network latency.

When your local agent invokes a remote MCP tool, the request typically follows a long, winding path:

  1. Your desktop agent parses the LLM's tool call.
  2. It sends an HTTP or SSE request to a remote server.
  3. The remote server initiates a subprocess, maps the payload, and makes a downstream API request.
  4. The response travels back through the remote gateway, down to your local desktop, and is finally fed back into the LLM context.

In a traditional server architecture, this entire loop can easily take 400ms to 1.5 seconds. For an interactive chat experience, this latency ruins the natural flow of the conversation. If your agent needs to chain three or four tool calls sequentially (for example, searching a store, selecting a variant, and initiating a checkout), you are looking at several seconds of pure network overhead.

To make agentic tools feel instant, we have to optimize the network path. We need to move the tool resolution code as close to the client as possible and eliminate heavy subprocess execution.


The V8 Isolate Advantage: Sub-20ms Execution

The secret to eliminating tool latency is moving from traditional virtual machines or containers to edge-native V8 isolates, such as Cloudflare Workers.

Traditional server setups (like Docker containers running on AWS or GCP) suffer from cold starts and slow routing. When a tool call hits a container, the runtime environment has to spin up, allocate memory, and execute a heavy Python or Node.js runtime process. Even on hot instances, the sheer volume of middleware and framework code adds tens of milliseconds of overhead.

In contrast, Cloudflare Workers run on V8 isolates. They share a single process, utilizing sandboxed memory contexts that instantiate in under a millisecond. By deploying tool schemas and adapters directly onto Cloudflare's global edge network (spanning over 300 POPs), tool requests are resolved at the server geographically closest to your computer.

Furthermore, by replacing runtime library execution with pre-compiled, stateless mappings, we eliminate the need for heavy local subprocesses.

Here is the exact latency breakdown:

Phase Traditional Hosted Container Edge-native V8 Isolate (wmcp.sh)
Connection Setup (TCP/TLS) 80ms - 150ms 10ms - 20ms (Edge termination)
Runtime Cold Start 500ms - 2s < 1ms
Tool Schema Resolution 50ms - 120ms < 3ms
Downstream API call 200ms - 800ms 30ms - 200ms (Edge-peered routes)
Total In-Flight Delay 830ms - 3.0s 41ms - 223ms

By running your tool gateway at the edge, you bypass the public internet congestion. The connection terminates at a nearby edge node, which then leverages high-speed backbone routing to communicate with downstream APIs.


Building a Stateless Edge Tool Resolver

To demonstrate how to achieve this, let's build a minimal, high-speed tool resolver that runs on Cloudflare Workers. This stateless worker accepts an OpenAPI specification URL, parses it, and maps it to a standard Model Context Protocol tool schema in under 20ms.

import { Hono } from "hono";

const app = new Hono();

// High-speed, stateless OpenAPI-to-MCP converter running at the edge
app.get("/api/v1/mcp-resolve", async (c) => {
  const targetUrl = c.req.query("url");
  if (!targetUrl) {
    return c.json({ error: "Missing target URL parameter" }, 400);
  }

  const startTime = Date.now();

  try {
    // 1. Fetch the OpenAPI spec from the source (leveraging CF global caching)
    const response = await fetch(targetUrl, {
      headers: { "User-Agent": "wmcp-edge-compiler/1.0" },
      cf: { cacheEverything: true, cacheTtl: 3600 } // Cache spec for 1 hour at edge
    });

    if (!response.ok) {
      return c.json({ error: `Failed to fetch spec: ${response.statusText}` }, 500);
    }

    const spec: any = await response.json();
    
    // 2. Dynamically compile the OpenAPI paths into MCP tool definitions
    const tools = Object.keys(spec.paths || {}).flatMap((path) => {
      const methods = spec.paths[path];
      return Object.keys(methods).map((method) => {
        const op = methods[method];
        return {
          name: op.operationId || `${method}_${path.replace(/[^a-zA-Z0-9]/g, "_")}`,
          description: op.summary || op.description || `Invoke ${method.toUpperCase()} on ${path}`,
          inputSchema: {
            type: "object",
            properties: op.parameters?.reduce((acc: any, param: any) => {
              acc[param.name] = {
                type: param.schema?.type || "string",
                description: param.description || ""
              };
              return acc;
            }, {}) || {}
          }
        };
      });
    });

    const duration = Date.now() - startTime;

    return c.json({
      tools,
      meta: {
        compiled_in_ms: duration,
        source: targetUrl
      }
    });
  } catch (err: any) {
    return c.json({ error: err.message }, 500);
  }
});

export default app;

This simple handler takes any standard OpenAPI definition and compiles it into an agent-readable format on the fly. Because Cloudflare caches the downstream spec locally at the edge POP, subsequent calls avoid the downstream network hop entirely, resolving the conversion in less than 3 milliseconds.


How wmcp.sh Powers Zero-Latency Agents

This is the core architectural philosophy behind wmcp.sh.

Instead of forcing developers to download heavy, bloated packages or run slow local subprocesses, we build our gateway as a lightweight, stateless router running on Cloudflare Workers.

When your agent uses wmcp.sh to call tool endpoints, it communicates with V8 isolates optimized for high-frequency routing. By combining edge-native execution with an intelligent global caching layer (our public registry now tracks 495 active, verified storefront and API endpoints), we ensure that your AI agent receives validated tools in under 50ms.

No delays. No container cold-starts. Just pure, instant agent actions.

Want this implemented on your stack? Custom adapter + hosted MCP + verified directory listing. From $499 one-time setup.
See /managed → Submit (free)