The Cache-Back Flywheel: How One Chrome Install Makes Every Agent Faster

Why individual browser extensions are the secret weapon for building a global, server-side Model Context Protocol cache.

2026-05-27

The Crawler Wall: Why the Cloud Can't Shop

If you are building an autonomous AI shopping agent, you will quickly hit the "Crawler Wall."

You write a beautiful server-side parser that works perfectly on your local machine. It fetches product pages, reads their HTML, and maps checkout buttons. But the moment you deploy this code to a cloud environment (like a Cloudflare Worker, an AWS EC2 instance, or Fly.io) and scale to thousands of users, the storefronts stop responding.

You start seeing blocked requests instead of product pages.

Merchants are engaged in a permanent security war against malicious scrapers, automated scalper bots, and credential stuffers. Because your cloud server is making requests from hosting provider IP ranges, it is immediately classified as a bot and shut down.

No amount of proxy rotation, header spoofing, or request throttling can bypass strict anti-bot shields indefinitely. The cloud cannot shop.

To build a shopping assistant that operates across the entire consumer web, we have to flip the architecture. We need to stop scraping the web from the server and start scraping directly from the user's browser.

But requiring every single user to install a browser extension is a friction nightmare. To solve this, we employ a hybrid, cooperative architecture: The Cache-Back Flywheel.


The Mechanics of a Cooperative MCP Cache

The Cache-Back Flywheel operates on a simple, high-leverage network effect. It splits users into two groups:

  1. Lightweight API Clients: The vast majority of users who connect their AI agents (like Claude Desktop or Cursor) to the hosted wmcp.sh endpoint. They install nothing locally.
  2. Extension-Assisted Contributors: A smaller group of developers and power users who install the lightweight wmcp.sh Chrome Extension.

When a standard API client asks wmcp.sh to resolve tools for a product URL (e.g., a Nike sneaker page), the hosted server checks its Shared Global Cache first:

[Agent Client] ──(1. Request Nike URL)──> [wmcp.sh Server] ──(2. Cache Hit! <100ms)──> [Clean Tool Schema]
                                                  ▲
                                                  │ (3. Cache Push via Extension)
                                        [Browser Extension User]

If the URL has been cached, the client receives the fully resolved checkout tool definitions in under 100ms without a single server-side scrape being fired.

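But what if it's a new URL and the server gets blocked by Nike's Cloudflare shield?

This is where extension users come in. When a contributor opens that page in an ordinary browser session, the extension reads the product metadata from the DOM and pushes it to the shared cache (step 3 in the diagram above).

Once cached, the next person in the world who asks wmcp.sh for that URL, even if they are running a headless agent on a remote server, gets the fully functioning tools instantly.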

One person's browser session makes the API better for everyone else. The more people who use the extension, the wider our real-time tool coverage grows, building an elite, self-healing registry of the consumer web.
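
The lookup-then-push loop described above can be sketched as a small in-memory model. This is an illustration, not the wmcp.sh implementation: createToolCache, resolveTools, pushFromExtension, normalizeUrl, and the scrape callback are all hypothetical names, and a real gateway would use a persistent store and authenticate every push.

```javascript
// Hypothetical in-memory model of the shared cache (not the real wmcp.sh code).
function normalizeUrl(rawUrl) {
  // Drop the query string and fragment so tracking parameters don't split entries.
  // Assumption: a product is identified by origin + path alone.
  const u = new URL(rawUrl);
  return u.origin + u.pathname.replace(/\/+$/, "");
}

function createToolCache() {
  const entries = new Map();

  return {
    // Called when an agent client asks for tools for a URL.
    // `scrape` is a caller-supplied async function that throws when blocked.
    async resolveTools(rawUrl, scrape) {
      const key = normalizeUrl(rawUrl);
      if (entries.has(key)) {
        return { status: "hit", tools: entries.get(key) };
      }
      try {
        const tools = await scrape(rawUrl);
        entries.set(key, tools);
        return { status: "scraped", tools };
      } catch (err) {
        // Blocked by an anti-bot shield: nothing to serve until an extension pushes.
        return { status: "miss", tools: null };
      }
    },

    // Called when an extension user pushes metadata extracted in their browser.
    pushFromExtension(metadata) {
      entries.set(normalizeUrl(metadata.url), metadata);
    }
  };
}
```

In this model a blocked scrape returns a "miss" until some contributor's push fills the entry, after which every client gets a "hit" for the same normalized URL. Stores that encode the product or variant in the query string would need a different normalization rule.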


Implementing a DOM Metadata Extractor in a Chrome Extension

To make this flywheel work, the Chrome extension must be incredibly lightweight, secure, and fast. It should read the DOM, extract e-commerce schemas, and push the serialized JSON back to our gateway.

Here is a Content Script (content.js) that runs inside the extension, extracts product metadata from Shopify product JSON or Schema.org JSON-LD, and pushes it to the cache:

/**
 * Content Script: content.js
 * Automatically executes on active storefront tabs to parse metadata 
 * and populate the global wmcp.sh cache.
 */

(function() {
  // 1. Extract product metadata by scanning standard e-commerce schemas
  function extractProductMetadata() {
    let metadata = {
      title: document.title,
      url: window.location.href,
      adapter: "other",
      product: {},
      variants: []
    };

    // A. Attempt Shopify Storefront JSON extraction
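    // Content scripts run in an isolated world and cannot read page globals such as
    // window.Shopify, so detect Shopify from the DOM instead: many themes embed the
    // product as a <script type="application/json" id="ProductJson-..."> tag.
    if (document.querySelector('script[type="application/json"][id^="ProductJson"]')) {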
      metadata.adapter = "shopify";
      const productElement = document.querySelector('script[type="application/json"][id^="ProductJson"]');
      if (productElement) {
        try {
          const shopifyJson = JSON.parse(productElement.textContent);
          metadata.product = {
            title: shopifyJson.title,
            vendor: shopifyJson.vendor,
            type: shopifyJson.type
          };
          metadata.variants = shopifyJson.variants.map(v => ({
            id: String(v.id),
            title: v.title,
            price: (v.price / 100).toFixed(2),
            available: v.available
          }));
          return metadata;
        } catch (e) {
          console.warn("[wmcp-extension] Shopify JSON-script parse failed, falling back.");
        }
      }
    }

    // B. Fallback: Parse Schema.org JSON-LD blocks
    const jsonLdScripts = document.querySelectorAll('script[type="application/ld+json"]');
    for (const script of jsonLdScripts) {
      try {
        const json = JSON.parse(script.textContent);
        // Standardize schema objects which might be nested or direct
        const productObj = findSchemaType(json, "Product");
        if (productObj) {
          metadata.adapter = "jsonld";
          metadata.product = {
            title: productObj.name || document.title,
            description: productObj.description || ""
          };
          
          // Map offer details to variants
          const offers = productObj.offers;
          if (offers) {
            const offerList = Array.isArray(offers) ? offers : (offers.offers || [offers]);
            metadata.variants = offerList.map((o, idx) => ({
              id: o.sku || o.url || `variant-${idx}`,
              title: o.name || "Default Title",
              price: o.price,
              // Accept the bare, http, and https forms of the Schema.org InStock value
              available: typeof o.availability === "string" && /(^|\/)InStock$/.test(o.availability)
            }));
          }
          return metadata;
        }
      } catch (e) {
        // Skip malformed JSON-LD scripts
      }
    }

    return metadata;
  }

  // Helper to recursively find a Schema.org type (e.g. Product) in parsed JSON-LD,
  // including nodes nested under @graph and nodes whose @type is an array
  function findSchemaType(obj, targetType) {
    if (!obj || typeof obj !== "object") return null;
    const type = obj["@type"];
    if (type === targetType || (Array.isArray(type) && type.includes(targetType))) return obj;
    // Object.values covers both array items and object properties in one pass
    for (const value of Object.values(obj)) {
      const res = findSchemaType(value, targetType);
      if (res) return res;
    }
    return null;
  }

  // 2. Perform extraction and POST back to the global cache
  const data = extractProductMetadata();
  if (data && data.variants.length > 0) {
    console.log("[wmcp-extension] Extracted storefront tools:", data);
    
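    // Fire-and-forget cache push. The token is read from extension storage instead of
    // being hard-coded in the script. Assumptions: Manifest V3 (promise-based
    // chrome.storage), the "storage" permission, and an "extensionToken" key.
    chrome.storage.local.get("extensionToken")
      .then(({ extensionToken }) => fetch("https://wmcp.sh/api/v1/cache/push", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          // The user's extension authentication token
          "x-extension-token": extensionToken || ""
        },
        body: JSON.stringify(data)
      }))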
    .then(r => r.json())
    .then(res => {
      if (res.ok) {
        console.log("[wmcp-extension] Successfully pushed tools to global shared cache.");
      }
    })
    .catch(err => {
      console.warn("[wmcp-extension] Cache synchronization failed:", err);
    });
  }
})();
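
Because any extension user can post to the push endpoint, the gateway should treat every payload as untrusted before it enters a cache that other people's agents will read. Here is a minimal sketch of such a check. validatePushPayload and its rules are assumptions for illustration, not the actual wmcp.sh validation; the field names simply mirror what the content script above emits.

```javascript
// Hypothetical gateway-side check for payloads sent to the cache push endpoint.
// Field names mirror the content script's output; the rules are assumptions.
const KNOWN_ADAPTERS = new Set(["shopify", "jsonld"]);

function validatePushPayload(payload) {
  if (!payload || typeof payload !== "object") {
    return { valid: false, errors: ["payload must be an object"] };
  }
  const errors = [];

  // The URL must parse and use http(s); anything else is rejected.
  try {
    const { protocol } = new URL(payload.url);
    if (protocol !== "https:" && protocol !== "http:") errors.push("url must be http(s)");
  } catch {
    errors.push("url is not a valid URL");
  }

  if (!KNOWN_ADAPTERS.has(payload.adapter)) errors.push("unknown adapter");

  if (!Array.isArray(payload.variants) || payload.variants.length === 0) {
    errors.push("variants must be a non-empty array");
  } else {
    payload.variants.forEach((v, i) => {
      if (!v || typeof v.id !== "string" || v.id === "") {
        errors.push(`variants[${i}].id must be a non-empty string`);
        return;
      }
      // Prices arrive as strings ("129.99") or numbers depending on the adapter.
      const price = typeof v.price === "string" && v.price.trim() !== "" ? Number(v.price) : v.price;
      if (typeof price !== "number" || !Number.isFinite(price)) {
        errors.push(`variants[${i}].price is not numeric`);
      }
      if (typeof v.available !== "boolean") {
        errors.push(`variants[${i}].available must be a boolean`);
      }
    });
  }
  return { valid: errors.length === 0, errors };
}
```

A shape check like this only filters malformed input. It does not prove the data is true, so a production gateway would likely also compare pushes from several contributors before trusting an entry.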

Shifting the Unit Economics of Web Scraping

By offloading the anti-bot challenge solving directly to the browser extension, the unit economics of operating an API like wmcp.sh shift dramatically.

Normally, routing scrapes through enterprise proxy networks (like Bright Data or Oxylabs) to bypass Cloudflare costs roughly $1.50 to $3.00 per gigabyte of bandwidth. If your scraping backend is downloading heavy HTML pages in a loop, your margins are constantly being eroded by proxy fees.

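With the Cache-Back Flywheel, pages extracted in a contributor's browser never touch a proxy, and every cache hit is served without a server-side scrape at all.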

This network effect turns the traditional scraper scaling problem on its head: instead of becoming slower and more expensive as your user base grows, your API gets faster, cheaper, and more comprehensive.
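
To make the arithmetic concrete, here is a rough cost model. Every input is an illustrative assumption rather than a measured wmcp.sh figure: one million requests per month, 2 MB per page, the $3.00 per gigabyte upper bound quoted above, and decimal gigabytes (1 GB = 1000 MB). monthlyProxyCostUsd is a hypothetical helper.

```javascript
// Rough proxy-cost model. All inputs are illustrative assumptions.
function monthlyProxyCostUsd({ requests, avgPageMb, pricePerGbUsd, cacheHitRate }) {
  // Only cache misses go out through the paid proxy network.
  const proxiedRequests = requests * (1 - cacheHitRate);
  const gigabytes = (proxiedRequests * avgPageMb) / 1000;
  return gigabytes * pricePerGbUsd;
}

const base = { requests: 1000000, avgPageMb: 2, pricePerGbUsd: 3.0 };
const noCache = monthlyProxyCostUsd({ ...base, cacheHitRate: 0 });
const withCache = monthlyProxyCostUsd({ ...base, cacheHitRate: 0.9 });

console.log(noCache.toFixed(2));   // 6000.00
console.log(withCache.toFixed(2)); // 600.00
```

Under these assumptions, a 90% cache hit rate cuts the proxy bill by the same 90%, because the cost scales linearly with the share of requests that miss the cache.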

Unit economics and smart routing are the structural foundations of sustainable AI products. Don't throw expensive proxies at robust firewalls—decentralize the extraction layer through a lightweight browser cache, and let cooperative network effects build the global index for you.
