rankRank LLMs for a stated purpose. Returns a shortlist with weights, scores, and plain-English rationale per pick. Use when the user wants to see and com
pickReturn the single best LLM for a stated purpose. Concise output, no list. Use when the user has settled on the criteria and just wants one answer.
discoverShow which quality dimensions matter for a stated purpose, WITHOUT ranking any models. Returns the inferred weights and the discovery-walk trace. Usef
benchmarkRun a live A/B test against the engine's TOP 3 PICKS for a stated purpose — the engine chooses the candidates from the full catalog. Generates 5 repre
compareRun a live A/B test between 2–5 user-specified models for a stated purpose. NO ranking step — the supplied model_ids ARE the candidate set. Generates
No proxied traffic observed for this host yet. Connect it at /connect and its grade gains a measured Reliability score + per-tool behavioral evidence — the half a static scan can't produce.
We re-grade xfms.vercel.app on a schedule and alert your Slack/webhook the moment its tools change or its grade drops — rug-pull insurance for the connection.
Add the wmcp.sh trust oracle as an MCP server and call grade_mcp_server / check_mcp_drift in your agent's pre-connection gate:
https://wmcp.sh/mcp/trust
readOnly vs observed behavior) layer on via the wmcp.sh proxy.