Trust & safety teams are drowning in the same triage pattern: a message lands, a moderator skims it, checks the policy wiki, takes one of five actions. That sequence is a tool-using loop. The hard part isn’t getting a model to classify text — it’s wiring the platform API, the policy doc, and the action surface into a clean, auditable loop a human can actually trust.
Most moderation prototypes start with a model and a hardcoded prompt: “flag if hateful, spam, or NSFW.” That works for a week, until policy changes. Then someone has to redeploy. Then you discover the bot never looked at the linked URL, just the message text. Then it flags a benign meme as a slur because the prompt drifted.
The shape that survives contact with real moderators: the agent reads your written policy on every decision, fetches and inspects any embedded URLs, classifies, and either acts (low stakes) or escalates (high stakes) — with a citation back to the policy clause in every log entry.
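The act-or-escalate split can be stated as a small, testable rule. Here is a minimal sketch, assuming the model returns an action label (the same none / soft-warn / hide / escalate labels the Python sketch's prompt asks for), a cited policy clause, and a confidence score. The `Decision` type, the 0.8 threshold, and the rule that a missing citation always escalates are illustrative choices, not something wmcp.sh provides.

```python
from dataclasses import dataclass

# Low-stakes labels the agent may apply on its own (illustrative; tune to your policy).
SOFT_ACTIONS = {"none", "soft-warn", "hide"}


@dataclass
class Decision:
    action: str         # one of: none / soft-warn / hide / escalate
    policy_clause: str  # citation back to the policy doc, e.g. "3.2 Spam" (hypothetical)
    confidence: float   # model-reported confidence in [0, 1]


def route(d: Decision, min_confidence: float = 0.8) -> str:
    """Return "act" only for confident, cited, low-stakes actions; otherwise "escalate"."""
    if not d.policy_clause:
        # No citation means no automatic action: a human has to look.
        return "escalate"
    if d.action in SOFT_ACTIONS and d.confidence >= min_confidence:
        return "act"
    return "escalate"
```

Keeping the rule outside the prompt means a policy change that moves an action between "low stakes" and "high stakes" is a one-line diff you can review, not a prompt rewrite.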
wmcp.sh wires this in: /integration/discord and /integration/slack for the platform, /integration/notion for the policy doc, and the generic URL fetcher for any link in a message. wmcp.sh is not affiliated with Discord, Slack, or Notion.
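The bot only inspects the links it actually finds, so URL extraction is worth getting right. A minimal sketch, assuming plain message text as input: it pulls out http(s) links, copes with Discord's `<url>` embed suppression and Slack's `<url|label>` link markup, and builds the generic fetcher request using the same `url` query parameter the Python sketch passes. The regex is deliberately simple and will truncate URLs containing parentheses.

```python
import re
from urllib.parse import urlencode

WMCP = "https://wmcp.sh"

# Deliberately simple: stop at whitespace, angle brackets, parens, square brackets,
# or "|" (Slack separates the URL from its label with "|").
URL_RE = re.compile(r"https?://[^\s<>()\[\]|]+")


def extract_urls(text: str) -> list[str]:
    """Return unique URLs in order of first appearance, minus trailing punctuation."""
    seen: dict[str, None] = {}  # dict keeps insertion order, so it doubles as an ordered set
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:!?'\"")
        seen.setdefault(url, None)
    return list(seen)


def fetcher_query(url: str) -> str:
    """Build the generic fetcher request for one linked URL (URL-encoded)."""
    return f"{WMCP}/api/v1/tools?{urlencode({'url': url})}"
```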
1. Event source. A Discord bot or Slack app subscribes to messages in moderated channels and forwards the message ID into a queue. Each event gets its own bounded agent run.
2. Tool gateway (wmcp.sh). The agent boots with platform tools (Discord or Slack), a Notion search for policy, and a generic URL fetcher for any links in the message.
3. Reasoning loop. The agent fetches the full message, expands any URLs (page text + OpenGraph), searches the policy doc for relevant clauses, classifies, and either acts (e.g. add reaction, hide) or files an item in the human review queue.
4. Audit. Every decision is logged with the cited policy clause, the confidence, and the action taken. Reviewers can override any decision, and each override feeds back into prompt tuning.
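Step 4 is easiest to act on when each decision is one structured record. A minimal sketch, assuming an append-only JSON Lines log. The field names, the `AuditEntry` type, and the `overrides` helper are hypothetical; how overrides are actually turned into prompt changes is left to your own pipeline.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditEntry:
    message_id: str
    policy_clause: str               # clause the agent cited
    confidence: float                # model-reported confidence
    action: str                      # action taken by the agent (or "escalate")
    override: Optional[str] = None   # reviewer's replacement action, if any
    reviewer: Optional[str] = None
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def to_jsonl(entry: AuditEntry) -> str:
    """Serialize one entry as a single JSON line (stable key order for diffing)."""
    return json.dumps(asdict(entry), sort_keys=True)


def overrides(entries: list[AuditEntry]) -> list[AuditEntry]:
    """Entries where a reviewer disagreed with the agent: raw material for tuning."""
    return [e for e in entries if e.override is not None and e.override != e.action]
```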
| Capability | System | How wmcp.sh wires it |
|---|---|---|
| Read channel messages | Discord | ✅ /integration/discord |
| Read channel messages | Slack | ✅ /integration/slack |
| Search policy doc | Notion | ✅ /integration/notion |
| Inspect linked URL / image | Any URL | ✅ Generic /api/v1/tools?url=... — text + OG metadata |
| Soft action (react / hide) | Discord / Slack | ✅ Scoped to low-stakes methods only |
| Escalate to human queue | Linear / your queue | ✅ OpenAPI adapter via /integration/openapi |
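The "scoped to low-stakes methods only" row can also be enforced on the client side, as a second layer. A minimal sketch, assuming each tool definition from the gateway is a dict with a `name` key (the shape the Anthropic Messages API expects for tools). Every name in `ALLOWED` is hypothetical; check the names your gateway actually returns. This complements, rather than replaces, whatever scoping the gateway applies.

```python
# Hypothetical tool names: replace with the ones your gateway returns.
ALLOWED = {
    "discord_get_message",
    "discord_add_reaction",
    "notion_search",
    "fetch_url",
}


def scope_tools(tools: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only allowlisted tool definitions; anything else never reaches the model."""
    return [t for t in tools if t.get("name") in allowed]


def assert_allowed(tool_name: str, allowed: set[str]) -> None:
    """Check again at execution time, in case the model names a tool it was not given."""
    if tool_name not in allowed:
        raise PermissionError(f"tool {tool_name!r} is not allowlisted")
```

Filtering before the model call keeps destructive methods (bans, deletions) out of the model's view entirely, while the execution-time check guards against a hallucinated tool name.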
Python sketch. Receives a message ID and asks the model for a classification plus one proposed action, citing the relevant policy clause.
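```python
import os

import httpx
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
WMCP = "https://wmcp.sh"


def tools_for(url):
    """Ask the wmcp.sh gateway for the tool definitions that wrap `url`."""
    r = httpx.get(f"{WMCP}/api/v1/tools", params={"url": url}, timeout=30)
    r.raise_for_status()
    return r.json()["tools"]


tools = (
    tools_for("https://discord.com/api")              # platform tools (Discord)
    + tools_for("https://www.notion.so/acme-policy")  # policy doc search
    + tools_for("about:fetch")                        # generic URL fetcher
)

msg_id = os.environ["MESSAGE_ID"]
resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user",
               "content": f"Message {msg_id}. Fetch full content, expand any URLs, search the "
                          "policy doc for relevant clauses, classify, and propose one action: "
                          "none / soft-warn / hide / escalate. Cite the policy clause."}],
)

# Single turn only: if resp.stop_reason == "tool_use", resp.content holds tool calls
# that still have to be executed and answered before a final classification exists.
print(resp.stop_reason)
print(resp.content)
```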
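The sketch above makes one `messages.create` call, so any `tool_use` blocks it gets back are never executed. A bounded loop closes that gap and matches "each event gets its own bounded agent run". This is a minimal sketch: `execute_tool` is a hypothetical hook you would wire to however your gateway executes calls, `max_turns=8` is an arbitrary bound, and running out of turns falls back to escalation rather than acting.

```python
from typing import Any, Callable


def run_agent(
    create: Callable[..., Any],                # e.g. client.messages.create with model/tools pre-bound
    execute_tool: Callable[[str, dict], str],  # hypothetical: runs one tool call, returns its output
    prompt: str,
    max_turns: int = 8,
) -> str:
    """Bounded tool-use loop: stop on a final answer or after max_turns model calls."""
    messages: list[dict] = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        resp = create(messages=messages)
        if resp.stop_reason != "tool_use":
            # Final turn: join the text blocks (classification + cited clause).
            return "".join(b.text for b in resp.content if b.type == "text")
        # Echo the assistant turn back, then answer every tool_use block it contains.
        messages.append({"role": "assistant", "content": resp.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": b.id,
                "content": execute_tool(b.name, b.input),
            }
            for b in resp.content
            if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "escalate: tool loop exceeded max_turns"
```

In practice `create` could be `functools.partial(client.messages.create, model="claude-sonnet-4-5", max_tokens=1024, tools=tools)`. Passing it in, rather than calling the client directly, also lets you test the loop with fakes and no network.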
Note: links and images are inspected through the generic /api/v1/tools?url=... fetcher, which returns page text and OG metadata only; multimodal classification needs a vision model.

Custom platform adapter + hosted MCP at mcp.yourbrand.com + verified badge. Starter $499 one-time · Managed Retainer $999/mo · Enterprise $4,999+/mo.