Building MCS: A Coordination Server for AI Agents
I built a coordination server for my AI agents. Five independent AI systems — running different models on different machines — needed a way to share work, communicate, and collaborate on tasks too large for any single agent. The result is the Mesh Coordination Server (MCS): a lightweight task queue and shared memory system that turns isolated AI agents into a coordinated mesh.
This post walks through the architecture, the design decisions, and a real-world case study where all five agents worked in parallel to produce a comprehensive code review of a 34,000-line TypeScript codebase.
The Problem: Isolated Agents
I run five AI agents across my home infrastructure:
- Paisley — orchestrator agent (Claude Code, task coordination, blog publishing)
- Ocasia — security-focused agent (qwen3.5:397b, CLI-first, direct feedback)
- Rex — implementation-focused agent (devstral-2:123b, practical robustness)
- Phil — code-quality agent (qwen3-coder:480b, optimization and patterns)
- Molly — general-purpose agent (qwen3.5:397b, research and communication)
Each agent runs on its own hardware with its own model. They can each do impressive work independently — but without coordination, they're just five separate tools I have to manage manually. The question was: how do you get five AI agents with different capabilities, running different models on different machines, to collaborate on a single task?
The Solution: MCS Architecture
MCS is intentionally simple. It's a Bun/TypeScript HTTP server backed by SQLite, deployed as a single service. No Kubernetes, no message brokers, no distributed databases. Just a server that does two things well:
- Shared Task Queue — Submit work, route it to capable agents, track completion
- Shared Memory Store — Key-value storage with namespaces, so agents can share state
System Overview
The Task Queue
The task queue is the core of MCS. Any agent can submit a task, and MCS routes it to the right agent based on capability matching. Here's how it works:
- Submit — A task is created with a type, priority, and required capabilities
- Route — MCS checks which agents have registered the required capabilities
- Notify — The matched agent receives a webhook push notification
- Claim — The agent claims the task (with a TTL to prevent stale claims)
- Execute — The agent does the work
- Complete — Results are posted back to MCS
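The submit → claim → complete flow above can be sketched as a small state machine. This is an illustrative sketch only — the field names (`requiredCapabilities`, `claimExpiresAt`) and status strings are assumptions, not MCS's actual schema:

```typescript
// Illustrative task shape -- field names are assumptions, not MCS's real schema.
type TaskStatus = "queued" | "claimed" | "completed" | "failed";

interface Task {
  id: string;
  type: string;
  priority: "urgent" | "normal" | "low";
  requiredCapabilities: string[];
  status: TaskStatus;
  claimedBy?: string;
  claimExpiresAt?: number; // epoch ms; the claim TTL
}

// Valid transitions for the submit -> claim -> complete flow described above.
const transitions: Record<TaskStatus, TaskStatus[]> = {
  queued: ["claimed"],
  claimed: ["completed", "failed", "queued"], // back to queued if the claim TTL expires
  completed: [],
  failed: ["queued"], // retried with backoff
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return transitions[from].includes(to);
}
```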
Key design decisions:
- Capability-based routing — Agents register capabilities like `mcs-review`, `shell`, `web-search`, `gpu`. When a task requires specific capabilities, only agents with those capabilities are considered.
- Fanout mode — Setting `route="all"` creates parallel child tasks for every capable agent. This is how the code review works: one submission fans out to all five agents simultaneously.
- Claim TTL + Watchdog — When an agent claims a task, it gets a 5-minute window. A watchdog process runs every 30 seconds to reclaim expired tasks and retry them. This handles agent crashes gracefully.
- Exponential backoff retry — Failed tasks retry with a 10-second base delay, doubling up to 600 seconds, with 3 retries by default.
- Priority dispatch — Tasks are urgent, normal, or low priority. Urgent tasks jump the queue.
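The retry schedule above (10-second base, doubling, capped at 600 seconds) reduces to a one-line function. A minimal sketch — the function name is mine, not MCS's:

```typescript
// Retry delay per the schedule above: 10 s base, doubling per attempt, capped at 600 s.
const BASE_MS = 10_000;
const CAP_MS = 600_000;

function retryDelayMs(attempt: number): number {
  // attempt 0 -> 10 s, 1 -> 20 s, 2 -> 40 s, ..., capped at 600 s
  return Math.min(BASE_MS * 2 ** attempt, CAP_MS);
}
```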
Shared Memory
The second component is a namespaced key-value store. Agents use it to share context:
- `mesh` namespace — Read/write for all agents. Shared state like agent statuses, configuration, and coordination data.
- `agent:NAME` namespace — The owning agent can write; all agents can read. Agent-specific state visible to the mesh.
- `private:NAME` namespace — Only the owning agent can read or write. Private scratch space.
Keys support TTLs, tags, and bulk operations. The memory store is backed by the same SQLite database, keeping the deployment footprint minimal.
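The three namespace kinds boil down to a small access check. A sketch under the semantics described above — the function and its signature are illustrative, not MCS's actual ACL code:

```typescript
// Access check for the three namespace kinds: mesh, agent:NAME, private:NAME.
// Illustrative only -- not MCS's actual ACL implementation.
type Access = "read" | "write";

function canAccess(agent: string, namespace: string, access: Access): boolean {
  if (namespace === "mesh") return true; // shared: read/write for all agents
  if (namespace.startsWith("agent:")) {
    const owner = namespace.slice("agent:".length);
    return access === "read" || agent === owner; // all read, only the owner writes
  }
  if (namespace.startsWith("private:")) {
    return agent === namespace.slice("private:".length); // owner only
  }
  return false; // unknown namespace kinds are denied
}
```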
Agent Registration
Agents register with MCS via a heartbeat loop. Every 4 minutes, each agent re-registers its capabilities (within the 5-minute TTL). Registration includes:
- Capabilities list — What this agent can do (e.g., `filesystem`, `shell`, `web-search`, `gpu`, `mcs-review`)
- Notify URL — Where MCS should push task notifications (webhook endpoint)
- Auth credentials — Each agent has a unique secret for API authentication
When an agent misses two heartbeat cycles, MCS marks it offline and stops routing tasks to it. No manual intervention needed — agents come and go naturally.
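With a 4-minute heartbeat interval, "misses two heartbeat cycles" means roughly 8 minutes without a re-registration. The offline check can be sketched as follows — the constants mirror the numbers above, but the function itself is illustrative:

```typescript
// Offline detection sketch: agents re-register every 4 minutes; after two
// missed cycles (~8 minutes without a heartbeat) an agent is marked offline.
// Illustrative code, not MCS's actual implementation.
const HEARTBEAT_MS = 4 * 60_000;
const MISSED_CYCLES = 2;

function isOffline(lastSeenMs: number, nowMs: number): boolean {
  return nowMs - lastSeenMs > HEARTBEAT_MS * MISSED_CYCLES;
}
```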
Notification Flow
When a task is submitted, MCS doesn't wait for agents to poll. It pushes a webhook notification to the matched agent's registered URL. The agent receives the notification, claims the task, processes it, and posts the result back. The entire flow is push-based — no polling loops, no wasted cycles.
Case Study: 5-Agent Parallel Code Review

To demonstrate MCS in action, let's walk through a real task: a comprehensive code review of cf-cli, a 34,000-line TypeScript CLI tool wrapping the Cloudflare API with 400+ commands.
No single reviewer can catch everything. Different models have different strengths — one excels at security analysis, another at finding correctness bugs, another at architectural patterns. The goal: run five independent reviews in parallel and synthesize the results.
How It Works
Step 1: Paisley gathers the code. The orchestrator agent collects the source files — core infrastructure, representative commands, tests, and configuration — into a single 418KB payload. This is uploaded to a shared GitHub repository where all agents can fetch it.
Step 2: Five agents launch in parallel. Two agents (Gemini and Claude) run as local sub-agents. Three agents (Ocasia, Rex, Phil) receive their tasks via MCS:
```shell
bun run mcs-client.ts task submit \
  --type mcs-review \
  --route ocasia \
  --payload-file /tmp/review-payload.json
```

Each MCS task contains a review prompt tailored to the agent's strength and a URL pointing to the code payload. MCS pushes a webhook notification to each agent. They fetch the code, review it using their own model, and post results back.
Step 3: Results are synthesized. Paisley collects all five reviews and cross-references findings. When multiple agents independently flag the same issue, it gets a "consensus" tag — higher confidence that it's a real problem.
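The consensus-tagging step can be sketched as a grouping pass: any finding independently flagged by two or more agents gets the tag. The `Finding` shape is illustrative, and real deduplication needs fuzzier matching than exact keys:

```typescript
// Consensus tagging sketch: findings flagged by >= 2 agents get a "consensus" tag.
// The Finding shape and exact-key dedup are simplifying assumptions.
interface Finding {
  key: string;   // normalized issue identifier, e.g. "429-retry-after" (hypothetical)
  agent: string;
}

function tagConsensus(
  findings: Finding[],
): Map<string, { agents: string[]; consensus: boolean }> {
  const byKey = new Map<string, { agents: string[]; consensus: boolean }>();
  for (const f of findings) {
    const entry = byKey.get(f.key) ?? { agents: [], consensus: false };
    if (!entry.agents.includes(f.agent)) entry.agents.push(f.agent); // count each agent once
    entry.consensus = entry.agents.length >= 2;
    byKey.set(f.key, entry);
  }
  return byKey;
}
```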
The Results
Five agents, five different models, five independent perspectives. Here's what they found:
| Agent | Model | Critical | Recommendations | Observations |
|---|---|---|---|---|
| Gemini | gemini-2.5-pro | 7 | 12 | 7 |
| Claude | claude-opus-4-6 | 4 | 8 | 6 |
| Ocasia | qwen3.5:397b | 6 | 8 | 0 |
| Rex | devstral-2:123b | 0 | 5 | 0 |
| Phil | qwen3-coder:480b | 0 | 5 | 0 |
After deduplication and synthesis: 38 unique findings — 8 critical, 18 recommendations, 12 observations. Twelve findings had multi-agent consensus (flagged independently by 2 or more agents).
The top consensus issues:
- Retry-After header ignored on 429 responses (3 agents) — The HTTP client uses fixed backoff instead of respecting the server's rate-limit header.
- Secret values accepted as CLI arguments (2 agents) — Plaintext secrets visible in shell history and `ps aux`.
- Unbounded pagination loop (2 agents) — No maximum page guard on the auto-pagination helper.
- Inconsistent URL path encoding (2 agents) — Some path segments encoded, others not.
- Config read silently swallows permission errors (2 agents) — Falls back to defaults without warning.
The value of multi-agent review isn't just more findings — it's confidence through consensus. When Gemini, Claude, and Ocasia all independently flag the same 429 retry issue from three different analysis angles, you know it's real.
Download the full 5-agent review report (interactive HTML)
Design Decisions
Why SQLite?
MCS uses a single SQLite database in WAL (Write-Ahead Logging) mode. For a system coordinating five agents with tens of tasks per day, SQLite is massively overprovisioned — and that's the point. No connection pools, no configuration, no operational overhead. The database is a single file that can be backed up with cp. WAL mode gives concurrent read access while writes are serialized, which is perfect for the task queue pattern.
Why Push, Not Pull?
Each agent registers a webhook URL with MCS. When a task matches an agent's capabilities, MCS immediately pushes a notification — no polling interval, no wasted cycles. This means task dispatch latency is measured in milliseconds, not polling intervals. Agents that go offline let their registration TTL lapse, so MCS skips them naturally.
Why Capability-Based Routing?
Rather than hardcoding "send code reviews to Ocasia," MCS routes based on declared capabilities. An agent registers mcs-review as a capability. When a task requires mcs-review, any agent with that capability is eligible. This means:
- New agents can join the mesh by registering the right capabilities
- Agents can go offline without breaking task routing
- Capabilities can be added or removed dynamically (5-minute TTL)
- The orchestrator doesn't need to know which agent handles what — just what needs to happen
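The eligibility rule reduces to a set-containment check: an agent qualifies only if every required capability is in its registered set. A minimal sketch — the function name and data shapes are mine, not MCS's dispatcher code:

```typescript
// Capability matching sketch: a task routes to an agent only if every required
// capability is in the agent's registered set. Illustrative, not MCS's dispatcher.
function eligibleAgents(
  required: string[],
  agents: Map<string, Set<string>>, // agent name -> registered capabilities
): string[] {
  return [...agents.entries()]
    .filter(([, caps]) => required.every((c) => caps.has(c)))
    .map(([name]) => name);
}
```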
Why Not a Message Broker?
I considered RabbitMQ, Redis Streams, and NATS. But MCS coordinates five agents, not five hundred. The operational complexity of running a message broker — even a lightweight one — outweighs the benefits at this scale. A Bun HTTP server with SQLite starts in milliseconds, uses single-digit megabytes of RAM, and requires zero configuration. When you're building personal infrastructure, simplicity is a feature.
The Implementation
MCS is roughly 9,750 lines of TypeScript across 38 files. The core components:
- `server.ts` — Bun HTTP server with route matching
- `routes/tasks.ts` — Task CRUD, claiming, results, and audit trail
- `routes/agents.ts` — Agent registration, capability management, heartbeat
- `routes/memory.ts` — Namespaced key-value store with ACLs
- `dispatch/dispatcher.ts` — Capability matching, priority routing, fanout
- `notify/notifier.ts` — Webhook push notifications to agents
- `auth.ts` — Per-agent secret authentication
- `client/mcs-client.ts` — CLI tool for interacting with MCS from any machine
Authentication is straightforward: each agent has a unique secret. Requests include `X-Agent-ID` and `X-Agent-Secret` headers. No OAuth, no JWT rotation — just shared secrets appropriate for a trusted internal mesh.
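The header check can be sketched as a lookup plus a constant-time comparison (a good habit even on a trusted mesh). The secrets table and function name here are hypothetical, not MCS's actual code:

```typescript
import { timingSafeEqual } from "node:crypto";

// Per-agent secret auth sketch: look up the agent's secret and compare in
// constant time. The secrets table and values are hypothetical.
const secrets = new Map<string, string>([["rex", "s3cret-rex"]]);

function authenticate(agentId: string, agentSecret: string): boolean {
  const expected = secrets.get(agentId);
  if (expected === undefined) return false;
  const a = Buffer.from(agentSecret);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so check lengths first
  return a.length === b.length && timingSafeEqual(a, b);
}
```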
Observability
MCS exposes a /metrics endpoint in Prometheus format. A Prometheus instance scrapes it every 15 seconds, and Grafana dashboards show:
- Task throughput (submitted, completed, failed per hour)
- Agent availability (heartbeat status, last seen)
- Queue depth by priority level
- Claim expiration and retry rates
- Memory store key counts by namespace
For immediate operational alerts, permanent task failures trigger a Telegram notification to a shared group where all agents are members.
Lessons Learned
SSH tunnels need keepalive. The MacBook running Paisley isn't on the Tailscale mesh directly — it reaches MCS through an SSH tunnel. Without ServerAliveInterval, tunnels silently die when the TCP connection goes idle. The process stays alive as a zombie while all connections through it fail. I lost an entire round of MCS reviews to this before adding keepalive to the tunnel LaunchAgent. Lesson: always set ServerAliveInterval=30 and ServerAliveCountMax=3 on persistent SSH tunnels.
Skill deployment paths vary across hosts. OpenClaw (the agent framework) loads skills from different directories depending on how it was installed. On macOS it checks ~/.openclaw/skills/; on one Linux host it looked in /usr/lib/node_modules/openclaw/skills/. Same version, different behavior. I burned 15 minutes wondering why Phil couldn't find the review skill before checking the logs.
Different models find different bugs. This is the core insight. Gemini excelled at finding edge cases in validation logic (IPv6, negative numbers, YAML escaping). Claude focused on architectural patterns and security implications (secret handling, URL encoding consistency). Ocasia caught input validation gaps that neither found. Rex and Phil, running smaller models, validated that the architecture was sound — high praise from a different angle. No single model found everything.
Consensus is a signal. When three models independently flag the same issue, it's almost certainly a real problem. When only one model flags something, it might be a false positive or a niche concern. Multi-agent consensus is a natural confidence metric that emerges for free from parallel review.
What's Next
MCS is already handling code reviews, but the architecture supports any task type. Next steps:
- Research tasks — Fan out research questions to multiple agents, each searching different sources
- Deployment coordination — Multi-step deployments where each agent handles a different stage
- Scheduled work — Cron-like task submission for recurring maintenance tasks
- Result aggregation — Automatic synthesis of fanout results (currently done by the orchestrator)
The beauty of capability-based routing is that new task types don't require MCS changes — just agents that register the right capabilities. The coordination layer stays simple while the capabilities of the mesh grow.
MCS is open infrastructure — a lightweight coordination layer that turns independent AI agents into a collaborative mesh. The code review case study demonstrates the core value proposition: five models, five perspectives, one synthesized result that's better than any individual review. The architecture is deliberately simple because at this scale, simplicity is the feature that matters most.