The Supervisor Pattern: Building Production-Safe Multi-Agent Swarms
The first multi-agent system most engineers ship is a free-for-all - four agents, all allowed to call each other, all allowed to call tools. It demos beautifully. In production it loops forever, contradicts itself, or burns through your token budget by week two. The fix is the supervisor pattern, and once you internalize it you will reach for it on almost every agent project.
TL;DR - The Supervisor Pattern in One Line
One supervisor agent owns the plan and delegates. Worker agents are specialists that do one thing and return. No peer-to-peer communication. No shared-state mutations without supervisor approval. This is how you get determinism, observability, and safety in a multi-agent system.
Why the Free-for-All Pattern Fails
When every agent can call every other agent, three predictable failure modes emerge:
- Infinite conversation loops. Agent A asks Agent B for clarification. B asks A for clarification. The token bill climbs.
- Contradictory state mutations. Two agents update the same field with different values. The last writer wins, but only by accident.
- Untraceable decisions. When something goes wrong, you cannot tell which agent issued the bad action or why.
The supervisor pattern solves all three by design - not by adding more guardrails on top of a chaotic system.
The Pattern: Supervisor + Specialist Workers
The pattern mirrors how good human teams work. The project manager (supervisor) reads the brief, decides which specialist (worker) to call, waits for the result, then decides the next move. Specialists do not freelance - they execute a narrow task and hand back.
This gives you three properties that a peer-to-peer swarm does not:
- Determinism. Control flow is one agent's job, not emergent behavior.
- Observability. Every decision has a clear author - the supervisor.
- Safety. Write operations pass through a single choke point where policy checks live.
A Real E-Commerce Example
Here is a support system I built using this pattern in late 2024. The supervisor receives a customer ticket and picks between three workers:
- Order Lookup Worker - read-only access to the orders database. Queries by order ID or customer email and returns structured order state.
- Policy Worker - read-only access to the refund and returns policy. Answers whether a specific refund is allowed under current policy.
- Action Worker - the only agent with write access. Executes refunds, address updates, or order cancellations - but only after explicit supervisor approval and a policy check.
The supervisor loops: plan, delegate, observe, re-plan. It is the only agent that can call the Action Worker. The two read-only workers cannot call anyone - they just return data.
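The plan, delegate, observe, re-plan loop can be sketched in plain Python, independent of any framework. This is a minimal illustration, not the production system: the three worker functions return canned data, and `plan_next` is a hypothetical stand-in for the LLM planner that simply walks a fixed sequence.

```python
from typing import Callable

# Hypothetical stand-ins for the three workers. Real ones would query the
# orders database, consult the refund policy, or execute a write.
def order_lookup(ticket: str) -> dict:
    return {"worker": "lookup", "order_status": "delivered"}

def policy_check(ticket: str) -> dict:
    return {"worker": "policy", "refund_allowed": True}

def action(ticket: str) -> dict:
    return {"worker": "action", "executed": "refund"}

WORKERS: dict[str, Callable[[str], dict]] = {
    "lookup": order_lookup,
    "policy": policy_check,
    "action": action,
}

def plan_next(observations: list[dict]) -> str:
    """Toy planner standing in for the LLM call: look up, check policy, act, stop."""
    sequence = ["lookup", "policy", "action"]
    if len(observations) < len(sequence):
        return sequence[len(observations)]
    return "done"

def run_supervisor(ticket: str, max_steps: int = 8) -> list[dict]:
    """Only this function calls workers; workers never call each other."""
    observations: list[dict] = []
    for _ in range(max_steps):
        step = plan_next(observations)      # plan
        if step == "done":
            break
        result = WORKERS[step](ticket)      # delegate
        observations.append(result)         # observe, then loop to re-plan
    return observations
```

Because the workers are plain functions that return data, the only place control flow can change is inside `run_supervisor`, which is the property the pattern is after.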
Implementing It in LangGraph
from langgraph.graph import StateGraph, END
from typing import TypedDict, Literal

# llm_plan_next, order_lookup_worker, policy_worker, action_worker, and
# postgres_saver are defined elsewhere in the application.

class State(TypedDict):
    ticket: str
    plan: list
    observations: list
    proposed_action: dict | None
    action_approved: bool

def supervisor(state: State) -> dict:
    next_step = llm_plan_next(state)
    # next_step is one of: "lookup", "policy", "action", "done"
    return {"plan": state["plan"] + [next_step]}

def route(state: State) -> Literal["lookup", "policy", "action", "done"]:
    # Route on the most recent step the supervisor planned
    return state["plan"][-1]

graph = StateGraph(State)
graph.add_node("supervisor", supervisor)
graph.add_node("lookup", order_lookup_worker)
graph.add_node("policy", policy_worker)
graph.add_node("action", action_worker)

graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", route, {
    "lookup": "lookup",
    "policy": "policy",
    "action": "action",
    "done": END
})

# Workers always return to the supervisor
graph.add_edge("lookup", "supervisor")
graph.add_edge("policy", "supervisor")
graph.add_edge("action", "supervisor")

app = graph.compile(checkpointer=postgres_saver)
Two non-obvious choices in this implementation:
- Workers always return to the supervisor. No worker ever routes to another worker. This is the rule that prevents emergent loops.
- The supervisor is a thin LLM call - just a planner that picks the next step from a fixed enum. Heavy reasoning lives in the workers, not in the supervisor.
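Picking from a fixed enum only helps if the planner's raw output is actually forced onto that enum. One way to do that, as a sketch: the `Step` enum and `parse_step` helper below are hypothetical names, and falling back to `done` on unrecognized output is an assumption (escalating to a human would be an equally reasonable choice).

```python
from enum import Enum

class Step(str, Enum):
    LOOKUP = "lookup"
    POLICY = "policy"
    ACTION = "action"
    DONE = "done"

def parse_step(raw_model_output: str) -> Step:
    """Map raw planner text onto the fixed enum, or stop the run if it does not fit."""
    # Tolerate surrounding whitespace, quotes, and a trailing period.
    cleaned = raw_model_output.strip().strip("\"'.").lower()
    try:
        return Step(cleaned)
    except ValueError:
        # Unknown label: end the run rather than guess at a route.
        return Step.DONE
```

With this in place the router never sees free text, so a hallucinated step name cannot reach a node that does not exist.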
Production Guardrails That Matter
- Bound the loop. Set a max iteration count on the supervisor - 8 is usually plenty. Runaway loops are the number-one cost leak.
- Human-in-the-loop on the Action Worker. Any write operation above a risk threshold pauses for human approval. (See my HITL tutorial for the full pattern.)
- Enforce a schema on worker returns. Every worker returns a typed structure - never raw LLM text. Parse-or-reject.
- Log the supervisor's reasoning. Capture the why of every delegation. When something goes wrong three weeks in, this is the only thing that will save you.
- Per-worker timeouts and retries. A stuck worker should not stall the whole graph. Timeout, retry once, then escalate.
- Cost guard at the supervisor level. Track tokens per ticket. Hard-stop if a single ticket exceeds 10x median spend.
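Three of these guardrails (the iteration cap, the cost guard, and parse-or-reject) fit in a few lines each. The sketch below is illustrative only: `TicketBudget`, `BudgetExceeded`, `PolicyResult`, and `parse_policy_result` are hypothetical names, the median token figure is assumed to come from your own metrics, and the two-field policy schema is an assumed shape rather than the one used in the system described above.

```python
import json
from dataclasses import dataclass

class BudgetExceeded(Exception):
    """Raised when a ticket exceeds its iteration or token budget."""

@dataclass
class TicketBudget:
    median_tokens: int          # assumed to come from your own metrics
    max_iterations: int = 8
    spend_multiplier: int = 10
    iterations: int = 0
    tokens_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record one supervisor step and hard-stop if either limit is exceeded."""
        self.iterations += 1
        self.tokens_used += tokens
        if self.iterations > self.max_iterations:
            raise BudgetExceeded("iteration cap reached")
        if self.tokens_used > self.spend_multiplier * self.median_tokens:
            raise BudgetExceeded(f"spend above {self.spend_multiplier}x median")

@dataclass(frozen=True)
class PolicyResult:
    refund_allowed: bool
    reason: str

def parse_policy_result(raw: str) -> PolicyResult:
    """Parse-or-reject: accept only JSON with exactly the expected typed fields."""
    data = json.loads(raw)  # non-JSON text raises json.JSONDecodeError, a ValueError
    if not isinstance(data, dict) or set(data) != {"refund_allowed", "reason"}:
        raise ValueError("unexpected fields in worker output")
    if not isinstance(data["refund_allowed"], bool) or not isinstance(data["reason"], str):
        raise ValueError("wrong field types in worker output")
    return PolicyResult(data["refund_allowed"], data["reason"])
```

The supervisor would call `budget.charge(...)` once per iteration and treat `BudgetExceeded` or a rejected worker payload as a signal to stop and escalate, not to retry indefinitely.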
Real Production Numbers
The same e-commerce system, 90 days post-launch:
The 4-Rule Multi-Agent Architecture Framework
The TJ Multi-Agent Architecture Rules
- One supervisor, many specialists. Never let two agents share planning responsibility.
- Workers return to the supervisor, never to each other. This single rule prevents most loop and contradiction failures.
- Write actions live in one worker only. Concentrate the dangerous capability so policy lives in one place.
- The supervisor's role is routing, not reasoning. Heavy thinking lives in the workers; the supervisor just decides who acts next.
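Rule three is easiest to see as a single gate function that every write must pass through. The following is a sketch under stated assumptions: `ProposedAction`, `execute_action`, the allowed action kinds, and the 100.0 human-approval threshold are all hypothetical, and the function returns a status string instead of touching a real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProposedAction:
    kind: str            # e.g. "refund", "address_update", "cancel"
    order_id: str
    amount: float = 0.0

class ActionRejected(Exception):
    """Raised when a write fails one of the checks at the choke point."""

# Hypothetical values; real limits would come from your policy source.
ALLOWED_KINDS = {"refund", "address_update", "cancel"}
HUMAN_APPROVAL_ABOVE = 100.0

def execute_action(
    action: ProposedAction,
    *,
    supervisor_approved: bool,
    policy_allows: bool,
    human_approved: bool = False,
) -> str:
    """Single choke point: every write passes the same checks, in the same order."""
    if not supervisor_approved:
        raise ActionRejected("supervisor has not approved this action")
    if action.kind not in ALLOWED_KINDS:
        raise ActionRejected(f"unknown action kind: {action.kind}")
    if not policy_allows:
        raise ActionRejected("policy check failed")
    needs_human = action.kind == "refund" and action.amount > HUMAN_APPROVAL_ABOVE
    if needs_human and not human_approved:
        return "pending_human_approval"
    return "executed"  # a real worker would perform the write here
```

Because no other worker has write access, changing a policy rule means editing this one function and nothing else.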
When NOT to Use the Supervisor Pattern
If your task is genuinely collaborative and creative - say, a research crew brainstorming angles on a topic - a peer-to-peer or role-playing pattern (CrewAI style) can be worth the debuggability cost. The brainstorm is the point, and the lack of determinism is a feature.
But for anything that touches customer data, money, or external write actions, start with a supervisor. Always. The "but my use case is creative" argument almost always loses to "your customers expect deterministic outcomes."
5 Common Mistakes in Multi-Agent Systems
- Letting workers call other workers. The single most common cause of infinite-loop incidents.
- Putting write capability in multiple workers. Now policy enforcement has to live in N places. It will diverge.
- Skipping the iteration cap. One bad ticket can burn $50 in tokens before someone notices.
- Logging only the final state. When something is weird, you need the supervisor's reasoning at every step.
- Using a heavyweight model as the supervisor. Routing decisions are fast and cheap on GPT-4o-mini. Reserve the big model for the workers that need it.
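For the logging mistake in particular, a per-step record can be very small. The sketch below appends one JSON line per delegation to an in-memory list; `record_delegation` and its field names are hypothetical, and a real system would presumably write to its existing log pipeline instead.

```python
import json
import time

def record_delegation(
    trace: list[str],
    ticket_id: str,
    step: int,
    delegated_to: str,
    reasoning: str,
) -> dict:
    """Append one JSON line per supervisor decision, capturing who was called and why."""
    entry = {
        "ts": time.time(),
        "ticket_id": ticket_id,
        "step": step,
        "delegated_to": delegated_to,
        "reasoning": reasoning,
    }
    trace.append(json.dumps(entry, sort_keys=True))
    return entry
```

Calling this once per supervisor iteration yields a replayable trace of every routing decision, not just the final state.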
Frequently Asked Questions
Can the supervisor pattern handle parallel worker execution?
Yes. LangGraph supports parallel branches via fan-out edges. The supervisor delegates to multiple workers simultaneously and waits for all to return before re-planning. This is useful when the order lookup and policy lookup can run independently.
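The fan-out and join idea can be illustrated without any framework. This sketch uses the standard library's thread pool, not LangGraph's own API; the two worker functions are hypothetical stubs that return canned data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical read-only workers; real ones would hit a database or policy store.
def order_lookup(ticket: str) -> dict:
    return {"worker": "lookup", "order_status": "delivered"}

def policy_check(ticket: str) -> dict:
    return {"worker": "policy", "refund_allowed": True}

def fan_out(ticket: str, workers: list[Callable[[str], dict]]) -> list[dict]:
    """Start every worker at once, then wait for all of them before re-planning."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, ticket) for worker in workers]
        # Collect in submission order so the supervisor sees a stable ordering.
        return [future.result() for future in futures]
```

Only independent, read-only workers are good candidates for this; the Action Worker should still run alone, after the supervisor has seen both results.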
How is this different from CrewAI's hierarchical process?
CrewAI's hierarchical process is a similar idea - one manager agent delegates to workers. The difference is in control: LangGraph gives you explicit edge logic and typed state. CrewAI abstracts both away. For production write actions, the LangGraph version is easier to debug and audit.
What model should I use for the supervisor?
A small, fast model (GPT-4o-mini, Claude 3.5 Haiku, Gemini 1.5 Flash) is usually right. The supervisor is making routing decisions from a fixed enum, not generating prose. Reserve the heavy models for workers that need to reason or write.
How do I add a new worker after the system is in production?
Add the new node, add an edge from the supervisor to the new node and back, extend the routing enum, and add a few-shot example to the supervisor prompt that demonstrates when to use it. Ship behind a feature flag and roll to a small percentage of traffic first.
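The percentage rollout can be done with a deterministic hash so that a given ticket always sees the same set of routes. This is a sketch, assuming tickets have a stable string ID; `in_rollout`, `allowed_routes`, and the "shipping" worker name used in the usage note are hypothetical.

```python
import hashlib

BASE_ROUTES = ("lookup", "policy", "action", "done")

def in_rollout(ticket_id: str, percent: int) -> bool:
    """Deterministically place a ticket in one of 100 buckets and compare to the rollout."""
    digest = hashlib.sha256(ticket_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent

def allowed_routes(ticket_id: str, new_worker: str, percent: int) -> tuple[str, ...]:
    """Expose the new worker to the supervisor only for tickets inside the rollout."""
    if in_rollout(ticket_id, percent):
        return BASE_ROUTES + (new_worker,)
    return BASE_ROUTES
```

For example, `allowed_routes(ticket_id, "shipping", 5)` would offer the new route to roughly 5% of tickets, and setting the percentage to 0 acts as the kill switch.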
Does the supervisor pattern scale to 20+ workers?
It scales, but past 8-10 workers you should consider grouping workers under sub-supervisors (a hierarchical pattern). Otherwise the supervisor's prompt grows unwieldy and routing accuracy drops.
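A rough sketch of the grouping idea: the top-level supervisor chooses a team, and that team's sub-supervisor chooses a worker, so no single routing decision has to pick from the full list. The team and worker names below are hypothetical.

```python
# Hypothetical grouping of workers under two sub-supervisors.
TEAMS: dict[str, list[str]] = {
    "orders": ["lookup", "tracking", "inventory"],
    "billing": ["policy", "refund", "invoice"],
}

def resolve(team: str, worker: str) -> str:
    """Two-level routing: validate the team choice, then the worker choice within it."""
    if team not in TEAMS:
        raise ValueError(f"unknown team: {team}")
    if worker not in TEAMS[team]:
        raise ValueError(f"{worker} is not in team {team}")
    return f"{team}/{worker}"

def largest_menu() -> int:
    """Largest number of options any single routing decision has to choose from."""
    return max(len(TEAMS), *(len(workers) for workers in TEAMS.values()))
```

Here six workers are reachable, but each routing prompt only ever lists two or three options, which is what keeps prompts short as the worker count grows.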
Conclusion
The supervisor pattern is the boring, durable architecture that ships and stays shipped. Every multi-agent system I have built in production over the last 18 months has used some variant of it. Free-for-all swarms make better demos; supervisor-worker systems make better businesses.
If you want a second opinion on the architecture of your specific multi-agent project, happy to whiteboard it with you in a free 30-minute call.