From Solo to Swarm

A single AI agent can write a function, fix a bug, or generate a test. But real engineering work is not a single task. It is a chain of tasks with dependencies, reviews, rollbacks, and parallel workstreams. Once you move beyond one-shot prompts, you need orchestration: a way to coordinate multiple agents working together.

There are five core orchestration patterns. Each solves a different problem, and the best systems combine several of them. Understanding these patterns is the difference between running one agent at a time and running a coordinated engineering team of agents.

1. Sequential Chain (A then B then C)

The simplest pattern. One agent finishes, the next one starts. Output from step A becomes input for step B.

How it works:

  • Agent A writes the implementation.
  • Agent B writes tests for that implementation.
  • Agent C reviews both and produces a summary.
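
As a sketch, treating each agent as a plain Python function whose output feeds the next step's input (the agent bodies here are stand-ins, not a real API), the chain looks like this:

```python
# Sequential chain: A then B then C. Each function stands in for an agent run.

def implement(task: str) -> str:
    return f"code for {task}"               # Agent A: implementation

def write_tests(code: str) -> str:
    return f"tests covering {code}"         # Agent B: tests for A's output

def review(code: str, tests: str) -> str:
    return f"review of {code} and {tests}"  # Agent C: summary review

def sequential_chain(task: str) -> str:
    code = implement(task)        # step A
    tests = write_tests(code)     # step B depends on A's output
    return review(code, tests)    # step C depends on both

summary = sequential_chain("login endpoint")
```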

When to use it:

  • Tasks with strict dependencies: each step needs the previous step's output.

  • Simple workflows where order matters more than speed.
  • When you need predictability and easy debugging.

Limitation: It is slow. If Agent A takes 10 minutes, Agent B waits 10 minutes before it can start. Total time is the sum of all steps.

2. Parallel Fan-Out (Multiple Agents Simultaneously)

Multiple agents work on independent tasks at the same time. A coordinator dispatches work and merges the results when all agents finish.

How it works:

  • Coordinator breaks the work into independent units.
  • Agent A implements feature X. Agent B implements feature Y. Agent C writes documentation.
  • When all finish, the coordinator merges results and resolves conflicts.
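
A minimal sketch of the fan-out, assuming each unit of work is an independent function call with no shared state; the coordinator dispatches units concurrently and merges results once all of them finish:

```python
# Parallel fan-out: dispatch independent units concurrently, then merge.
from concurrent.futures import ThreadPoolExecutor

def agent(unit: str) -> str:
    return f"done: {unit}"  # stand-in for a real agent run

def fan_out(units: list[str]) -> dict[str, str]:
    with ThreadPoolExecutor(max_workers=len(units)) as pool:
        results = pool.map(agent, units)   # all units run in parallel
    return dict(zip(units, results))       # merge step: collect per-unit output

merged = fan_out(["feature X", "feature Y", "documentation"])
```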

When to use it:

  • Tasks that are independent: no shared state or dependencies between them.
  • Large workloads that benefit from parallelism.
  • When total time matters more than individual task complexity.

Limitation: Merge conflicts. If Agent A and Agent B both modify the same file, the coordinator must resolve the conflict. This requires good task decomposition upfront.

Parallel fan-out cuts total time from the sum of all tasks to the duration of the longest single task.

3. Supervisor / Dispatch (Orchestrator Delegates to Specialists)

A central orchestrator agent reads the task, decides which specialist agent should handle it, and delegates. The orchestrator does not do the work itself; it routes.

How it works:

  • The orchestrator receives a ticket or task.
  • It analyses the task's requirements and selects the right specialist (backend agent, test agent, infrastructure agent, etc.).
  • The specialist executes the task and reports back.
  • The orchestrator decides the next step: another specialist, a review, or completion.

When to use it:

  • Complex projects with many domains (backend, frontend, testing, infrastructure).
  • When different tasks require different expertise or context.
  • When you want a single point of coordination.

Limitation: The orchestrator becomes a bottleneck if it is too slow or makes poor routing decisions. The quality of the system depends on how well the orchestrator understands the task.

4. Multi-Model (Different Models for Different Jobs)

Not all AI models are equal. Some are better at implementation, some at research, some at review. The multi-model pattern uses the right model for each job.

How it works:

  • Opus handles complex implementation: architecture decisions, multi-file changes, nuanced requirements.
  • Gemini handles research: scanning documentation, summarising large codebases, finding references.
  • Codex handles review: checking code quality, catching bugs, verifying test coverage.
  • A lightweight model handles simple tasks: formatting, boilerplate, repetitive changes.
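
A minimal routing table in the same spirit; the model names mirror the list above, but the mapping and the call interface are assumptions, not a real SDK:

```python
# Multi-model routing: pick the model that fits the task type,
# defaulting to a cheap model for anything unrecognised.

MODEL_FOR = {
    "implementation": "opus",        # complex, multi-file changes
    "research": "gemini",            # scanning docs, summarising code
    "review": "codex",               # quality checks, bug catching
    "simple": "lightweight-model",   # formatting, boilerplate
}

def run_with_model(model: str, prompt: str) -> str:
    # Stand-in for an actual model call.
    return f"[{model}] {prompt}"

def route(task_type: str, prompt: str) -> str:
    model = MODEL_FOR.get(task_type, "lightweight-model")  # cheap default
    return run_with_model(model, prompt)

out = route("review", "check the diff against the spec")
```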

When to use it:

  • When cost matters: using an expensive model for everything is wasteful.
  • When different tasks have different quality requirements.
  • When you want to optimise for speed on simple tasks and depth on complex ones.

Limitation: Handoff between models requires clear context transfer. Each model has different capabilities and failure modes, so the orchestrator needs to know which model fits which task.

5. Feedback Loop (Build, Test, Fix, Re-test)

The agent builds something, tests it, analyses the failure, fixes the code, and tests again. This loop continues until the task passes or a retry limit is reached.

How it works:

  • Agent writes the implementation.
  • Agent runs the test suite.
  • If tests fail, the agent reads the error output, diagnoses the issue, and fixes the code.
  • Agent re-runs the tests.
  • Loop continues until all tests pass or the maximum iteration count is reached.
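
The loop with its retry limit can be sketched as follows; `run_tests` and `fix` are stand-ins, with a simple bug counter standing in for real test failures so the loop is observable:

```python
# Feedback loop: build, test, fix, re-test, with a hard iteration cap
# and escalation to a human when the cap is reached.

def run_tests(code: dict) -> list[str]:
    return [f"failure {i}" for i in range(code["bugs"])]  # stand-in test suite

def fix(code: dict, failures: list[str]) -> dict:
    return {"bugs": code["bugs"] - 1}  # each fix pass removes one bug

def feedback_loop(code: dict, max_iterations: int = 5) -> dict:
    for attempt in range(1, max_iterations + 1):
        failures = run_tests(code)
        if not failures:
            return {"status": "passed", "attempts": attempt}
        code = fix(code, failures)     # diagnose from error output, patch
    return {"status": "escalate_to_human", "attempts": max_iterations}

result = feedback_loop({"bugs": 2})
```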

When to use it:

  • Any task with automated validation: unit tests, type checking, linting, integration tests.
  • When you want the agent to self-correct without human intervention.
  • Autonomous iteration on well-defined problems.

Limitation: Without a retry limit, the agent can loop forever on a problem it cannot solve. Always set a maximum iteration count and escalate to a human when the limit is reached.

The Reviewer Does Not Equal the Implementer

This is the single most important principle in agent orchestration, and it applies regardless of which pattern you use:

Never let the same model instance implement AND evaluate its own work.

When the same agent writes the code and reviews the code, it suffers from confirmation bias. It believes its own output is correct because it generated it. This is the same reason human teams separate code authors from code reviewers.

In practice, this means:

  • The implementing agent and the reviewing agent should be different instances, different models, or at minimum different sessions with fresh context.
  • The reviewer should have access to the spec and acceptance criteria, not just the code diff.
  • The reviewer should be able to reject work and send it back for revision.
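
One way to sketch this gate, with the implementer and reviewer as separate functions (standing in for separate instances, models, or fresh sessions) and the reviewer empowered to reject work back for revision; every name here is illustrative:

```python
# Reviewer != implementer: separate roles, and the reviewer checks the
# work against the spec, not just the diff.

def implementer(spec: str, feedback: str = "") -> str:
    work = f"impl of {spec}"
    return work + (f" (revised: {feedback})" if feedback else "")

def reviewer(spec: str, work: str) -> tuple[bool, str]:
    # Fresh context: the reviewer sees the spec and the work product,
    # never the implementer's own reasoning.
    approved = spec in work
    return approved, "" if approved else "does not match spec"

def review_gate(spec: str, max_rounds: int = 3) -> str:
    work = implementer(spec)
    for _ in range(max_rounds):
        approved, feedback = reviewer(spec, work)
        if approved:
            return work
        work = implementer(spec, feedback)  # send back for revision
    raise RuntimeError("escalate: reviewer rejected all revisions")
```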

This is not overhead. It is the quality gate that prevents subtle bugs, missed requirements, and slowly degrading code quality from compounding across tasks.

Gas Town: Level 9 in Practice

To see what advanced orchestration looks like at scale, look at Gas Town, Steve Yegge's system for running 20-30 parallel Claude Code instances as a coordinated engineering team.

Gas Town uses a hierarchy inspired by Mad Max: Fury Road:

  • The Mayor: the top-level orchestrator. Reads the project plan, breaks work into tasks, assigns agents, and manages the overall workflow.
  • Polecats: the specialist worker agents. Each Polecat works on a specific task in its own Git worktree, isolated from other agents.
  • The Witness: the quality reviewer. Reviews completed work from Polecats, runs tests, checks against specs, and approves or rejects.
  • The Deacon: handles documentation, changelog updates, and project state management.
  • The Refinery: the merge coordinator. Takes approved work from multiple Polecats and merges it into the main branch, resolving conflicts.

Key infrastructure decisions in Gas Town:

  • Git worktrees: each Polecat works in its own worktree, so agents never conflict on file locks or staging areas. This is what makes true parallelism possible.
  • Beads: a lightweight issue tracker that the Mayor uses to assign and track work across agents. Each bead is a task with status, assignee, and dependencies.
  • Separation of concerns: the Mayor never writes code. Polecats never review their own work. The Witness never implements. Each role has a single responsibility.

Gas Town represents frontier territory in AI agent orchestration. It combines parallel fan-out, supervisor/dispatch, multi-model routing, and feedback loops into a single coherent system. This is what Level 9 looks like: a fully autonomous engineering team where human involvement is strategic, not operational.

Choosing the Right Pattern

Most real systems combine multiple patterns:

  • A supervisor dispatches tasks to specialists (pattern 3).
  • Independent tasks run in parallel (pattern 2).
  • Each specialist uses a feedback loop to self-correct (pattern 5).
  • Different models handle different task types (pattern 4).
  • Sequential chains handle tasks with strict dependencies (pattern 1).
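
To make the combination concrete, here is a minimal sketch (all agent logic is a stand-in) in which a supervisor fans independent tasks out in parallel and each specialist self-corrects in a small feedback loop:

```python
# Combined patterns: dispatch (3) + parallel fan-out (2) + feedback loop (5).
from concurrent.futures import ThreadPoolExecutor

def validate(result: str) -> bool:
    return result.endswith("try1")  # pretend the first retry always passes

def specialist(task: str) -> str:
    for attempt in range(3):            # pattern 5: build/test/fix loop
        result = f"{task}@try{attempt}"
        if validate(result):
            return result
    return f"{task}: escalated"         # retry limit reached

def supervise(tasks: list[str]) -> list[str]:
    with ThreadPoolExecutor() as pool:  # pattern 2: parallel fan-out
        return list(pool.map(specialist, tasks))  # pattern 3: dispatch to workers

done = supervise(["api", "ui", "docs"])
```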

Start with the simplest pattern that solves your problem. Add complexity only when you hit a real limitation. A sequential chain that works reliably is better than a parallel swarm that fails unpredictably.

The pattern you choose matters less than the principle you enforce: the reviewer is never the implementer, and every agent operates within defined boundaries.