June 30, 2026 · 6 min read

Your AI Has a Brain. It Needs a Body.

Here's the thing nobody tells you when you start building with AI: the model is the easy part.

You can hand the smartest model on earth your credit card, your calendar, and a link, and ask it to buy a concert ticket. It'll reason through it perfectly. It'll even say "ordering it now." And then... nothing happens. It's a brain in a jar. Brilliant, and completely stuck.

The piece that's missing finally has a name, and in 2026 it's the thing everyone building serious agents is obsessed with: the harness.

It started with prompts

We got here in three short eras.

01 Prompt Engineering

~4k tokens

Windows were tiny, so you fought for every word. Craft the one perfect instruction.

02 Context Engineering

~128k tokens

Windows got huge. Now you feed the model everything it needs to actually understand the situation.

03 Harness Engineering

context + constraints + feedback

Context isn't enough. You give the model a body: tools, rules, and a loop to act in.

First we obsessed over prompts, because context windows were tiny (around 4k tokens) and every word was a fight. Then the windows blew up to 128k and beyond, and the game became context: feed the model everything it needs to actually understand the situation. And now we've hit the wall where context still isn't enough, which is where harness engineering begins.

The term comes from Mitchell Hashimoto (yeah, the Terraform guy), who named it earlier this year, and the rest of the field landed on the same map: Martin Fowler, Anthropic, all of them. The formula they keep repeating is simple: Agent = Model + Harness. The model reasons. The harness does everything else.

So what's an agent, really?

Strip away the hype and an agent is almost embarrassingly simple. It's a loop wrapped around an LLM. The model makes a decision, something acts on it, the world answers back, and that answer is fed back to the model. Again. And again. Until the goal is done, with barely a human in the middle.

An agent is just this loop, running on its own: decide, act, observe, repeat.
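Here's a minimal sketch of that loop, just to make the shape concrete. Everything in it is an assumption for illustration: `run_agent`, `Decision`, and the `scripted_model` stand-in (which takes the place of a real LLM call) are made-up names, not anyone's actual API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    action: str          # name of the tool to call, or "done"
    argument: str = ""


def run_agent(
    decide: Callable[[list[str]], Decision],
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 10,
) -> list[str]:
    """Decide, act, observe, repeat, until the model says it is done."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):                      # hard ceiling on the loop
        decision = decide(history)                  # decide
        if decision.action == "done":
            break
        result = tools[decision.action](decision.argument)                     # act
        history.append(f"{decision.action}({decision.argument}) -> {result}")  # observe
    return history


# Hypothetical stand-in for the LLM: searches once, then declares the goal done.
def scripted_model(history: list[str]) -> Decision:
    if len(history) == 1:
        return Decision("search", "concert tickets")
    return Decision("done")


tools = {"search": lambda query: f"3 results for {query!r}"}
history = run_agent(scripted_model, tools, "buy a ticket")
print(history)
```

Note that the only "AI" in there is the `decide` callable. The loop, the tool lookup, and the step limit are all ordinary code.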

That loop is what makes it "agentic": it acts on its own, makes its own calls, chases a goal autonomously. Cool. But here's the problem.

Why context isn't enough

An LLM runs on probability. It samples its output, so the same input can give you a slightly different answer on every run. That's wonderful for writing and brainstorming, and absolutely terrifying for a system that's supposed to spend your money or touch production.

Context, all that history and knowledge you stuff into the window, only gives the model understanding. It does not give it a way to act. It can't change anything, can't enforce a rule, can't reach out and touch the world.

The whole idea in one line

The model is the brain: reasoning and judgment. The harness is the body: hands, reflexes, memory, senses, the skeleton that holds it all up. A brain with no body is just a very expensive chatbot.

Let me show you what I mean

Back to that concert ticket. A $30 seat, simple ask. Except the website has a glitch that jumps the price to $300 right at checkout. Watch what happens as we add one layer at a time.

Context only: All talk, no action

Setup: The AI has your card, your calendar, and the ticket link.
Action: It reads everything, checks your schedule, and says: "Found a seat, ordering it now."
Result: Nothing happens. It's a chatbot. It has no hands to actually click checkout.

+ Harness: Account drained

Setup: Now it gets a browser tool. It opens the link, fills the form, and orders.
Action: The session expires mid-loop. It retries, refills, clicks checkout, but the price already glitched to $300.
Result: It buys the $300 ticket and drains your account. Hands, but no judgment.

+ Guardrails: Safe, but stuck

Setup: You hardcode one rule: max spend $50.
Action: Same glitch, same retry, it clicks checkout at $300.
Guardrail: The click is blocked instantly: "price exceeds $50, transaction cancelled."
Result: Your money is safe, but the task is dead.

+ Workflow: Done, and safe

Setup: You wrap the agent in a real plan: verify price against budget; if blocked, loop back, find a cheaper seat, ask a human if needed.
Action: It hits $300 at checkout and doesn't give up. It goes back to seat selection and picks a $30 seat one row over.
Result: Ticket booked. Money 100% safe. The goal, finally reached.

Look at that progression. Context alone is all talk. Add a harness and it finally has hands, but no judgment, so it drains your account. Add guardrails and it's safe, but it just gives up. Add a real workflow and it routes around the failure and actually gets you the ticket.

Four layers, four completely different endings.
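The last two layers are easy to sketch in code. This is a toy version under stated assumptions: `checkout`, `buy_ticket`, and the seat list are hypothetical, and a real guardrail would sit in front of an actual payment call rather than a function that returns a string.

```python
MAX_SPEND = 50.0  # the one hardcoded rule from the walkthrough


class SpendLimitExceeded(Exception):
    """Raised by the guardrail before any money moves."""


def checkout(price: float) -> str:
    # Guardrail: a hard limit enforced in code, not in the prompt.
    if price > MAX_SPEND:
        raise SpendLimitExceeded(
            f"price {price:.0f} exceeds {MAX_SPEND:.0f}, transaction cancelled"
        )
    return f"booked at ${price:.0f}"


def buy_ticket(seats: list[tuple[str, float]]) -> str:
    # Workflow: when the guardrail blocks a checkout, go back to seat
    # selection and try the next candidate instead of giving up.
    for seat, price in seats:
        try:
            return f"{seat}: {checkout(price)}"
        except SpendLimitExceeded:
            continue
    return "no seat within budget, asking a human"


# The glitched $300 seat comes first, the $30 seat one row over second.
seats = [("row 12 seat 4", 300.0), ("row 13 seat 4", 30.0)]
outcome = buy_ticket(seats)
print(outcome)  # row 13 seat 4: booked at $30
```

Delete the loop in `buy_ticket` and you are back in the "safe, but stuck" ending: the guardrail still fires, and nothing recovers from it.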

Four words that keep it straight

Once you've seen that, the vocabulary clicks into place.

Context: the goal

What you're trying to do, and everything the model needs to know about it.

Harness: the power

The hands and senses to actually act in the world.

Guardrails: the safety

The hard limits that stop the agent before it does damage.

Workflow: the strategy

The plan that routes around failure and reaches the goal anyway.

Context is the goal, harness is the power, guardrails are the safety, workflow is the strategy. Mix them up and you build the wrong thing: a chatbot when you needed an agent, or a loose cannon when you needed a seatbelt.

The anatomy of a harness

So if the harness is the body, what are the organs? When I build one, it comes down to six parts.

the hands

Tool Registry

Lets the agent actually do things: open a link, click a button, run a command. Without it, the model is just talking.
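A tool registry can be as small as a dictionary with a decorator on top. The sketch below assumes a made-up `ToolRegistry` class and a fake `open_link` tool. A real one would drive a browser and would probably validate arguments too.

```python
from typing import Callable


class ToolRegistry:
    """Maps the tool names the model may ask for to real functions."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., str]] = {}

    def register(self, name: str):
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, name: str, **kwargs) -> str:
        # The model can only reach what was registered; anything else
        # comes back as an error string it can read and react to.
        if name not in self._tools:
            return f"error: unknown tool {name!r}"
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("open_link")
def open_link(url: str) -> str:
    return f"opened {url}"  # placeholder for real browser automation


ok = registry.call("open_link", url="https://example.com/tickets")
missing = registry.call("delete_everything")
print(ok)
print(missing)
```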

the notebook

State & Memory

External storage that tracks progress, so the agent survives a restart and doesn't forget what it already did.
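As a rough sketch of the notebook, here is a checkpoint that writes finished steps to a JSON file and reads them back after a restart. The `Checkpoint` class, the step names, and the file name are all assumptions for illustration; a production harness would more likely use a database.

```python
import json
import tempfile
from pathlib import Path


class Checkpoint:
    """Remembers completed steps on disk so a restart can resume."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.done: list[str] = json.loads(path.read_text()) if path.exists() else []

    def mark_done(self, step: str) -> None:
        self.done.append(step)
        self.path.write_text(json.dumps(self.done))  # persist after every step

    def pending(self, plan: list[str]) -> list[str]:
        return [step for step in plan if step not in self.done]


plan = ["open_link", "pick_seat", "checkout"]
state_file = Path(tempfile.mkdtemp()) / "agent_state.json"

first_run = Checkpoint(state_file)
first_run.mark_done("open_link")
first_run.mark_done("pick_seat")

# Simulate a crash: throw the object away and load state from disk again.
after_restart = Checkpoint(state_file)
remaining = after_restart.pending(plan)
print(remaining)  # ['checkout']
```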

the sandbox

Execution Runtime

An isolated space to run tools and code, so a bad step can't wreck the real machine or do something destructive.
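To show the shape of it, here is a deliberately weak sketch: run a snippet in a child Python process, inside a scratch directory, with a time limit. `run_isolated` is a hypothetical helper, and a child process is not a real security boundary. Treat it as a stand-in for the container or VM you would actually want.

```python
import subprocess
import sys
import tempfile


def run_isolated(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run a snippet in a child process, in a scratch dir, with a timeout."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=scratch,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return False, lines[-1] if lines else f"exit code {proc.returncode}"
    return True, proc.stdout.strip()


ok, output = run_isolated("print(2 + 2)")
succeeded, error = run_isolated("raise ValueError('bad step')")
print(ok, output)
print(succeeded, error)
```

The bad step fails inside the child process and the harness gets back a plain `(False, message)` pair instead of crashing itself.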

the critic

Verification Engine

Proves the work really happened before moving on. Guardrails prevent; this one reacts and checks for hallucination.
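One way to picture the critic: never take the model's word that something happened, look it up. In this sketch the `Order` record and the list of orders are hypothetical stand-ins for whatever system of record you actually have.

```python
from dataclasses import dataclass


@dataclass
class Order:
    seat: str
    price: float


def verify_purchase(
    claim: str, orders: list[Order], seat: str, budget: float
) -> tuple[bool, str]:
    """Check the model's claim against recorded orders, not its own words."""
    match = next((order for order in orders if order.seat == seat), None)
    if match is None:
        return False, f"model said {claim!r} but no order exists for {seat}"
    if match.price > budget:
        return False, f"order exists but cost {match.price:.0f}, over budget {budget:.0f}"
    return True, "verified"


# The model claims success, but nothing was actually ordered.
hallucinated = verify_purchase("ordering it now", [], "row 13 seat 4", 50.0)
# The order really exists and is within budget.
confirmed = verify_purchase(
    "ordered", [Order("row 13 seat 4", 30.0)], "row 13 seat 4", 50.0
)
print(hallucinated)
print(confirmed)
```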

the panic button

Human-in-the-Loop

An approval gate for the high-impact, irreversible decisions. The safety net when judgment really matters.
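A minimal approval gate might look like this. The threshold, the `execute` function, and the `always_no` approver are assumptions; in practice `approve` would be a message to a person (chat, email, a dashboard button) that blocks until they answer.

```python
from typing import Callable

APPROVAL_THRESHOLD = 50.0  # assumed cutoff for "high-impact"


def execute(action: str, amount: float, approve: Callable[[str, float], bool]) -> str:
    # Small actions go straight through; big ones wait for a human.
    if amount > APPROVAL_THRESHOLD and not approve(action, amount):
        return f"{action}: rejected by human"
    return f"{action}: executed"


# Hypothetical approver that declines everything it is asked about.
def always_no(action: str, amount: float) -> bool:
    return False


small = execute("buy $30 ticket", 30.0, always_no)
large = execute("buy $300 ticket", 300.0, always_no)
print(small)  # buy $30 ticket: executed
print(large)  # buy $300 ticket: rejected by human
```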

the black box

Telemetry & Cost

Logs every prompt, tool call, and dollar spent, so you can debug what happened and kill it if it blows the budget.
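Here is a small sketch of a black box with a kill switch. The `Telemetry` class and the dollar figures are made up for illustration; real costs would come from your model provider's usage numbers.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when the run has spent more than it is allowed to."""


@dataclass
class Telemetry:
    budget_usd: float
    spent_usd: float = 0.0
    events: list[dict] = field(default_factory=list)

    def record(self, kind: str, detail: str, cost_usd: float = 0.0) -> None:
        # Log first, so the event that blew the budget is still on record.
        self.events.append({"kind": kind, "detail": detail, "cost_usd": cost_usd})
        self.spent_usd += cost_usd
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent {self.spent_usd:.2f} of {self.budget_usd:.2f}"
            )


log = Telemetry(budget_usd=0.05)
log.record("prompt", "plan the purchase", cost_usd=0.02)
log.record("tool_call", "open_link")
try:
    log.record("prompt", "retry checkout", cost_usd=0.04)
    killed = False
except BudgetExceeded:
    killed = True  # the harness stops the run here

print(killed, len(log.events))
```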

Notice how few of those are "prompting." The hands, the notebook, the sandbox, the critic, the panic button, the black box: that's plumbing. That's engineering.

The uncomfortable truth

Here's the part that surprises people. A harness is not a long, clever prompt with a pile of rules in it. It's mostly code: real backend logic, schemas, runtimes, validation, wrapping the model in layers. In my own builds it lands around 80% engineering, 20% prompt and config, and that ratio is the whole point.

And this is the part people get backwards. Putting an AI agent in your system does not mean less code. If anything, it means more. The model is one small, fuzzy part sitting in the middle. Everything that keeps it safe, repeatable, and actually useful is still code you have to write. An agent doesn't shrink your codebase; it wraps a probabilistic core in a thick shell of very ordinary engineering. The AI doesn't replace the work. It moves the work into the harness.
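For a taste of what that ordinary engineering looks like, here is a sketch of validating a model's output before anything acts on it. The allowed actions, the JSON shape, and `parse_decision` are assumptions for illustration, not a standard format.

```python
import json

ALLOWED_ACTIONS = {"open_link", "pick_seat", "checkout"}  # hypothetical tool names


def parse_decision(raw: str) -> dict:
    """Turn free-form model output into a validated decision, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    price = data.get("price", 0)
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)) or price < 0:
        raise ValueError("price must be a non-negative number")
    return {"action": action, "price": float(price)}


def rejected(raw: str) -> bool:
    try:
        parse_decision(raw)
    except ValueError:
        return True
    return False


good = parse_decision('{"action": "checkout", "price": 30}')
print(good)                                    # {'action': 'checkout', 'price': 30.0}
print(rejected("ordering it now"))             # True
print(rejected('{"action": "wire_money"}'))    # True
```

Nothing here is clever, and that's the point: the fuzzy output gets squeezed through a narrow, deterministic gate before it can touch anything.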

~80% of a harness is code, not prompt

65% of failures trace to the harness, not the model

Swappable: the model is a part you can replace

There's a line I keep coming back to: the harness, not the model, is the unit of engineering. Models come and go. A better one drops every few months and you swap it in. The harness is the part you actually own, the part that carries your judgment about how the work should be done.

It's also the part that rots if you let it. Every bad run tempts you to bolt on one more flag, one more rule, one more special case, until the whole thing is a haunted house. Good harnesses grow by subtraction: fewer moving parts, hard ceilings, ruthless about what earns its place.

The bridge

So that's harness engineering in one picture. It's the bridge between deterministic code and probabilistic AI. On one side, software that does exactly what you tell it, every time. On the other, a model that's brilliant and a little unpredictable. The harness is what lets you hand the unpredictable thing real work, by keeping the smart-but-fuzzy surface small and the boring-but-reliable surface large.

The model is the spark. The harness is everything that turns that spark into something you'd actually let near your bank account.

And honestly? Building the body is the fun part.

Author: Glenn Pray