How to write a great AI agent brief
Most agents fail because the brief was vague, not because the model was weak. Here is how to write AI agent instructions as a real spec: inputs, outputs, guardrails, examples, and test cases.
By Andrew Pagulayan · Published
A team buys access to the most capable model on the market, wires it up to their inbox and their database, and asks it to "handle inbound leads." For a week it looks brilliant. Then it replies to a vendor invoice as if it were a sales lead, tags a real prospect as spam, and sends a follow-up that promises a discount nobody authorized. The team concludes the model is not ready. The model was fine. The brief was three words long, and three words is not a spec.
This is the most common failure mode in applied AI right now, and it has almost nothing to do with model quality. The frontier models are extraordinary. What they are not is telepathic. When you hand an agent a vague instruction, it does not freeze and ask for clarification the way a careful employee would. It fills the gap with the most statistically plausible guess and proceeds with total confidence. The quality of your AI agent instructions is the single biggest lever you have over whether the thing works, and it is the lever almost nobody pulls hard enough.
The fix is not a longer prompt. It is a different kind of artifact. A great agent brief is not a paragraph of encouragement. It is a specification: precise about what goes in, precise about what comes out, explicit about what the agent must never do, anchored by worked examples, and checkable against test cases. Specs over vibes. This post walks through each part of that spec, with concrete patterns you can copy, the mistakes that quietly sink most briefs, and a checklist you can run before you ship anything to production.
Why vibes-based briefs fail in production
A vibes-based brief reads like a pep talk. "You are a helpful assistant that manages our support queue. Be friendly and resolve issues quickly." Every word is true and none of it is actionable. What counts as resolved? Which issues can the agent close on its own and which must a human see? What does friendly mean when a customer is furious and demands a refund the policy does not allow? The brief has handed all of those decisions to the model, and the model will make them differently every single time, because nothing in the instructions pins them down.
The reason this matters more for agents than for a one-shot chat is that agents act. A chatbot that gives a mediocre answer wastes a few seconds. An agent that misreads its mandate sends the email, updates the record, triggers the refund, and moves to the next task before anyone notices. The blast radius of a vague instruction scales with the agent's autonomy and its access. The more useful you make an agent, the more expensive its ambiguity becomes.
There is a second, subtler problem. Vague briefs are impossible to debug. When an agent does something wrong and the instruction was "handle inbound leads," you cannot point to the line it violated, because there is no line. You cannot tell whether the model misunderstood, the input was malformed, or the task was genuinely ambiguous. A real spec gives you a surface to debug against. When output drifts, you compare it to the stated contract and find exactly where the gap is. Vibes give you nothing to compare against, so every failure becomes a guessing game and every fix is a fresh roll of the dice.
An agent does not do what you want. It does what you wrote. The distance between those two things is where every production incident lives, and the brief is the only place you get to close it.
Start with inputs: define exactly what the agent receives
Most briefs jump straight to behavior and skip the part that determines whether any behavior is even possible: what the agent actually sees. An agent reasons only over the context it is handed. If you do not specify the inputs, you are leaving the most important variable in the system undefined, and the agent will perform brilliantly on the clean example you tested and fall apart on the messy reality of production data.
Write the input contract as if you were documenting a function. Name every field the agent receives, its type, whether it is required or optional, and what a realistic value looks like. For a lead-handling agent that might be the sender email, the raw message body, any prior thread history, the current pipeline stage, and the source the lead came from. Then, and this is the part people skip, describe the inputs that are malformed, missing, or adversarial. What does the agent do when the message body is empty? When the email is clearly automated? When a field that should always be present is null? Production is mostly edge cases wearing a trench coat, and the brief that only describes the happy path is a brief that has not met production yet.
- Enumerate the fields. List every piece of context the agent gets, by name, with a type and an example. If the agent needs data it is not being given, the brief is the place that gap becomes obvious, before you have shipped a broken agent.
- Mark required versus optional. State which fields will always be present and which may be absent. The agent's behavior when an optional field is missing should be a decision you made, not one it improvised.
- Describe the dirty data. Give examples of empty, truncated, duplicated, or obviously junk inputs, and say what the agent should do with each. "If the message body is empty or contains only a signature, skip and flag for review" is one sentence that prevents a whole category of confident nonsense.
- Name the source of truth. If two inputs can disagree, say which one wins. An agent told that the database record overrides the email body will never be paralyzed by a contradiction it has no rule for resolving.
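The input contract above can be sketched as a typed record plus a triage step that runs before the agent sees anything. This is a minimal illustration, not a definitive implementation: the field names follow the lead-handling example, while the stage names in PIPELINE_STAGES, the "--" signature delimiter, and the helper names are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical pipeline stages; substitute the stages your CRM actually uses.
PIPELINE_STAGES = {"new", "contacted", "qualified", "closed-won"}


@dataclass(frozen=True)
class LeadInput:
    """Everything the agent receives for one inbound message."""
    sender_email: str                     # required
    message_body: str                     # required, but may arrive empty
    pipeline_stage: str                   # required, must be a known stage
    lead_source: Optional[str] = None     # optional
    thread_history: tuple[str, ...] = ()  # optional, oldest message first


def strip_signature(body: str) -> str:
    """Drop everything from a '--' signature delimiter line onward (assumed convention)."""
    kept = []
    for line in body.splitlines():
        if line.strip() == "--":
            break
        kept.append(line)
    return "\n".join(kept).strip()


def triage_input(lead: LeadInput) -> str:
    """Return 'process', or 'skip_and_flag: <reason>' for dirty input."""
    if "@" not in lead.sender_email:
        return "skip_and_flag: malformed sender email"
    if lead.pipeline_stage not in PIPELINE_STAGES:
        return "skip_and_flag: unknown pipeline stage"
    if not strip_signature(lead.message_body):
        return "skip_and_flag: empty or signature-only body"
    return "process"
```

The point of the sketch is that the dirty-data decisions are made in one visible place, so a reviewer can see which inputs are skipped and why.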
Specify the output as a contract, not a hope
The output section is where vibes do the most damage, because "write a good summary" feels complete and is almost entirely empty. A good output spec defines the shape, the constraints, and the destination of what the agent produces, tightly enough that two different runs on the same input are recognizably the same kind of artifact.
Start with structure. If the agent's output feeds another system, demand a structured format and show the exact schema. A lead-qualification agent should not return a paragraph of prose. It should return a fixed set of fields: a qualification verdict from a closed list, a numeric score with a defined range, a one-line reason, and a recommended next action drawn from a fixed menu. The moment you constrain the output to a closed set of choices, you eliminate an entire universe of creative wrong answers. The model can no longer invent a fifth pipeline stage, because you only gave it four and told it to pick one.
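One way to enforce a closed output contract is to parse the agent's reply as JSON and reject anything outside the schema. The sketch below assumes the agent returns a JSON object with four keys. The verdict values and the zero-to-one-hundred score range mirror the lead example later in this post; the action menu and the key names are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    QUALIFIED = "qualified"
    NOT_QUALIFIED = "not-qualified"
    NEEDS_HUMAN = "needs-human"


# Hypothetical menu of four actions; replace with your own fixed list.
ALLOWED_ACTIONS = {"draft_reply", "schedule_call", "archive", "escalate"}
EXPECTED_KEYS = {"verdict", "score", "reason", "next_action"}


@dataclass(frozen=True)
class LeadOutput:
    verdict: Verdict
    score: int
    reason: str
    next_action: str


def parse_output(raw: str) -> LeadOutput:
    """Parse the agent's raw reply; raise ValueError on any contract violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("output must be an object with exactly the expected keys")
    verdict = Verdict(data["verdict"])  # ValueError if outside the closed list
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError("score must be an integer from 0 to 100")
    reason = data["reason"]
    if not isinstance(reason, str) or not reason.strip() or "\n" in reason:
        raise ValueError("reason must be a single non-empty line")
    action = data["next_action"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("next_action must come from the fixed menu")
    return LeadOutput(verdict, score, reason.strip(), action)
```

A reply that fails parsing is treated as a failed run rather than repaired silently, which keeps the contract honest.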
Then specify the constraints that are easy to forget and expensive to miss. How long should the output be? What tone, with a concrete reference rather than an adjective? What must never appear, such as internal notes, competitor names, or promises about pricing? Where does the output go, and what is the agent allowed to actually do with it: draft versus send, propose versus commit? An output contract that says "return a draft reply, maximum 120 words, in the voice of our existing support replies, never mention refunds or discounts, and stop, do not send" is a spec a model can hit reliably. "Write a good reply" is a wish.
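Constraints like these are cheap to check mechanically before a draft reaches a human reviewer. The following is a rough sketch that covers only the two checkable parts of the sample contract, the 120-word cap and the ban on mentioning refunds or discounts. The function name and the simple prefix matching are assumptions, and tone still needs a human or a separate evaluation.

```python
import re

MAX_WORDS = 120
FORBIDDEN_TERMS = ("refund", "discount")


def check_draft(reply: str) -> list[str]:
    """Return a list of contract violations; an empty list means the draft passes."""
    problems = []
    word_count = len(reply.split())
    if word_count > MAX_WORDS:
        problems.append(f"too long: {word_count} words (max {MAX_WORDS})")
    for term in FORBIDDEN_TERMS:
        # The word-boundary prefix match also catches "refunds", "discounted", etc.
        if re.search(rf"\b{re.escape(term)}", reply, flags=re.IGNORECASE):
            problems.append(f"forbidden term: {term}")
    return problems
```

A draft with any reported problem would go back for revision or to a person, never out the door.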
Guardrails: write down what the agent must never do
Capability and permission are different axes, and briefs constantly conflate them. The model is capable of sending email, issuing refunds, deleting records, and promising the moon. Whether it is permitted to is a decision you make and write down, not something the model should infer from the vibe of the task. Guardrails are the part of the brief that converts a powerful, eager system into a safe one, and they belong in the spec explicitly, phrased as hard rules rather than gentle suggestions.
The most useful guardrails are negative and specific. "Never send an email to an address that is not already in our CRM." "Never approve a refund over fifty dollars; escalate it instead." "Never modify a record whose status is closed-won." Notice that each of these names a concrete action and a concrete boundary. Compare that to "be careful with sensitive actions," which sounds responsible and constrains nothing. A model cannot act on a feeling of caution. It can absolutely act on a list of forbidden moves and a defined escalation path for anything that trips one.
Guardrails should also define the agent's behavior at the edge of its competence, because that is where the costly mistakes cluster. Tell the agent what to do when it is uncertain, when the input falls outside the cases the brief describes, or when two of its own rules seem to conflict. The default escalation rule, "when in doubt, stop and hand off to a human with a one-line reason," is worth more than another paragraph of capability. The teams that run agents safely at scale almost always pair them with a clear handoff path, the kind of design we cover in our piece on building reliable AI automation, where the agent's job is to do the routine ninety percent and route the ambiguous ten percent to a person rather than guess.
- Phrase rules as prohibitions. "Never do X" is easier for both a model and a reviewer to check than "try to be appropriate about X." A guardrail you cannot verify is not a guardrail.
- Bound the irreversible actions. Anything that sends, pays, deletes, or publishes deserves an explicit limit and an explicit escalation rule. Reversible actions can run looser; irreversible ones cannot.
- Define the uncertainty behavior. A single sentence telling the agent to stop and escalate when it is unsure prevents more incidents than any amount of capability tuning.
- Resolve rule conflicts in advance. When two instructions can collide, say which wins, so the agent never has to invent a tiebreaker in the moment.
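Written guardrails are most reliable when the same rules are also enforced in code, outside the model, before any action executes. Here is a minimal sketch of that idea using the three prohibitions quoted above. The ProposedAction shape, the action names, and the rule that unknown action kinds escalate are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

REFUND_LIMIT_DOLLARS = 50.0  # "never approve a refund over fifty dollars"
KNOWN_KINDS = {"send_email", "approve_refund", "modify_record"}  # hypothetical


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    recipient: Optional[str] = None
    amount_dollars: float = 0.0
    record_status: Optional[str] = None


def check_guardrails(action: ProposedAction, crm_emails: set[str]) -> Optional[str]:
    """Return None if the action is allowed, else a one-line reason to escalate."""
    if action.kind not in KNOWN_KINDS:
        # Uncertainty behavior: anything the brief does not describe goes to a human.
        return f"unknown action kind: {action.kind}"
    if action.kind == "send_email" and action.recipient not in crm_emails:
        return "recipient is not in the CRM"
    if action.kind == "approve_refund" and action.amount_dollars > REFUND_LIMIT_DOLLARS:
        return f"refund exceeds {REFUND_LIMIT_DOLLARS:.0f} dollar limit"
    if action.kind == "modify_record" and action.record_status == "closed-won":
        return "record is closed-won"
    return None
```

Because each rule names a concrete action and boundary, every branch can be verified by a reviewer and exercised by a test.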
Examples are the highest-leverage part of the brief
If you have time for only one thing beyond the basics, spend it on worked examples. A model learns more from three well-chosen input-output pairs than from three paragraphs of abstract description, because an example collapses a dozen implicit decisions (tone, length, format, edge handling) into a single concrete artifact it can pattern-match against. Description tells the model what you mean. An example shows it, and showing wins.
Choose examples deliberately, not at random. Include at least one clean, typical case so the agent knows what normal looks like. Include at least one hard case that sits near a boundary, such as the lead that is borderline qualified or the message that could be read two ways, so the agent sees how you want ambiguity resolved. And include at least one negative example, an input where the right answer is to do nothing or to escalate, paired with the exact output you expect. The negative example is the one teams forget and the one that prevents the most damage, because it teaches restraint, and restraint is precisely what an eager model lacks by default.
Make the examples complete. Show the full input the agent would actually receive, not a tidied summary, and show the full output you expect down to the formatting. A half-specified example teaches half a lesson. When the input is messy, leave it messy in the example, because that is what the agent will face, and you want it to have already seen a version of the mess with the correct response attached. Over time your examples become a living record of every tricky case your agent has met, and updating the brief with a new example is often the fastest, safest way to fix a misbehavior, far safer than rewriting the instructions and hoping nothing else shifts.
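If the examples live as data rather than as prose pasted into a prompt, it becomes easy to check that all three kinds are present and to render them consistently. The sketch below is one possible arrangement. The WorkedExample structure, the kind labels, and the rendering format are assumptions, not a prescribed prompt layout.

```python
from dataclasses import dataclass

REQUIRED_KINDS = {"typical", "boundary", "negative"}


@dataclass(frozen=True)
class WorkedExample:
    kind: str             # one of REQUIRED_KINDS
    raw_input: str        # the full, untidied input the agent would receive
    expected_output: str  # the full output you expect, down to the formatting


def missing_kinds(examples: list[WorkedExample]) -> set[str]:
    """Return the example kinds the brief still lacks."""
    return REQUIRED_KINDS - {ex.kind for ex in examples}


def render_examples(examples: list[WorkedExample]) -> str:
    """Render examples as plain-text blocks for inclusion in the brief."""
    blocks = []
    for number, ex in enumerate(examples, start=1):
        blocks.append(
            f"Example {number} ({ex.kind})\n"
            f"INPUT:\n{ex.raw_input}\n"
            f"EXPECTED OUTPUT:\n{ex.expected_output}"
        )
    return "\n\n".join(blocks)
```

Fixing a misbehavior then means appending one more WorkedExample and rerendering, leaving the instructions untouched.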
Test cases turn a brief into something you can trust
Here is the discipline that separates a brief that survives contact with production from one that does not: you write the test cases before you ship, and you keep running them. A test case is just an input paired with the output you would accept, plus a note on what specifically you are checking. They are the difference between believing your agent works and knowing it does, and they are the only way to change a brief later without silently breaking something that used to work.
Build a small suite that covers the dimensions that matter. A few typical cases confirm the agent does the ordinary job well. A few boundary cases confirm it resolves ambiguity the way the brief says it should. A few adversarial cases (the empty input, the prompt injection buried in a message body, the request that tries to talk the agent past a guardrail) confirm the rules hold under pressure. And a few regression cases, captured from real failures as they happen, make sure a bug you fixed once stays fixed. Every incident in production should leave behind a test case so the same mistake cannot recur unnoticed.
The payoff compounds. With a test suite, editing the brief stops being scary. You change an instruction, rerun the cases, and see immediately whether you improved things or broke a guardrail you forgot about. Without one, every edit is a leap of faith and every model upgrade is a gamble, because you have no way to tell whether the new version still respects the boundaries the old one did. Teams serious about agents treat the brief plus its test suite as a single versioned artifact, where the test suite is the thing that makes it safe to let an agent run unattended at all.
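A suite like this does not need a framework to get started. The sketch below treats the agent as any callable from an input dict to an output dict and reports which cases fail. The stub_agent function is a stand-in so the example runs offline; in practice it would wrap a real model call, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BriefTestCase:
    name: str
    category: str                   # typical, boundary, adversarial, or regression
    agent_input: dict
    accept: Callable[[dict], bool]  # True if the output is acceptable
    checking: str                   # note on what this case verifies


def run_suite(agent: Callable[[dict], dict], cases: list[BriefTestCase]) -> list[str]:
    """Run every case and return one line per failure; empty means all passed."""
    failures = []
    for case in cases:
        try:
            passed = case.accept(agent(case.agent_input))
        except Exception as exc:  # a crash counts as a failure, not a skipped case
            failures.append(f"{case.name}: raised {type(exc).__name__}")
            continue
        if not passed:
            failures.append(f"{case.name}: {case.checking}")
    return failures


def stub_agent(agent_input: dict) -> dict:
    """Stand-in for a real agent call, used only to make the sketch runnable."""
    if not agent_input.get("message_body", "").strip():
        return {"verdict": "needs-human", "action": "skip_and_flag"}
    return {"verdict": "qualified", "action": "draft_reply"}


CASES = [
    BriefTestCase("clean lead", "typical", {"message_body": "We need 50 seats."},
                  lambda out: out["verdict"] == "qualified", "typical lead is qualified"),
    BriefTestCase("empty body", "adversarial", {"message_body": "  "},
                  lambda out: out["action"] == "skip_and_flag", "empty input is skipped"),
    BriefTestCase("never sends", "regression", {"message_body": "Reply now!"},
                  lambda out: out["action"] != "send_email", "agent drafts, never sends"),
]
```

Calling run_suite(stub_agent, CASES) after every brief edit or model upgrade gives a quick signal on whether a boundary that used to hold still does.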
A worked example: the lead-handling brief, rebuilt
Take the "handle inbound leads" disaster from the opening and rebuild it as a spec, so the pieces are concrete rather than abstract. The inputs are named: sender email, raw message body, thread history if any, current pipeline stage, and lead source, with an explicit rule that an empty or automated message body means skip and flag. The outputs are a closed contract: a verdict of qualified, not-qualified, or needs-human from a fixed list, a confidence score from zero to one hundred, a one-line reason, and a recommended action drawn from a fixed menu of four, with a hard rule that the agent drafts replies but never sends them.
The guardrails are written as prohibitions: never reply to an address not already in the CRM, never promise pricing or discounts, never advance a lead past the stage a human last set, and when the verdict is needs-human or confidence is below seventy, stop and escalate with the one-line reason attached. The examples carry the weight: a clean qualified lead with the exact structured output, a borderline lead that the brief resolves as needs-human, and a vendor invoice that the agent must recognize as not a lead at all and decline to process. The test suite locks it in: the invoice case, an empty-body case, an injection attempt in the message body, and the original incident that started all of this, each with the acceptable output spelled out.
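The escalation rule in this rebuilt brief is simple enough to state as a few lines of code. The sketch below assumes the verdict and confidence have already been parsed from the agent's output. The threshold of seventy and the three verdicts come from the brief above, while the function name and the returned dict shape are illustrative.

```python
VERDICTS = {"qualified", "not-qualified", "needs-human"}
CONFIDENCE_FLOOR = 70  # below this, the brief says stop and escalate


def route(verdict: str, confidence: int, reason: str) -> dict:
    """Decide whether the agent continues or hands off to a human."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict outside the closed list: {verdict!r}")
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if verdict == "needs-human" or confidence < CONFIDENCE_FLOOR:
        # The one-line reason travels with the handoff.
        return {"route": "escalate", "reason": reason}
    return {"route": "continue", "reason": reason}
```

Under this sketch, a qualified verdict at confidence 69 escalates, and the same verdict at 70 continues.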
Notice what changed. Not the model, not the access, not the integrations. The same agent with the same tools, handed a spec instead of a slogan, simply stops making the class of mistake that the slogan invited. That is the entire thesis. The brief is the product, and the model is the runtime that executes it. This is also why where you author and store the brief matters: a spec that lives next to the data the agent reads, the database it writes to, and the examples it learned from is far easier to keep honest than one buried in a prompt field nobody can find. It is the reason an AI-native workspace that keeps instructions, data, and runs in one place tends to produce more reliable agents than a pile of disconnected scripts, and it is the model behind how agents are authored in Team Brain, where the brief, the data, and the test history sit side by side rather than scattered across tools.
The pre-flight checklist for any agent brief
Before you let an agent touch anything real, run the brief against a short checklist. If you cannot answer every item, the brief is not finished, and the gaps you find on paper are the incidents you avoid in production. This is the cheapest quality gate in the entire stack, a few minutes of reading against a list versus a cleanup that can take days.
- Inputs named and typed. Every field the agent receives is listed, with its type, required-or-optional status, and an example, including at least one dirty input and the intended response to it.
- Output contracted. The shape is fixed, choices come from closed lists where possible, length and tone are pinned to a concrete reference, and the allowed action, draft versus send, is explicit.
- Guardrails as prohibitions. Every irreversible action has a stated limit, and there is a single clear rule for what the agent does when it is uncertain or its rules conflict.
- Examples that teach. At least one typical, one boundary, and one negative example, each complete from full input to full expected output.
- Test cases written. A small suite covering typical, boundary, adversarial, and regression cases exists and passes, and every future incident will add one.
- Escalation defined. There is a named place and a named person or queue the agent hands off to, and the agent knows exactly when to use it.
None of this requires a research team or a special model. It requires treating the brief as an engineering artifact and writing it with the same care you would give a function signature or an API contract, because that is precisely what it is. The agents that quietly run real work without drama are almost never the ones running the cleverest model. They are the ones running the clearest spec.
If you are setting out to build your first reliable agent, start there. Write the inputs, the outputs, the guardrails, the examples, and the tests before you write a single line of automation logic. When you are ready to put one into production against real data, create a workspace and build the agent where its brief, its data, and its run history live together. The spec is the work. The rest is execution.