Meerkat · No. 02 · a dissection · field guide · vol. 02

00 · cover · anatomy of a working prompt

What a working prompt looks like.

A dissection. Four ingredients in the prompt, six classes of model behind it, and a way to figure out which one fits your job.

01 · the four parts

role · task · context · rules

Every working prompt is four bones in a row.

part 01 · ROLE

Hire the model.

Who is writing.

The model collapses to whoever you say it is. A vague prompt asks 'an AI' to do the work — you get an AI's median answer. Hand it a job title with judgment and it stops auditioning.

before

“you are a helpful AI assistant”

after

You are a careful editor with a bias toward plain language.

why · Specifics inherit specifics. Generic role → generic output.

part 02 · TASK

Name the job.

The verb.

One sentence. One verb. If you cannot say it in plain English, the model cannot do it at all. The task line is where most prompts go to die — buried under hedges and qualifications.

before

“can you maybe help me with my email and like, make it better”

after

Rewrite the email below for clarity and a calmer tone.

why · A real verb (rewrite, summarize, extract, draft) gives the model an action. 'Help' gives it nothing.

part 03 · CONTEXT

Frame the room.

Audience and constraints.

Who is reading. How long. What format. The boring details that decide whether it lands. Without context, the model defaults to the average reader of the average blog — which is rarely your audience.

before

“make it good and professional”

after

Audience: a board chair. Length: 3–4 sentences. Format: plain text, no markdown.

why · 'Professional' is a vibe. 'Board chair, 3 sentences, no markdown' is a target.

part 04 · RULES

Draw the lines.

Guardrails and tells.

What to do, what not to do, what counts as failure. Rules are how you say no without arguing later. Be specific — 'no clichés' is hand-waving; 'no AI clichés (delve, leverage, in today's fast-paced)' is operable.

before

“don't be cringe”

after

Lead with the most important point. No hedges. Banned words: delve, leverage, robust, in today's fast-paced.

why · Models love rules. They especially love rules they can check themselves against.
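A minimal sketch of the four parts assembled into one prompt, assuming a plain-text layout with the input appended last. The names `PromptParts` and `assemble` are hypothetical and the section labels ("Rules:", "Input:") are one possible convention, not a format any model requires.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """The four parts from this section; all names here are illustrative."""
    role: str
    task: str
    context: str
    rules: list[str] = field(default_factory=list)


def assemble(parts: PromptParts, input_text: str) -> str:
    """Join role, task, context, and rules in order, then append the input."""
    rule_lines = "\n".join(f"- {rule}" for rule in parts.rules)
    sections = [
        parts.role,
        parts.task,
        parts.context,
        "Rules:\n" + rule_lines,
        "Input:\n" + input_text,
    ]
    # Blank lines between sections keep each part visually separate.
    return "\n\n".join(sections)


parts = PromptParts(
    role="You are a careful editor with a bias toward plain language.",
    task="Rewrite the email below for clarity and a calmer tone.",
    context="Audience: a board chair. Length: 3–4 sentences. Format: plain text, no markdown.",
    rules=["Lead with the most important point.", "No hedges."],
)
prompt = assemble(parts, "Hi team, so basically...")
print(prompt)
```

Keeping the parts as separate fields means each one can be edited and tested on its own instead of being buried in one long string.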

02 · smell tests

Six tests. No mercy.

Read your prompt out loud. Ask each question below. If any answer feels squishy, you found the bone that's about to break in production.

Most failures are visible before you ever hit run. You just have to look.

test 01→
The role test
Could a junior on the team read the role line and act on it?
FIX Replace 'expert' with a job title plus a stance. 'Editor with a bias toward plain language' beats 'world-class writing expert' every time.
test 02→
The verb test
Is there exactly one action verb in the task line?
FIX If you have two, split the prompt into two prompts. If you have zero, you wrote a wish, not a task.
test 03→
The audience test
Could the model produce a wrong-but-plausible answer for the wrong audience?
FIX Name the reader. 'For a clinician' and 'for a 14-year-old' produce wildly different paragraphs from the same task.
test 04→
The format test
Did you say what shape the output should be?
FIX JSON schema, sentence count, max words, bullet vs prose. The model will guess if you don't tell it, and it will guess wrong on the high-stakes one.
test 05→
The failure test
Did you tell it what 'bad' looks like?
FIX Banned words, banned shapes, banned moves. 'No filler. No clichés. No "as an AI" disclaimers.' Three lines save three rounds.
test 06→
The grandma test
Could you read the prompt to a smart human who's never seen the task and have them do it?
FIX If they'd ask follow-up questions, the model is silently making those answers up.
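A banned-words rule is one of the few a script can check for you. The sketch below is a hypothetical helper (`find_banned` is not from any library) that scans a model's output for the banned list used earlier. It assumes whole-word, case-insensitive matching, so "leveraged" would slip through.

```python
import re

# The banned list from the RULES example above.
BANNED = ["delve", "leverage", "robust", "in today's fast-paced"]


def find_banned(text: str, banned: list[str] = BANNED) -> list[str]:
    """Return the banned words or phrases that appear in text.

    Matching is case-insensitive and anchored on word boundaries.
    """
    hits = []
    for phrase in banned:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


draft = "In today's fast-paced market, we leverage data."
print(find_banned(draft))  # → ['leverage', "in today's fast-paced"]
```

An empty list means the output passed. Anything else is a reason to retry or flag the draft.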

03 · failure modes · the rogues' gallery

you have written all of these

Five ways your prompt is quietly lying to you.

The Pep Talk

symptom

'You are a world-class, award-winning, deeply experienced…'

root cause

Confusing motivation with instruction. The model has no morale.

cure

Cut the adjectives. Keep the role. Add the stance.

The Kitchen Sink

symptom

Twelve rules, six examples, three personas, and a poem.

root cause

Hedging against failure by piling on. The signal drowns.

cure

Pick the four rules that matter. Move the rest to a fallback prompt.

The Vibe Brief

symptom

'Make it good. Professional. But also fun. Not too long.'

root cause

Using adjectives where measurements belong.

cure

Replace each adjective with a number, an example, or a banned alternative.

The Buried Lede

symptom

The actual task appears in paragraph four, after the backstory.

root cause

Writing the prompt like a memo instead of a spec.

cure

Task line first. Context second. Rules third. Input last.

The Open Door

symptom

No format, no length, no failure conditions.

root cause

Trusting the model to read your mind. It can't.

cure

Close the door. State the shape. State what 'wrong' looks like.

04 · model bestiary · pick the right animal

there are six. mostly.

The model is half the prompt. Choose carefully.

A perfect prompt aimed at the wrong model still loses. These are the six classes you actually need to know — what they cost, where they shine, and the moment they quietly start lying to you.

class 01

Fast & cheap

The line cook. Volume work, instant answers.

small · 4–8B class · sub-second

openai/gpt-5-nano
anthropic/claude-haiku-4
google/gemini-3-flash

latency: ≈ 300–800 ms
cost: fractions of a cent
ceiling: shallow reasoning

best for

Classification, routing, tagging
Extraction from clean inputs
Autocomplete, suggestions, rewrites in flight
First-pass filters before a slower model

not for

Anything requiring multi-step reasoning
Long-context analysis (>20k tokens)
Nuanced editorial judgment

If the task fits on a Post-it and the right answer is obvious to a human, this tier is the answer.

class 02

Balanced workhorse

The senior IC. Most production traffic lives here.

mid · general purpose

openai/gpt-5-mini
anthropic/claude-sonnet-4-5
google/gemini-3-pro

latency: ≈ 1–3 s
cost: cents per call
ceiling: broad competence, weakens on novel logic

best for

Drafting, summarizing, rewriting at quality
Multi-turn chat with memory
Most tool-using agents
Code that fits in one file

not for

Hard math, formal proofs, legal edge cases
Tight latency budgets (<500ms)
Bulk classification — overkill

Default here. Step down to fast when latency or cost is the constraint. Step up to heavy only when you can prove the workhorse failed.

class 03

Heavy frontier

The principal. Big context, hard problems, infrequent calls.

flagship · 200k–1M context

openai/gpt-5
anthropic/claude-opus-4.6
google/gemini-3-ultra

latency: ≈ 3–10 s
cost: tens of cents per call
ceiling: highest available quality

best for

Long-context synthesis (full books, codebases, transcripts)
High-stakes drafts where a human won't edit further
Final-pass review behind a workhorse
Complex tool orchestration

not for

Anything you'll run a thousand times a minute
Tasks a smaller model already nails — you're burning money

Use it as the editor, not the writer. Workhorse drafts, frontier polishes. Cost goes down, quality goes up.

class 04

Reasoning specialists

The mathematician. Thinks before it speaks. Literally.

o-series · thinking models

openai/o5
openai/o5-mini
anthropic/claude-opus-4.6 (extended thinking)

latency: ≈ 5–60 s
cost: high — pays for hidden reasoning tokens
ceiling: best-in-class on math, logic, planning

best for

Multi-step math, optimization, constraint solving
Code that requires planning, not pattern matching
Debugging across files
Strategic plans where the steps matter

not for

Conversation, summarization, creative writing
Anything time-sensitive
Cost-sensitive bulk work

Hand it problems where a human would pull out a notepad. Hand it chat and it'll over-engineer a hello.

class 05

Vision & multimodal

The pair of eyes. Reads screenshots, charts, PDFs, the wild.

multimodal · image + video in

openai/gpt-5 (vision)
anthropic/claude-sonnet-4-5
google/gemini-3-pro (vision)
google/gemini-3.1-flash-image-preview

latency: ≈ 2–6 s
cost: image tokens are cheaper than you think
ceiling: still misreads dense tables and tiny text

best for

Pulling structured data out of receipts, invoices, IDs
UI review, screenshot QA, accessibility audits
Chart and diagram interpretation
OCR for messy real-world documents

not for

Pixel-perfect tasks — measuring, color matching
Anything where a human would also need to zoom

If the answer is in the image, use vision. If the image is decoration, you're paying twice for nothing.

class 06

Embeddings

The librarian. Not a chatbot. A coordinate system.

vector · semantic similarity

openai/text-embedding-3-large
google/gemini-embedding-001
voyage/voyage-3

latency: ≈ 50–200 ms per batch
cost: near-zero
ceiling: search quality, not language

best for

Semantic search over your own data
Deduplication, clustering, classification at scale
Retrieval (RAG) before a generative model
Recommendations and 'more like this'

not for

Generating text — that's not what they do
Tasks where keywords matter more than meaning

If you said 'the model should know about my data,' the answer is almost always embeddings + retrieval, not a bigger context window.
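"A coordinate system" can be made concrete with a few lines. This is a sketch with made-up three-dimensional vectors standing in for real embeddings, which have hundreds or thousands of dimensions. `cosine` and `top_k` are illustrative names, and cosine similarity is assumed as the distance measure.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k document texts whose vectors are closest to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vectors, invented for illustration. A real pipeline would get these
# from an embedding model instead of typing them by hand.
docs = [
    ("Refund policy", [0.9, 0.1, 0.0]),
    ("Office hours", [0.0, 0.2, 0.9]),
    ("Return shipping", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]
print(top_k(query, docs))  # → ['Refund policy', 'Return shipping']
```

The two texts that come back are what you would paste into the generative model's prompt. The rest of the corpus stays out of the context window.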

Then layer them. A stack beats a single shot.

most production traffic is two models in a row

You almost never need one big model doing one big call. The teams shipping the best work pipe a cheap model into a smart one — or a smart one into a frontier one. Four patterns cover most jobs.

pattern 01

Router → workhorse

Fast (classify) → Balanced (do the job)

Triage cheap, execute well. Stops the workhorse from wasting cycles on stuff a 4B model could route.

pattern 02

Writer → editor

Balanced (draft) → Heavy (critique & polish)

Most of the words come from the cheaper model. The expensive model only touches the final 10%.

pattern 03

Retrieve → reason

Embeddings (find context) → Balanced or Heavy (answer)

Don't stuff the prompt with everything. Search first, then hand the model the five paragraphs that matter.

pattern 04

Plan → execute

Reasoning (decompose) → Balanced (do each step)

Reasoning models are great at planning, expensive at typing. Let them outline, let the workhorse fill in.
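Pattern 01 can be sketched with plain callables standing in for model calls. Everything here is hypothetical: `route_then_answer`, the labels, and the stub functions. In a real stack the router would be a fast-tier model prompted to return one label, and each handler would be a workhorse call with its own prompt.

```python
from typing import Callable

# A "model" here is any function from input text to output text.
Model = Callable[[str], str]


def route_then_answer(
    message: str,
    router: Model,
    handlers: dict[str, Model],
    default: str,
) -> str:
    """Ask the cheap router for a label, then hand the message to that handler."""
    label = router(message).strip().lower()
    # Unknown labels fall back to the default handler rather than failing.
    handler = handlers.get(label, handlers[default])
    return handler(message)


# Stubs that stand in for real model calls.
def fake_router(message: str) -> str:
    return "billing" if "invoice" in message.lower() else "general"


def fake_billing(message: str) -> str:
    return "[billing workhorse] " + message


def fake_general(message: str) -> str:
    return "[general workhorse] " + message


handlers = {"billing": fake_billing, "general": fake_general}
print(route_then_answer("Where is my invoice?", fake_router, handlers, "general"))
```

The fallback matters: a router that returns a label you did not plan for should degrade to the general path, not crash the pipeline.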

05 · the advisor · ask, get steered

Don't know which? Ask the meerkat.

Describe the job — the volume, the latency, the stakes, what the input looks like — and the advisor recommends a tier (and a stack pattern, if it earns it). Editorial, not enthusiastic. It will push back.

Recommends by tier and pattern, not brand.
Defaults to the workhorse. Has to be talked into the frontier.
Ends with one model to try first, and one alternative.

Not a sales pitch. A second opinion before you wire up the wrong thing.
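The defaults described in this guide can be written down as a few rules. This is a rough sketch of that reading, not the advisor's actual logic: `Job` and `recommend_tier` are invented names, the 500 ms cutoff comes from the workhorse's "not for" list, and the 5 s cutoff from the low end of the reasoning tier's stated latency.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A hypothetical description of the work; fields are illustrative."""
    latency_budget_ms: int
    cost_sensitive: bool = False
    needs_multistep_reasoning: bool = False
    workhorse_failed: bool = False


def recommend_tier(job: Job) -> str:
    """Default to the workhorse; step down or up only for a stated reason."""
    # Step down when latency or cost is the constraint.
    if job.latency_budget_ms < 500 or job.cost_sensitive:
        return "fast"
    # Reasoning models need time, so only suggest them when the budget allows.
    if job.needs_multistep_reasoning and job.latency_budget_ms >= 5000:
        return "reasoning"
    # Step up only when the workhorse has been shown to fail.
    if job.workhorse_failed:
        return "heavy"
    return "balanced"


print(recommend_tier(Job(latency_budget_ms=2000)))  # → balanced
```

Vision and embeddings are left out on purpose: those are chosen by the shape of the input, not by latency or stakes.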

06 · enough theory

Now go write one that ships.

Meerkat asks the four questions for you and assembles the prompt in the format the model actually understands. Sixty seconds, no incantations.
