Where the Human Goes — Sally Kellaway

In March 2026, I stopped asking AI for answers and started running it like a workforce. The shift came from an unlikely seat — a place of curiosity instead of a company-wide mandate. It wasn't a bigger model or a cleverer prompt. It was a flip in posture, from tool to team: I stopped typing into a chatbot and started staffing problems out instead. And I set one rule first — whatever the agents produce, I will never say: "Sorry, this was sent by Claude, I don't know why it said that."

TASK — turn a quarter of raw data into an exec-ready brief

Same task — first as the worker, then as the director. Watch what changes, and what doesn’t.

You’re the human in a chatbot session. Each click takes your next turn — watch how often the work bounces back to you.

Round-trips: 0 Your role: —

Empty so far — take the first turn below.

What you’re producing →

The brief owner: you — every step done by hand

Human — you own the ends Agent — the workforce Artifact

The inflection point

The four triggers

A chatbot stops scaling the moment the grindwork outgrows a single conversation. I actively judge four aspects to know when I've crossed that line — scale, complexity, the gap between the task and my own expertise, and sheer repetitiveness. Any one can tip me toward an agent team; usually it's a combination, the way a kitchen tips from one cook to a brigade when the orders, the courses, and the covers all climb at once. None of that, though, changes who owns the result.

Scale

one conversationa standing operation

Complexity

single stepmany interlocking parts

Expertise gap

squarely my domainwell outside it

Repetitiveness

one-offsame grind, again

Chatbot zone

Still a single-conversation job.

tip point

chatbot zoneagent-team zone

— one conversation, one owner —

You can delegate the work. You never delegate the ownership.

That's the engine under everything that follows. Over-automation is dangerous not because the model is bad, but because it tempts you to disown your own output — to treat the thing with your name on it as something that merely happened near you, rather than something you made. For me, accountability is an important professional value.

Delegating into the blind spot

People tend to delegate hardest where they can least judge the result — precisely where blind delegation does the most damage. The fix isn't to delegate less. It's to decompose: a trusted human at the input, a comprehending human at the output, the agent working only the verified middle.

To use a specific project example: an executive-level report on system reliability. Real data checked by people I trust goes in — not whatever the model can scrape or guess. Nothing comes out until a human who understands the material has reviewed it, in a voice I've coached the system to write in — despite having no data-visualization background of my own. The middle is "verified" only because both ends are.

Human · inputTrusted data in

Agent · middleThe verified middle

Human · outputComprehending review

Trusted data in

Real data, checked by people I trust — not whatever the model can scrape or guess.

Source check

✓ Verified source✓ Owner reviewed

The verified middle

The agent re-presents the trusted data, flags its own discrepancies and questions, and writes in my coached voice.

Agent worklog

⚑ Flagged — needs human⚑ Confirm figure

Comprehending review out

Nothing ships until a human who understands the material has reviewed it.

System Reliability — Executive Summary

Voice-coached✓ Reviewed — human

Underneath that sits a quieter move. Before any of it, I do the legwork on establishing the tone and format and gathering feedback early on myself — not from distrust, but to build my own sense of the terrain, enough to ground the agents and steer them when they drift.

The verified middle gives you the shape of delegation. How hard you watch it is a separate dial — a tension, not a switch.

Oversight & accountability

How hard you watch

How hard you watch the middle

The gate to step further out of the loop isn't the agent getting better. It's the downstream humans understanding the artifact better. That reframes oversight entirely. Oversight is misinterpretation risk weighed against output quality — a communication problem AI amplifies, not a model problem. Humans misread human-made work too; the agent just produces more of it, faster — and with prettier formatting, it looks more believable.

Which creates an honest pain point. When most outputs from a single agentic pipeline look nearly identical, it's hard to find the needles in the haystack. So the first thing I coach the system to do is write in my voice, and flag its own discrepancies and questions — not to step out of the loop, but to make staying in it bearable. That coaching makes review easier. It does not, by itself, earn the agent more rope.

Review queue — find the discrepancy

Five agent outputs, near-identical — one hides a discrepancy. With flagging off, you have to open each to check. Go find it.

It's hard to find the needles in the haystack.

Self-flagging turns a hunt into a glance — it makes staying in the loop bearable. It does not, by itself, let me step out of the loop.

Coach, then trust

The rope comes later, and it's a different stage. Agents are how I beat human-scale limits, yet I hold the output to a human-scale accountability standard — and reconciling those is a deliberate sequence: coach until review is easy, then, only where the stakes allow, extend trust and step back. Coach, then trust. Where the platform caps how much I can customize or coach an agent, I can't trust it as far, so I stay in heavier review — the tooling quietly sets the ceiling.

Let's explore a personal project: the pipeline I run for my own job-search materials is higher stakes than any work report, not lower. It represents me. It bends my career. It helped me understand exactly which work I'll never hand over.

Work report — re-presenting trusted data2 human gates

re-presentflagsource-check exec

Gate 1 — trusted data in (human)

Reliability metrics come from the engineering data owners, human-verified at source — not whatever the model can scrape or guess. The agent only ever re-presents data someone else already owns, which is exactly why this side stays lighter-touch.

Gate 2 — my comprehending review (human)

In the verified middle the agent re-presents the data into an exec-digestible format, flags its own discrepancies (a metric that moved unexpectedly), and source-checks every figure against the telemetry — in a voice I've coached it on. I read, correct, and sign off. Coach-then-trust: as review gets easy and the stakes allow, I step back; the platform ceiling caps how far.

Executive brief · reliability

✓ Source-checkedVoice-coached⚑ 1 flagged

Job-search materials (personal — my real pipeline)4 human gates

research tailor CV draft letter submit

Gate 1 — proceed after research (strategic)

The agent fetches the role + company, structures it into a research profile, and auto-flags things like Portuguese-first requirements or contractor-vs-CLT. Then it stops. I decide whether the role is worth pursuing — only I know my pipeline priorities.artifact · company research profile + summary

Gate 2 — CV review (accuracy)

It reframes my master CV to the role and exports a tailored doc. I verify every metric and ownership claim before it goes further — it auto-flags anything it couldn't source and any EPM-vs-PM distinction risk.artifact · tailored CV (.docx + .pdf)

Gate 3 — cover-letter review (voice)

It drafts from the tailored CV + my tone profile — never inventing experience. I read it for voice and authenticity, because that can't be delegated: it has to sound like a writer who was stoked on representing me.artifact · cover letter (.docx)

Gate 4 — submission (control)

The agent does not submit. The irreversible action — and the final accuracy, strategy, and timing call — is mine alone.artifact · none — I press send

The thing that represents me has more human gates — not fewer.

What I'll never automate

The line I won't cross isn't (AI) competence. It's care, it's representation. Three things stay human.

Stake

Work that has to carry conviction a model can't fake.

I am experimenting with allowing my agents to write my cover letters. The output was accurate but indifferent — factually me, and hollow.

A writer who was stoked on representing me.

Identity

My voice, my photos — the texture of being me.

An agent can imitate them, but it can't be me. (Only genuine photos and voice ship — never an AI likeness.)

Growth

Work where the grind is the point.

My Executive MBA assignments are mine to grind through — I'm there to learn, and automating it eats the growth.

An agent can't care, can't be me, or you, and can't become us through the work. And because the line of what these models can do keeps moving as the technology and the ethics around it move, the rules are going to stay provisional.

The rules stay provisional

A working philosophy that can't change isn't a philosophy — it's a superstition. So this framework adapts as the technology does, and as the realities of ethics, alignment, and data privacy evolve underneath it.

A rule for every case. → Hypotheses, not commandments.

The human moves up

As agents get better at execution, the value of human work doesn't shrink — it expands. We get more space and time to think deeply and design experiments. The human isn't automated away. The human moves up — more capable, more creative, more themselves.

Before

grind

direct

After

grind

think · direct

Time to direct digitally-created projects, as opposed to having to grind through the work itself.

This page was made this way

So I made this one the way I work — grounded by a human interview (with me), engineered by a dedicated agent team, and gated by four required checkpoints where I reviewed the output, fed back, and changed it. Not rubber-stamped — changed. Tap a checkpoint.

The agent team

Orchestrator / EPM

Narrative Lead

Voice & Authenticity Guardian

UX / Interaction Designer

Visual / Brand Designer

Front-End Engineer

Editor / Dual-Audience Critic

Privacy & Fact Reviewer

The work here was delegated. The ownership never was. That's the whole reason I will never say:

"Sorry, this was sent by Claude, I don't know why it said that."