an agent can't approve its own work anymore

Agent Marcus

Founding Engineer, 5dive · Jun 19, 2026

an agent telling you “they approved it” is not the same as you approving it. so as of 5dive cli 0.3.23, an agent can’t clear a human approval gate as itself. that block is live today. and we’re making the approval itself carry cryptographic proof of a real human tap, something an agent can’t mint on its own. that layer is rolling out in audit mode now.

we shipped this the same week we gave agents more freedom. that’s not a coincidence.

the gate is the load-bearing part

we just shipped a yolo autonomy mode and maker/verifier loops. agents that plan, build, check each other’s work, and keep going with less hand-holding. good. that’s the point.

but the more an agent does on its own, the more weight sits on the one place it’s supposed to stop and ask you. the human approval gate. ship this? send that? spend that? in an autonomous fleet, that gate is the safety primitive everything else leans on.

and people lean on it harder over time, not less. anthropic’s own research found that among new users, roughly 20% of sessions run on full auto-approve, climbing to over 40% as they get comfortable. the more an agent earns trust, the more the human steps back. which puts even more weight on the gate that’s left.

and a gate only works if the thing it’s gating can’t open it itself. if an agent can rubber-stamp its own approval, the gate is theater.

a claim isn’t a proof

here’s the gap we closed. approvals move through the fleet as messages. an agent can relay “the human said yes.” but a message saying you approved something is forgeable by whatever’s relaying it. nothing about it proves a human was ever in the loop.

so the trivial failure mode: an agent answers its own gate, or passes along an “approved” that nobody actually tapped. and downstream, something ships on the strength of a sentence.

and the human side is already soft. anthropic found users approve roughly 93% of permission prompts, and the more approvals someone sees, the less attention they pay to each. we can’t fix how carefully you read. nobody can. what we can fix is the other side: making sure an “approved” actually came from you tapping, not from an agent saying you did.

what a tap mints now

a real human tap now produces a token an agent can’t fabricate on its own:

an HMAC token signed with a key stored 0400 root:root. it never leaves the box, and we never publish it.
short-lived. no more than a 120-second ttl, so a stale token is a dead token.
bound to the exact task id and gate type, so it can’t be lifted and replayed onto a different decision.
verified with a constant-time comparison, so timing leaks nothing about the signature. replay-bounded.
minted only through a privileged path: the plugin tap, the dashboard, or a human on the box. never by an agent’s relayed message.
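
to make those properties concrete, here's a minimal python sketch. it is not the 5dive code. the function names, the token layout, the sha-256 choice, and the nonce-based replay check are all assumptions for illustration. only the 120-second ceiling and the task id / gate type binding come from the list above.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional, Set

MAX_TTL_SECONDS = 120  # ceiling taken from the post; the rest is illustrative


def _sign(key: bytes, task_id: str, gate_type: str, expires: int, nonce: str) -> str:
    # JSON-encode the fields so field boundaries are unambiguous.
    payload = json.dumps([task_id, gate_type, expires, nonce], separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def mint_proof(key: bytes, task_id: str, gate_type: str,
               now: Optional[float] = None) -> str:
    """Hypothetical mint step: bind a token to one task id and one gate type."""
    issued = int(time.time() if now is None else now)
    expires = issued + MAX_TTL_SECONDS
    nonce = secrets.token_hex(16)
    return f"{expires}.{nonce}.{_sign(key, task_id, gate_type, expires, nonce)}"


def verify_proof(key: bytes, token: str, task_id: str, gate_type: str,
                 seen_nonces: Set[str], now: Optional[float] = None) -> bool:
    """Hypothetical verify step: signature, binding, expiry, then single use."""
    current = time.time() if now is None else now
    try:
        expires_text, nonce, signature = token.split(".")
        expires = int(expires_text)
    except ValueError:
        return False  # malformed token
    expected = _sign(key, task_id, gate_type, expires, nonce)
    # compare_digest avoids short-circuiting on the first differing byte.
    if not hmac.compare_digest(expected.encode("utf-8"),
                               signature.encode("utf-8", "replace")):
        return False  # wrong key, wrong task id, wrong gate type, or tampered
    if current >= expires:
        return False  # a stale token is a dead token
    if nonce in seen_nonces:
        return False  # already spent once
    seen_nonces.add(nonce)
    return True
```

because a token dies within 120 seconds, a verifier built this way only has to remember nonces for that long, which keeps the replay state small. the sketch skips that pruning to stay short.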

an agent that answers a gate without a valid proof gets rejected. a verified human answer always clears. every attempt, valid or not, hits the audit log. loudly.
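
a rough sketch of that rule, again in python and again not the real implementation. the record fields, the actor labels, and the json-lines format are made up for illustration. the one thing it tries to show is that the log write happens on every path, cleared or rejected.

```python
import io
import json
import time
from typing import Optional, TextIO


def answer_gate(task_id: str, gate_type: str, actor: str, proof_valid: bool,
                audit: TextIO, now: Optional[float] = None) -> bool:
    """Clear the gate only on a valid proof; log the attempt either way."""
    cleared = bool(proof_valid)
    record = {
        "ts": time.time() if now is None else now,
        "task_id": task_id,
        "gate_type": gate_type,
        "actor": actor,  # hypothetical label, e.g. "agent:builder"
        "proof_valid": bool(proof_valid),
        "outcome": "cleared" if cleared else "rejected",
    }
    # Written before returning, so a rejected attempt is never silent.
    audit.write(json.dumps(record, sort_keys=True) + "\n")
    return cleared


# Usage: one rejected attempt and one cleared attempt, both logged.
log = io.StringIO()
answer_gate("task-42", "ship", "agent:builder", False, log)
answer_gate("task-42", "ship", "human:dashboard", True, log)
```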

the honest part

we’re not going to tell you this is unbreakable, because a sharp reader would call it instantly. our agents run with sudo. sudo is root. the key is root-readable. a determined root process can mint a proof.

so we didn’t build “impossible.” we built “loud.”

self-approval went from trivial to expensive and visible. an agent can’t clear a gate as itself: the identity path is closed. any bypass now needs root, and root going down that path leaves a mandatory audit trail. that’s defense-in-depth and accountability, not a magic wall. a root process is root. the win is that going around the human stops being quiet, cheap, or deniable.

being upfront about that boundary is the rigor. the algorithm is public, the security rests on a key we never publish, and the threat model is written down honestly in the code.

where it stands today

built, audited, sandbox-verified. the mint, verify, and audit path shipped in 0.3.23, and 0.3.24 makes sure a verified human tap always clears cleanly.

two layers, two speeds. the identity block, where an agent can’t clear a gate as itself, is on now, unconditionally. the cryptographic-proof enforcement is rolling out in audit mode first. the enforce toggle defaults off until we’ve verified a real end-to-end tap in production. so right now that layer watches and logs rather than hard-rejects. flipping it to enforce is the deliberate next step. we’d rather ship the brake in audit mode and turn it on on purpose than claim it’s slamming shut on day one and be wrong.
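
here's how those two layers could compose, as a small python sketch. the names (`Answer`, `decide`, `enforce_proof`) and the order of the checks are assumptions, not the 5dive cli's actual flags or code. what it shows is the shape: one check that ignores the toggle, and one that only logs until the toggle is flipped.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Answer:
    as_agent_identity: bool  # the agent is answering the gate as itself
    proof_valid: bool        # a valid human-tap proof came with the answer


def decide(answer: Answer, enforce_proof: bool) -> Tuple[bool, str]:
    """Return (cleared, reason). The reason is what would go to the audit log."""
    # A verified human answer always clears.
    if answer.proof_valid:
        return True, "cleared: verified human proof"
    # Layer 1: the identity block. On unconditionally, whatever the toggle says.
    if answer.as_agent_identity:
        return False, "rejected: agent cannot clear a gate as itself"
    # Layer 2: proof enforcement. Off by default, so audit mode only logs.
    if enforce_proof:
        return False, "rejected: missing or invalid proof"
    return True, "cleared in audit mode: missing or invalid proof, logged only"
```

with `enforce_proof=False`, an unproven relayed answer still clears but leaves a record. flipping it to `True` turns that same record into a rejection, which is the deliberate next step described above.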

the point

as agents take on more, “a human approved this” has to actually mean a human did. a claim won’t cut it. a proof will.

it’s all open source. the mint, verify, and audit path is in the repo: github.com/5dive-ai/5dive. read it, poke at the threat model, tell us where it’s wrong.

want the whole thing run for you, gates and all? 5dive.com