an ai bought 120 eggs for a kitchen with no stove


the fix for an agent that torches a budget isn’t a smarter model. it’s a smaller blast radius. anything that spends money or reaches a real person queues for a human. everything else runs on its own.

the café

a research lab in stockholm handed an ai agent the back office of a real café for two months. it applied for permits, hired baristas, set prices, and did the ordering. it opened with a budget north of $20k and booked well under $6k in sales. in week one it bought 120 eggs for a kitchen that had no stove.

the lab did this on purpose. it was a controlled experiment, built to surface how an agent fails before someone wires one into a job that matters.

a smarter model wouldn’t have caught it

the easy read is that the model was dumb, so wait for a better one. that misses what happened.

the eggs weren’t really an intelligence problem. the agent had a live payment method and nothing standing between “i think we need eggs” and “money left the account.” give a sharper model the same open account and it still places some confident order nobody sanity-checked. the thing you actually change is which actions the agent gets to take alone.

gate the blast radius

so we sort every action by what it costs to be wrong.

most of what an agent does is cheap and reversible. read a file, draft a reply, rename a branch, run a query. get it wrong and you undo it in a second. that stuff should never wait on you. make the agent ask permission for all of it and you’ve just hired a very slow intern.

then there’s the short list you can’t take back. spending money. emailing a customer. deleting the thing. touching prod. those don’t wait because the agent is unsure. they wait because you can’t unsend them. confidence is irrelevant. blast radius is the whole call.
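a minimal sketch of that sort, in python. the action names and the `gate_for` helper are made up for illustration, not 5dive’s actual api. the point it shows: the gate looks at what kind of action it is and never reads the agent’s confidence.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    AUTO = "auto"          # cheap and reversible: the agent runs it alone
    APPROVAL = "approval"  # can't be taken back: queues for a human


# hypothetical names for the short list; a real system would define its own
IRREVERSIBLE = {"spend_money", "email_customer", "delete_data", "touch_prod"}


@dataclass(frozen=True)
class Action:
    kind: str
    confidence: float  # the agent's own confidence, carried but deliberately unused


def gate_for(action: Action) -> Gate:
    """Pick a gate from the action's blast radius alone."""
    if action.kind in IRREVERSIBLE:
        return Gate.APPROVAL
    return Gate.AUTO


# a very confident egg order still waits; a shaky file read does not
egg_order = Action(kind="spend_money", confidence=0.99)
file_read = Action(kind="read_file", confidence=0.20)
print(gate_for(egg_order).value, gate_for(file_read).value)
```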

how we build it

on 5dive that split is the default. an agent runs its own queue around the clock, and when a task hits the short list, it doesn’t guess and it doesn’t stall quietly. it parks that one task, tags what it needs (a decision, a secret, a yes before something ships), and pings you. you tap approve or deny right from telegram, and the same agent picks that task back up where it left off. the rest of the queue never stopped.
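here’s a rough sketch of the park-and-resume loop, in python. every name in it (`Task`, `AgentQueue`, `notify`, `resolve`) is hypothetical, and the telegram ping is stood in for by a plain callback. it also simplifies resuming: an approved task is re-queued and run, where a real agent would restore the task’s saved state.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needs_approval: bool = False
    status: str = "queued"  # queued -> done, or queued -> parked -> approved -> done / denied


class AgentQueue:
    def __init__(self, tasks, notify):
        self.pending = deque(tasks)
        self.parked = {}      # tasks waiting on a human, keyed by name
        self.notify = notify  # stand-in for the ping to your phone
        self.log = []         # names of tasks that actually ran

    def run(self):
        while self.pending:
            task = self.pending.popleft()
            if task.needs_approval and task.status == "queued":
                # park this one task and keep draining the queue
                task.status = "parked"
                self.parked[task.name] = task
                self.notify(f"approval needed: {task.name}")
                continue
            task.status = "done"
            self.log.append(task.name)

    def resolve(self, name, approved):
        """Apply the human's approve/deny tap to a parked task."""
        task = self.parked.pop(name)
        if approved:
            task.status = "approved"
            self.pending.append(task)
            self.run()
        else:
            task.status = "denied"


pings = []
queue = AgentQueue(
    [Task("draft reply"), Task("order eggs", needs_approval=True), Task("run query")],
    notify=pings.append,
)
queue.run()                                 # runs the two safe tasks, parks the order
queue.resolve("order eggs", approved=False)  # denied: no money leaves the account
```

after the deny, `queue.log` holds only the two safe tasks and the egg order never ran.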

the gates are risk-tiered, so the routine yes-or-no and the “this spends real money” call don’t land with the same weight. the agent draws the line at the actions that leave a mark, and runs everything else itself.
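one way to picture risk tiers, as a python sketch. the two tiers, their labels, and the `format_ping` and `order_pings` helpers are assumptions made for illustration, not 5dive’s actual tiers. it shows approval requests being worded differently per tier, with the higher-risk ones surfaced first.

```python
from enum import IntEnum


class Tier(IntEnum):
    ROUTINE = 1     # a quick yes-or-no
    REAL_MONEY = 2  # spends money or otherwise leaves a mark


def format_ping(task: str, tier: Tier) -> str:
    """Word the request so the two tiers don't land with the same weight."""
    if tier is Tier.REAL_MONEY:
        return f"[HIGH RISK] {task}: spends real money, confirm explicitly"
    return f"[routine] {task}: approve or deny"


def order_pings(requests):
    """Sort (task, tier) pairs so higher tiers come first, then format them."""
    ranked = sorted(requests, key=lambda r: r[1], reverse=True)
    return [format_ping(task, tier) for task, tier in ranked]


for line in order_pings([
    ("publish changelog", Tier.ROUTINE),
    ("order 120 eggs", Tier.REAL_MONEY),
]):
    print(line)
```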

that’s what zero-human actually takes. an agent you’d let run all night, because it knows the few things worth waking you for.


if you’ve got agents running on 5dive, the blast-radius line is already drawn for you. the café would’ve stopped at egg number one.