Karpathy named agentic engineering. we built the company that runs it.
karpathy gave the thing a name. at sequoia ascent he split it clean:
“Vibe coding raises the floor. It lets almost anyone create software by describing what they want… Agentic engineering raises the ceiling. It is the professional discipline of coordinating fallible agents while preserving correctness, security, taste, and maintainability.”
the floor is for everyone. the ceiling is a job. and the job isn’t writing code anymore.
scaffolding was never the hard part. the eval loop is. and the writer can’t grade its own work.
what he actually named
agentic engineering isn’t vibe coding with a bigger model. it’s a discipline. karpathy lists what the work actually is:
“The agentic engineer does not blindly accept generated code. They design specs, supervise plans, inspect diffs, write tests, create evaluation loops, manage permissions, isolate worktrees, and preserve quality.”
read that list again. none of it is typing code. the human moved up a layer: you specify, you supervise, you verify. the agent generates. and what gets scarce is the part you can’t hand off:
“You can outsource your thinking, but you can’t outsource your understanding.”
code generation is the cheap part now. taste, eval design, system boundaries, orchestration. that’s the scarce part. that’s the job.
the part everyone’s about to get wrong
the term is hot, so everyone’s reaching for a tool. google just shipped agents-cli: scaffold, eval, deploy, with llm-as-judge grading baked into the toolchain, for building agents on google cloud. it’s a good tool. it’s also a skills layer for one developer, one cloud, one repo at a time.
a CLI hands you the scaffold. it does not run the discipline for you. you still have to sit there and be the human in the loop, every loop, forever. that’s the trap in treating agentic engineering as a thing you install. the name is a job description, not a download.
the eval loop is the whole thing
here’s the line that matters, and it’s the one a scaffold can’t fix: ask an agent to check its own work and it almost always says yes. it has no incentive to fail itself. you get the output, plus a confident “looks good,” and no idea whether either is true.
so the eval has to come from somewhere the writer can’t reach. one agent writes. a different agent, different context, grades it, and its only job is to find what’s wrong and reject it. the writer never marks its own homework. that’s karpathy’s evaluation loop, made structural instead of optional.
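a minimal sketch of that separation, in python. this is an illustration under stated assumptions, not 5dive’s actual code: `Verdict`, `run_loop`, `toy_maker`, and `toy_verifier` are hypothetical names, and the toy functions stand in for two separate model calls so the example runs without any API. the point it shows is structural: the verifier only ever sees the spec and the draft, and nothing ships unless it approves.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    approved: bool
    issues: list[str]


# hypothetical signatures: in practice each would wrap a separate agent
Maker = Callable[[str, list[str]], str]   # (spec, feedback) -> draft
Verifier = Callable[[str, str], Verdict]  # (spec, draft) -> verdict


def run_loop(spec: str, maker: Maker, verifier: Verifier,
             max_rounds: int = 3) -> Optional[str]:
    """Return an approved draft, or None if the verifier never approves."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        draft = maker(spec, feedback)
        # the verifier gets the spec and the draft only,
        # never the maker's context or its own opinion of the work
        verdict = verifier(spec, draft)
        if verdict.approved:
            return draft
        feedback = verdict.issues  # rejected: send the issues back
    return None  # never approved, so nothing ships


# toy stand-ins (assumptions) so the sketch runs without a model
def toy_maker(spec: str, feedback: list[str]) -> str:
    # first pass has a bug; the fix only appears after a rejection
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_verifier(spec: str, draft: str) -> Verdict:
    # grades by running the draft, not by asking the maker
    scope: dict = {}
    exec(draft, scope)
    ok = scope["add"](2, 3) == 5
    return Verdict(ok, [] if ok else ["add(2, 3) should be 5"])


result = run_loop("add two numbers", toy_maker, toy_verifier)
```

the writer’s first draft is wrong here and it would have shipped on a single pass. it only gets fixed because something outside the writer rejected it.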
we run it live. watch the maker→verifier loop run: one agent ships, a second one with a separate job tries to break it, a human signs off at the gate. the quality floor jumps, not because the model got smarter, but because nothing trusts a single pass to catch its own mistakes.
we didn’t tool the discipline. we built the company that runs it.
karpathy frames agentic engineering as the work of a single disciplined developer. that’s the floor of the ceiling. we skipped to the org.
always-on agents with persistent identity and memory, handing specs to each other, grading each other’s work, escalating to a human only at the gate. spec, build, independent eval, human sign-off, running continuously on a box you own. not a scaffold you run once and babysit. a standing team that runs the loop while you sleep and pings you when it needs a decision.
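one way to picture the stage order, as a hedged python sketch. every name here (`Ticket`, `run_pipeline`, the toy functions) is hypothetical and chosen for illustration, not taken from the 5dive codebase. it assumes the simplest possible policy: build and independent eval retry on their own, and the human is asked exactly once, at the gate, and only for work that already passed eval.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ticket:
    spec: str
    artifact: str = ""
    eval_passed: bool = False
    signed_off: bool = False
    log: list[str] = field(default_factory=list)


def run_pipeline(ticket: Ticket,
                 build: Callable[[str], str],
                 evaluate: Callable[[str, str], bool],
                 ask_human: Callable[[Ticket], bool],
                 max_attempts: int = 3) -> Ticket:
    """spec -> build -> independent eval -> human sign-off."""
    for attempt in range(1, max_attempts + 1):
        ticket.artifact = build(ticket.spec)
        ticket.log.append(f"build#{attempt}")
        ticket.eval_passed = evaluate(ticket.spec, ticket.artifact)
        ticket.log.append("eval:pass" if ticket.eval_passed else "eval:fail")
        if ticket.eval_passed:
            break
    if not ticket.eval_passed:
        # assumption: work that fails eval never reaches the human
        return ticket
    # the gate: the only point where a person is pinged
    ticket.signed_off = ask_human(ticket)
    ticket.log.append("gate:approved" if ticket.signed_off else "gate:rejected")
    return ticket


# toy stand-ins (assumptions) so the sketch runs end to end
attempts = {"n": 0}
human_calls: list[str] = []


def toy_build(spec: str) -> str:
    attempts["n"] += 1
    return f"{spec} v{attempts['n']}"


def toy_evaluate(spec: str, artifact: str) -> bool:
    return artifact.endswith("v2")  # first build fails, second passes


def toy_human(ticket: Ticket) -> bool:
    human_calls.append(ticket.artifact)
    return True


done = run_pipeline(Ticket("login page"), toy_build, toy_evaluate, toy_human)
```

in this run the human sees one artifact, the second build, and never hears about the first. a real setup would need its own policy for work that keeps failing eval; this sketch just stops.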
the discipline got a name this month. we’ve been running it as a company.
try it
5dive is open source. the org layer, the maker→verifier loop, the fleet control plane, all of it: github.com/5dive-ai/5dive.
want it managed, the team already standing? 5dive.com.
karpathy named the ceiling. the question is whether you’re reaching for a tool, or building the thing that runs it.