Aman Kumar Singh
Software Engineer Intern

If the QA Gate Works, the Code Is Just an Artifact

TL;DR:

We're building an agent harness that ships software without a human in the loop. Coding agents are already good at writing code; the gap is the job a QA engineer used to do: confirming that the feature matches the spec, satisfies every functional and non-functional requirement, and doesn't break anything else for the user. Without a reliable way to answer that, a human has to step in and check by hand, and the autonomy claim collapses. So the QA gate is the piece the rest of the harness hangs from. This post is about functional-verify, our attempt at building it.

The harness loop: TDD feeds Code Review feeds the Quality Gate. The gate has two stacked layers, the metric layer (typecheck, lint, tests, coverage, scope, ignore-comment audit) and the functional-verify layer (API curl, DB queries, UI via Playwright MCP). PASS proceeds to PR. BLOCKED loops back to TDD.
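The loop in the figure can be sketched as a small piece of control flow. This is a minimal illustration under stated assumptions, not the harness's actual code: `quality_gate`, `harness_loop`, `CheckResult` and the `max_rounds` cap are hypothetical names, and skipping the functional layer when the metric layer already fails is our reading of "stacked", not something the diagram specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASS = "PASS"
    BLOCKED = "BLOCKED"


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""


# Hypothetical shape: a check is any zero-argument callable returning a CheckResult.
Check = Callable[[], CheckResult]


def quality_gate(
    metric_checks: list[Check], functional_checks: list[Check]
) -> tuple[Verdict, list[CheckResult]]:
    """Run the two stacked layers and return a verdict plus every result."""
    # Metric layer: typecheck, lint, tests, coverage, scope, ignore-comment audit.
    results = [check() for check in metric_checks]
    if not all(r.ok for r in results):
        # Assumption: don't spend time on the slower functional layer if this one fails.
        return Verdict.BLOCKED, results
    # Functional-verify layer: API via curl, DB queries, UI via a real browser.
    results += [check() for check in functional_checks]
    verdict = Verdict.PASS if all(r.ok for r in results) else Verdict.BLOCKED
    return verdict, results


def harness_loop(
    tdd: Callable[[list[CheckResult]], None],
    review: Callable[[], None],
    metric_checks: list[Check],
    functional_checks: list[Check],
    max_rounds: int = 3,  # assumption: the post does not mention a round cap
) -> str:
    """TDD -> Code Review -> Quality Gate; BLOCKED loops back to TDD with the failures."""
    failures: list[CheckResult] = []
    for _ in range(max_rounds):
        tdd(failures)  # the dev agent receives the failing checks as feedback
        review()
        verdict, results = quality_gate(metric_checks, functional_checks)
        if verdict is Verdict.PASS:
            return "open PR"
        failures = [r for r in results if not r.ok]
    return "escalate to a human"
```

Passing the failing `CheckResult`s back into the TDD step is what turns BLOCKED from a dead end into a feedback loop: the dev agent sees which check failed and why, rather than just that something did.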

QA is the load-bearing piece

For a while we thought the hard part of building an agent harness would be writing the code. It isn't. Current coding agents write code about as well as a junior engineer with infinite patience and zero ego. They pick reasonable abstractions, they write tests alongside, they pass review more often than not.

The hard part is the job a QA engineer used to do. Confirm the feature matches the spec. Confirm every functional requirement holds. Confirm the non-functional ones — usability, performance, accessibility, security — are met. Confirm nothing else broke for the user. That's the part current agents are bad at, and it's the part that decides whether a harness can ship product without a person standing behind it.

Everything else in the harness — the planning agent, the dev agent, the review pass — produces an intermediate artifact. The QA gate is the only thing that grades the deliverable. Get it right and the code is disposable; no human needs to read it. Get it wrong and the autonomy claim falls over, because the only way left to verify the deliverable is to do it by hand. functional-verify is our attempt at that gate. It runs as a separate sub-agent, boots the application, hits the live API with curl, queries the live database, drives the UI in a real browser, and writes a proof report a reviewer can replay. We've been running it on every PR for about a month.
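One way to picture "a proof report a reviewer can replay" is a list of steps, each recording the exact command run against a live surface, the evidence it was expected to show, and what actually came back. The sketch below assumes that shape; `ProofStep`, `ProofReport`, `verify` and the substring match are hypothetical, and the injected `run` callable stands in for whatever actually shells out to curl or a database client, or drives the browser.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class ProofStep:
    surface: str   # "api", "db", or "ui"
    command: str   # the exact command a reviewer can re-run
    expected: str  # evidence the step must show
    observed: str  # raw output captured from the live system
    passed: bool


@dataclass
class ProofReport:
    requirement: str
    steps: list[ProofStep] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # No steps means no evidence, so an empty report does not pass.
        if self.steps and all(step.passed for step in self.steps):
            return "PASS"
        return "BLOCKED"

    def to_json(self) -> str:
        return json.dumps(
            {
                "requirement": self.requirement,
                "verdict": self.verdict,
                "steps": [asdict(step) for step in self.steps],
            },
            indent=2,
        )


def verify(
    requirement: str,
    plan: list[tuple[str, str, str]],
    run: Callable[[str], str],
) -> ProofReport:
    """Execute each (surface, command, expected) step and record the evidence."""
    report = ProofReport(requirement)
    for surface, command, expected in plan:
        observed = run(command)
        # Simplifying assumption: a step passes if the expected text appears in the output.
        report.steps.append(
            ProofStep(surface, command, expected, observed, expected in observed)
        )
    return report
```

Keeping the literal command in every step is what makes the report replayable: a reviewer can paste it into a terminal against the same build and compare the output. Treating an empty report as BLOCKED reflects the view that missing evidence is not a pass.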