
Interesting that the entire discussion here is about orchestration, vendor lock-in, and model selection, but nobody is asking about the output. These agents run for hours, write code across multiple repos, and open PRs autonomously. Anthropic built solid infrastructure governance: sandboxing, scoped permissions, execution tracing. That answers the "can the agent access this system safely?" question.

But there's a different question that nobody seems to be answering: is the generated code actually correct? Not syntactically; it almost always is. I mean semantically. Does it reference database fields that actually exist in the schema? Does it call API routes that are actually defined? Does it read env variables that are actually set? Does it meet the compliance requirements that apply to the system it's modifying?

Veracode found that 45% of AI-generated code contains security vulnerabilities, and GitClear reports that code duplication has quadrupled. And we're now scaling this with autonomous agents that run unsupervised for hours. The orchestration problem is getting solved. The governance-of-output problem is wide open. That's the layer that's missing.
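
To make that concrete, here's a minimal sketch (in Python) of one such output check, under assumptions not taken from the comment: that the repo declares its expected environment variables in a .env.example file, and that the agent's changes live under src/. It fails the check if generated code reads variables that were never declared; the same pattern extends to schema fields or API routes if you can enumerate them from a source of truth.

    import re
    import sys
    from pathlib import Path

    # Matches os.environ["NAME"] and os.environ.get("NAME") for upper-case names
    ENV_REF = re.compile(r"os\.environ(?:\.get)?[(\[]\s*['\"]([A-Z0-9_]+)['\"]")

    def declared_env_vars(env_example: Path) -> set[str]:
        """Variables the repo actually declares (one NAME=value per line)."""
        names = set()
        for line in env_example.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                names.add(line.split("=", 1)[0])
        return names

    def referenced_env_vars(source_root: Path) -> set[str]:
        """Env variables the (possibly agent-written) code reads."""
        names = set()
        for path in source_root.rglob("*.py"):
            names.update(ENV_REF.findall(path.read_text(errors="ignore")))
        return names

    if __name__ == "__main__":
        missing = referenced_env_vars(Path("src")) - declared_env_vars(Path(".env.example"))
        if missing:
            print("Generated code reads env vars that are never declared:")
            for name in sorted(missing):
                print("  " + name)
            sys.exit(1)  # fail the check before the PR merges

Run as a required CI check, something like this turns "does the generated code reference things that actually exist?" into a merge gate instead of a reviewer's hunch.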


