This is the biggest elephant in the room I have seen in my decade+ career. At the same time, look how bad Apple is in software compared to its hardware... It's not an AI only problem, it's almost like software in general gets a free pass on being very unsafe or low quality because no one wants to face the same "profit reducing red tape" that civil engineers or similar face.
Anthropic were the progenitors of the Model Context Protocol. Claude Code does not fully implement the client end of the protocol. A protocol; a literal pre-defined spec that an agent should be able to one-shot. Neither does Codex. Codex does not implement MCP Prompts.
(I want Codex to implement MCP Prompts because then we have one central way to ship skills from a server).
The fact that neither platform can implement a protocol given what is functionally infinite frontier model tokens really says a lot. I do not care what kind of random project some influencer can ship with a swarm of 1000 agents. If you cannot make the basics work, it is a farce.
It still boggles my mind that Anthropic would invent the MCP protocol but not fully implement it.
Especially when fully implementing it (prompts, resources, tools) is easily done in harnesses that don’t ship with MCP but allow good extension / modification like Pi.
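For anyone wondering what "fully implementing it (prompts, resources, tools)" amounts to on the wire, here is a rough sketch, not any harness's actual code. MCP messages are JSON-RPC 2.0, and each of the three primitives has a list method and a use method. The method names below come from the MCP spec; the helper functions, the `CLIENT_SURFACE` table and the `code_review` prompt name are made up for illustration, and a real client also needs the initialize handshake, capability negotiation and a transport.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict, the envelope MCP messages use."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# (discovery method, use method) for each server primitive.
# Hypothetical table name; the method strings are from the MCP spec.
CLIENT_SURFACE = {
    "prompts": ("prompts/list", "prompts/get"),
    "resources": ("resources/list", "resources/read"),
    "tools": ("tools/list", "tools/call"),
}


def get_prompt(name, arguments=None):
    # Ask the server to render a named prompt template.
    return jsonrpc_request("prompts/get", {"name": name, "arguments": arguments or {}})


def read_resource(uri):
    # Fetch the contents of a resource identified by URI.
    return jsonrpc_request("resources/read", {"uri": uri})


def call_tool(name, arguments=None):
    # Invoke a tool with JSON arguments.
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments or {}})


if __name__ == "__main__":
    # "code_review" is a hypothetical prompt name.
    print(json.dumps(get_prompt("code_review", {"language": "python"}), indent=2))
```

A client that only wires up the `tools` row is the partial implementation being complained about here; shipping skills from a server would go through the `prompts` row.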
Claude not being able to see its own usage or self invoke slash commands is also very frustrating.
Given functionally unlimited access to tokens with frontier models, there is really no "force you to keep busy"; it should just bake overnight. We're talking about a rather simple and well-defined specification; not something novel and complex.
My point is that there is a chance that Anthropic launched MCP:
1. not fully believing in it
2. but knowing the hype around MCP would force other AI labs to implement it (standard enterprise checkbox behavior)
3. thus wasting some of their competitors' development cycles
* * *
And let's be real here, the entire discussion happens in the context of a basic bug in a coding agent, do we really believe that these labs have hit AGI in coding?
Random example:
Go to claude.ai or gemini.google.com (I imagine OpenAI is in a similar situation).
Type a question, press enter. Wait 2 seconds, then turn on airplane mode.
Not only does the connection cut off, but even if you reconnect 20 minutes later, you still won't get the answer.
Their website works in purely sync mode!!!
We knew better than this when HTML was invented, 30 years ago.
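To be concrete about the alternative to "purely sync mode": one common pattern is to decouple generation from the connection, so the server buffers output under a request ID and a reconnecting client resumes from its last offset. The sketch below is an assumption about how that could look, not a description of how claude.ai or gemini.google.com work; `ResponseStore` and its methods are hypothetical names, and a real service would use persistent storage and expiry instead of a dict.

```python
import uuid


class ResponseStore:
    """In-memory buffer of generated chunks, keyed by request ID (hypothetical design)."""

    def __init__(self):
        self._jobs = {}

    def start(self):
        # Called when the question is submitted; the client keeps the returned ID.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"chunks": [], "done": False}
        return job_id

    def append(self, job_id, chunk):
        # Called by the generator, whether or not any client is connected.
        self._jobs[job_id]["chunks"].append(chunk)

    def finish(self, job_id):
        self._jobs[job_id]["done"] = True

    def read_from(self, job_id, offset):
        # Return the chunks the client has not seen yet, the new offset, and the done flag.
        job = self._jobs[job_id]
        new_chunks = job["chunks"][offset:]
        return new_chunks, offset + len(new_chunks), job["done"]


store = ResponseStore()
job = store.start()

store.append(job, "Hello")
seen, cursor, done = store.read_from(job, 0)      # client reads, then goes offline

store.append(job, ", world")                      # generation continues regardless
store.finish(job)

missed, cursor, done = store.read_from(job, cursor)  # client reconnects later
```

After the reconnect, `missed` holds only the chunk produced while the client was offline, so turning on airplane mode costs the user nothing except latency.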
Do their products give you the impression that "baking features with unlimited tokens overnight" leads to decent products?
> same "profit reducing red tape" that civil engineers or similar face.
I don't think we should ever head toward licensing/a credential body for software development, but I do think now is a good time to have discussions around liability for defective products.
A good start would be to stop allowing companies to disclaim all warranties of fitness for a particular purpose in their EULAs. The joke of Microsoft Copilot applies here, where they have a big disclaimer that "Copilot is for entertainment purposes only" while the advertising says otherwise. Not even the Chrome EULA will agree that it's fit for purpose as a web browser. The clause is a get-out-of-jail-free card that shifts all liability and risk to the end user.
> I don't think we should ever head toward licensing/a credential body for software development, but I do think now is a good time to have discussions around liability for defective products.
Liability is how a credential body would organically grow. It already exists in the security, compliance, and enterprise parts of the software world.
That can be okay. The problems we're worried about come when it's government mandated.
The EU Cyber Resilience Act puts heavy liability on vendors for software vulnerabilities that get exploited, including in open-source components they incorporate. OSS devs are shielded - liability is on the companies who incorporate OSS into commercial stuff.
In practice, what’s the difference between a government mandated license and a government that quickly rules in favor of parties who are damaged by companies that don’t use licensed software engineers?
E.g. “Your software caused serious damages to our company / livelihood, and you best hope that it turns up in discovery that you used properly licensed software engineers who were following licensing best practices, otherwise this will be a slam dunk case.”
Genuinely an interesting question to me. Seems like the latter is a better option, generally, but it does lock restorative justice behind a paywall - you have to be able to afford a lawyer.
> shouldn't this "agentic AI revolution" have long solved this already?
Daily reminder that Anthropic took over a year to fix the Claude Code terminal flickering issue despite proclaiming all over the internet that software development is a "solved problem."
Apple forked over $250 Million in a class action over false advertising for Apple Intelligence. When do we start seeing the same for the misleading and outright false claims coming out of the frontier labs about the model capabilities? At this point the marketing is doing more harm than the technology itself because it's warping the perceptions of those at the top who make decisions. The only reason tokenmaxxing was ever a thing was because marketing misled execs and technology decisions were made based on vibes instead of evidence.
As long as a majority of the people of the living class are gullible, naive, and sick, their behavior entrained by the institutions and media they are made to consume, they stop seeing the misleading and false claims. Or at least they see them only long enough to complain in an ineffective way, then go on to consume the next big lie or slop. That lasts until something finally channels the accumulated rage into a cause they feel makes things right (assuming they have not already died and the next generation has been groomed to fall for the rich man's trap), while those whose families and next generation are to continue the extraction and trickery hide behind an anonymous personality or system.
You can use it to accelerate development certainly, but that requires careful change->review cycles. The developer still needs to be in heavy control, versus vibe coding having an agent own the code base.
Like anything, you have to decide between polishing and switching to another task in the queue. If you choose the latter too often, polish suffers, but that's a human thing.
Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.
It's kind of like how HNers would claim to your face that you can't actually build anything with JavaScript and Node.js (JS just sucks too much), then they'd list off a few footguns that were supposed to demonstrate why. In other words, people were champing at the bit to dunk on JS, which led them to catastrophize issues that were pretty mundane.
Here we are talking about trillion-dollar AI companies who claim AI can fix decade-old bugs and create new compilers, OSs and whatnot. Are parallel agents working autonomously to fix issues as well as create new features not allowed at these companies?
> Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?
Because your code is still marching somewhere in tokens per second. You have to decide where they are allocated: polish or the next thing. Humans still are the ones prompting LLMs and deciding what is done.
> isn't that what they claim? Why shouldn't it? They're not the ones making the extraordinary claims.
Even if I grant that someone else makes excessive claims, why would that let you off the hook to stay grounded?
Though I don't grant it. Maybe if Anthropic claimed that Opus makes all decisions at the company and builds all software without humans doing all the prompting, the critics would make more sense.
Until then, it looks more like a double standard: if software built with AI has any issues, then see, AI is shit and the humans who invoked it had no role in it. e.g. it could be the case that Anthropic's Claude Code engineers just aren't doing as much polish as they should.
Better answer: Someone asked why might it be the case that AI-written software has issues, and it has a real answer. Marketing claims are a different conversation.
> Maybe if Anthropic claimed that you could write an unsupervised loop that writes perfect software, the critics would make more sense.
Or, being the upstanding, ethical companies that they are, they could just put a disclaimer after every prompt response and on their website: "AI generated code has absolutely no guarantee of quality or correctness. The human prompter must be held accountable for any mistakes or inaccuracies."
Hope it wouldn't be too much bother to these important companies.
See, but that would counteract all of their marketing and hurt the feelings of all the execs who desperately want to believe that software development is "solved" and that in the near future they won't have to hire those expensive, pesky developers ever again.
I don't see how these things conflict. Nor did I get the point you were making in the sarcastic upstream comment.
It is obviously the case that you can both delegate code implementation to AI and also be responsible for it. You are signing off on the code you submit to a project no matter where you got it from nor how it was generated nor who you delegated the task to ("actually my friend wrote it so if it sucks don't look at me").
AI didn't change this, nor will it until there are no more humans in the loop.
They don't conflict, if the generated code is acceptable. Maybe I'm holding it wrong, or I'm not using the right combination of plugins and MCPs. But if I'm not allowed to manually correct the generated output, then I am forced into a loop of generating corrections until it's good enough to stake my job on. I hope you can see that such a policy would be ridiculous.
> Because your code is still marching somewhere in tokens per second. You have to decide where they are allocated: polish or the next thing. Humans still are the ones prompting LLMs and deciding what is done.
It sounds like you're saying that, even with the most tokens of anyone in the world at your disposal, you can't really finish what is effectively a glue layer between a server, a set of local files and a user?
Doesn't sound to me that the agents are all that effective TBH.
I sometimes use LLMs as search engine replacements for finding libraries for specific niches. Gemini is the only one that seems to routinely hallucinate Rust crates from thin air, where otherwise Google is the one most willing to let their LLM peek at a search index to have up-to-date ground facts. It's puzzling. Definitely feels like it's not "organizing the world's information" there.
If the code churn is high, the investment in refactoring etc. is less beneficial than may be obvious. I don't remember the details but I heard in some podcast that the code base of Claude Code changes so fast that any piece of code won't be there for long.
The "AI revolution" feels like it's creating a bunch of ultra-smart AI models that are scarily good at cracking most of human-created security (Mythos), but also happen to be careless snobs that just leave litter and mess in their wake.
We don't really know how much human intervention there is in Mythos… maybe it has a very high rate of false positives that get checked by hand before publishing them.
Claude Code has been out for just 1 year and has millions of users already, being a major contributor to roughly $40 billion in revenue. By any measure it is one of the fastest-developed products around, and it already drives the most important workflow for millions of people.
"Why isn't literally everything about a product that came out a year ago with an extremely fast scaling userbase solved?" is what I hear.
The goalposts will keep moving until AGI is undeniable.
Yes, all the F500 companies have been paying eye-watering $$$ for Claude Code for half the year because “the king was naked.” Those ruthless cost-cutting corporations surely never care about trimming extra spend.
A simple explanation is that they are "good enough" for most people and they have better things to do. Even if tomorrow I was 100 times as productive, I still wouldn't have time to do literally everything and I would have to prioritize.
I think they have more than one job, they have to balance new features with improving the software itself. And Anthropic has to balance investing resources into Claude Code vs on infra or other things.
Not that I'm happy with the current state of things; in fact I'm quite sad that improvements in capacity to do things don't translate into better quality.
> they have to balance new features with improving the software itself.
What new features?
> And Anthropic has to balance investing resources into Claude Code vs on infra or other things.
It seems they are doing neither? Their vibe-coders boast everywhere that they no longer even work, but just endlessly prompt Claude Code in a loop. Perhaps that's why there's no polish? Perhaps that's why their spring post about Claude Code issues reads like "these are all issues that would take a junior programmer a day to test and fix before they ever reached production"? https://www.anthropic.com/engineering/april-23-postmortem
e.g. Irish Ltd that is a resident in Germany
you won't have to bother with the naming problems etc. either