> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.
The last two are straightforward. The proof relies on a result called the Golod-Shafarevich theorem that gives a criterion for a group to be infinite. Golod and Shafarevich proved this a long time ago (1964). Moreover, if you look at how Golod and Shafarevich used this criterion, it's the same way it's used in the proof: They apply it to some Galois groups that appear in number theory, prove these are infinite in certain cases, and deduce that there exists an infinite tower of number fields with some surprising properties.
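For readers who want the criterion itself, here is the form in which it is usually quoted. This is a sketch from the standard statement (the Gaschütz-Vinberg refinement of the original bound), not from the proof under discussion, which may use a different variant; the notation $d(G)$, $r(G)$ is introduced here for illustration.

```latex
% Commonly quoted form of the Golod-Shafarevich inequality (assumed here;
% the thread does not say which variant the proof uses).
Let $G$ be a nontrivial finite $p$-group and set
\[
  d(G) = \dim_{\mathbb{F}_p} H^1(G, \mathbb{F}_p), \qquad
  r(G) = \dim_{\mathbb{F}_p} H^2(G, \mathbb{F}_p),
\]
the minimal numbers of generators and relations of $G$ as a pro-$p$ group. Then
\[
  r(G) > \frac{d(G)^2}{4}.
\]
Contrapositive: a finitely generated pro-$p$ group $G$ with $d(G) \ge 1$ and
$r(G) \le d(G)^2/4$ is infinite.
```

So to show a Galois group is infinite, it is enough to show it needs many generators but comparatively few relations.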
Much more recently (2021), Hajir, Maire, and Ramakrishna figured out how to apply the Golod-Shafarevich theorem to a slightly different Galois group to produce an infinite tower of number fields with some even more surprising properties. This is used in the new proof. It requires very slightly modifying the construction of Hajir, Maire, and Ramakrishna to produce the fields needed in this proof, but the explanation of how to do this takes only a paragraph in the human-written summary. (The explanation is more laborious in the original AI writeup.)
The relation to Ellenberg-Venkatesh is more indirect. This is where "in retrospect" comes in, because this work was not cited in the original AI proof. It has to do with the next step of the proof: after you construct the number field, you need to find many elements of this field with the same norm to produce many vectors of the same length. To do this, the proof uses a pigeonhole argument, which uses small split primes of the field (constructed via Hajir, Maire, and Ramakrishna's argument) to construct many ideals. By the pigeonhole principle, you can guarantee two ideals lie in the same class. When two ideals lie in the same class, you get an element of the field. You can rig things so these elements all have the same norm. Ellenberg and Venkatesh had an argument which also used the pigeonhole principle to guarantee two ideals lie in the same class to produce elements of the field. They were working on a different problem, so their argument was slightly different, but similar.
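The pigeonhole step can be sketched in symbols. This is a simplified illustration under my own assumptions (absolute norms, one prime chosen above each of $k$ completely split rational primes); the actual proof's bookkeeping may differ.

```latex
% Simplified sketch; n, h, k, p_i, I, J, alpha are notation introduced here.
Let $K$ be a number field of degree $n$ with class number $h$, and let
$p_1, \dots, p_k$ be rational primes that split completely in $K$.
Choosing one prime ideal $\mathfrak{p}_i \mid p_i$ for each $i$ gives $n^k$ ideals
\[
  I = \mathfrak{p}_1 \mathfrak{p}_2 \cdots \mathfrak{p}_k,
  \qquad N(I) = p_1 p_2 \cdots p_k .
\]
If $n^k > h$, two distinct choices $I \ne J$ lie in the same ideal class, so
\[
  I = (\alpha)\, J \quad \text{for some } \alpha \in K^{\times},
  \qquad
  |N_{K/\mathbb{Q}}(\alpha)| = \frac{N(I)}{N(J)} = 1 .
\]
```

The point is that small split primes make $n^k$ grow quickly while the norms stay controlled, so many coincidences of ideal classes, and hence many elements $\alpha$, are forced.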
It would already help a lot when the C and C++ standards start to clean up the list of Undefined Behaviour (e.g. there's a lot of nonsense UB currently in the C standard which could easily become Defined Behaviour - like the "file doesn't end in a new-line character" thing):
But don't misunderstand the goal of that: C and C++ will never get rid of UB. The result of dereferencing an invalid pointer is UB, will always remain UB, and really cannot be anything other than UB.
The easy cases like you cite are also those that don’t cause problems in practice. I’m not sure that would help all that much, other than to slightly reduce internet criticism.
The issue is that the list is infinite (anything not specified is UB), so actually removing any finite amount of UB from the list won't make it shorter.
(only slightly tongue-in-cheek, I do believe that removing silly things is worthwhile).
To be undefined behaviour, it must at least be valid syntax. The syntax is described in a finite document. Also it only gets executed by a finite machine, that has a finite number of finite descriptive documents.
> The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.
Yup. But the point of the article is that even expert humans cannot do this alone. And as I wrote, LLM+junior won't suffice either. We need LLM+senior experts.
And it's a problem that we have way more existing UB than expert capacity.
Now, will LLMs and experts both miss UB in some cases? Of course. There's no 100% solution. But LLMs, I claim, will find orders of magnitude more, with a low false-positive rate, than any expert. Even if these expert humans (like in the OpenBSD case for the two bugs I found, one of which was UB) are given more than three decades to do it.
I didn't even use the best model, a complex code target, or much time. I just wanted to choose a target that has a high chance of having very good experts already having audited it.
Our LLM-powered coding assistants are pretty good at doing lots of busywork that doesn't require all that much smarts. So they can supervise running our UB checks, like Valgrind, and make the linters happy.
> So if your standard says 'you have to crash with an error message' that's already no longer UB.
Sure. For crashes. But when you instruct an LLM to do something, the output is probabilistic, so you may get behaviour that is unexpected and/or unwanted.
Like storing security tokens in code. Or nuking the production database.
It's written in the future tense, so I can safely call it speculation. I've read the abstract which is all I need to decide the full text is not worth my time.
Cool, then we can safely give your comments exactly the same treatment - since they are completely uninformed speculation about a paper you haven't read.
Is he incorrect that the paper is speculating about future events? I don't think it's completely uninformed either. He said that he's read the abstract, which is supposed to give you an impression of the structure of the argument. Why don't you engage with the criticism?
I read the entire paper, and his criticism is spot on. I even read through many of the references, which, in my spot checks, don't support the claims in the paper. Very disappointing work, IMHO.
I did both! I'm not concerned with defending anyone, I'm interested in truth. His criticism was sound, and your comments contribute even less to the discussion than his. Very disappointing.
> Is he incorrect that the paper is speculating about future events? I don't think it's completely uninformed either.
Most people would say this is a defense of the person, or at least a defense of the person's choice to not read the full paper. It is no fun to debate with intellectual dishonesty.
Anyone with experience reading research papers professionally will tell you that one of the responsibilities of a paper's abstract is to meaningfully convey the level of evidence and certainty that the paper is backed by. This paper did very well at that, by having the abstract indicate it's more of an essay/opinion piece than a scientific piece. This is blindingly obvious, and was a simple observation that everyone for some reason dismissed not on merit, but because the person who said it hadn't read the whole paper, which for a 40-page document is an incredibly high bar that is likely not met by 90% of the people commenting here.
And you must have read all 40 pages of it, right? Because if not you are a hypocrite. I claim that the Bible is the literal truth. Oh, you haven't read every word of the Bible? Your arguments against me are worthless!
I did actually read all 40 pages of it. I frequently read law journal articles, along with lots of other types of journals and papers.
I also used to maintain up to date reading lists of various areas (compiler optimization, for example) because I would read so many of the papers.
Let me give you a piece of advice:
First, gather facts, then respond.
Here you start by sarcastically asserting i wouldn't have read it, but it would generally be better to ask if i read it (fact gathering), and then devise a response based on my answer. Because your assertion is simply wrong, making the rest of it even sillier.
As for the strawman about the bible - i'm kinda surprised you are really trying to equate not reading any part of something with not reading every part of something, and really trying to defend what you did here, instead of just owning up to it and moving on.
This speaks a lot more about you than anything else.
That said -
When you make a claim that everything in a book is the literal truth, you only have to find one part that is not the literal truth to prove the claim wrong. That may or may not require reading the entire thing (if it turns out your counter-claim is wrong, you at least have to keep reading and find another).
In the original comment, you'll note your claim was "This is nothing but speculation" - IE all of the paper is speculation.
If we are being accurate, this would require you reading the entire thing to be able to say all of it is speculation. How could you know otherwise?
Even if we were being nice, and treat your claim colloquially as meaning "most of it is speculation", this would still require reading some of the paper, which you didn't do either.
Perhaps you should just quit while you are behind, and learn that when you screw up, the correct thing to do is say "yeah, i screwed up, i should have read it before saying that", instead of trying to double down on it.
Doubling down like this just makes you look worse.
As an aside - I was always an avid reader, and very bored in synagogue, so i have read every word of a number of books of the hebrew bible because it was more interesting than paying attention to the sermons.
His criticism that the paper is speculation is spot on. Many of the references don't support the claims they are cited for. It's fascinating to me that you want to argue the poster's standing to make a criticism more than you want to actually discuss the content of the paper.
It's a particularly weird criticism given that Danny is a lawyer and has experience in the CS research community. He is especially well suited to address a criticism that the authors are trying to trick people into thinking their work is a scientific paper, which is plainly a ridiculous criticism.
```
Boston Univ. School of Law Research Paper No. 5870623
40 Pages Posted: 8 Dec 2025 Last revised: 13 Jan 2026
```
What exactly is this document? It reads like a heavily cited op-ed, but is coming out of a law school from a professor there and calls itself a "research paper". Very strange.
EDIT: I looked up UC Journal of Law, and I think I was misled because I'm not familiar with the domain. They describe themselves as:
> Since 1949, UC Law Journal, formerly known as Hastings Law Journal, has published scholarly articles, essays, and student Notes on a broad range of legal topics. With roughly 100 members, UCLJ publishes six issues each year, reaching a large domestic and international audience. Each year, one issue is dedicated to essays and commentary from our annual symposium, which features speakers and panel discussions on an area of current interest and development in the law.
So this is congruent with the Journal's normal content (it's an essay), but having the document call itself a "research paper" conjured an inflated expectation about the rigor involved in the analysis, at least for me.
> So this is congruent with the Journal's normal content (it's an essay), but having the document call itself a "research paper" conjured an inflated expectation about the rigor involved in the analysis, at least for me.
Right. And I think it is weird that people immediately leapt to this being some sort of deception by the authors and I think it was weird that when a lawyer who has experience in both domains clarified this that people doubled down.
Yep, I agree that jumping to the "deception" angle would be pretty far down on my list. I always admired the simplicity of HN's guideline to focus on curiosity, since it has far-reaching effects on the nature of the discourse.
> Even if we were being nice, and treat your claim colloquially as meaning "most of it is speculation", this would still require reading some of the paper, which you didn't do either.
I did read some of it. The abstract. Which is there for the specific purpose of providing readers a summary to decide whether it is worth their time to read the whole thing.
And, yeah, obviously I didn't mean literally all because that just isn't how people talk. E.g. the authors' names are not speculation. But the central premise of the paper, "How AI Destroys Institutions", is speculative unless they provide a list of institutions that have been destroyed by AI and prove that they have. The institutions they list, "the rule of law, universities, and a free press," have not been destroyed by AI, so the central claim of the paper is speculative. And speculation on how new tech breakthroughs will play out is generally useless, the classic example being "I think there is a world market for maybe five computers," by the CEO of IBM.
Furthermore their claim here:
> The real superpower of institutions is their ability to evolve and adapt within a hierarchy of authority and a framework for roles and rules while maintaining legitimacy in the knowledge produced and the actions taken. Purpose-driven institutions built around transparency, cooperation, and accountability empower individuals to take intellectual risks and challenge the status quo.
This just completely contradicts any experience I have ever had with such institutions. Especially "empower individuals to take intellectual risks and challenge the status quo". Yeah. If you believe that, then I've got a bridge to sell you. These guys are some serious koolaid drinkers. Large institutions are where creativity and risk taking go to die. So yeah, not reading 40 pages by these guys.
You can tell a lot from a summary, and the entire premise that you have to read a huge paper to criticize is just bullshit in general.
We should be absolutely terrified about the amount of access these things have to users' systems. Of course there is advice to use a sandbox, but there are stupid people out there (I'm one of them) who disregard this advice because it's too cumbersome, so Claude is being run in yolo mode on the same machine that has access to bank accounts, insurance, password manager and crypto private keys.
> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.
Can someone please elaborate on this?