
__0x01 · 2026-05-20T23:33:08 1779319988

From the companion paper:

> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.

Can someone please elaborate on this?

awdfeswavcra · 2026-05-21T01:17:40 1779326260

The last two are straightforward. The proof relies on a result called the Golod-Shafarevich theorem that gives a criterion for a group to be infinite. Golod and Shafarevich proved this a long time ago (1964). Moreover, if you look at how Golod and Shafarevich used this criterion, it's the same way it's used in the proof: They apply it to some Galois groups that appear in number theory, prove these are infinite in certain cases, and deduce that there exists an infinite tower of number fields with some surprising properties.
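For reference, here is the group-theoretic inequality behind the criterion. It is given in the sharpened form usually credited to Gaschütz and Vinberg; Golod and Shafarevich originally had (d - 1)^2/4 on the right. Here d(G) and r(G) are the minimal numbers of generators and relations of a finite p-group G:

```latex
r(G) > \frac{d(G)^2}{4} \qquad \text{for every nontrivial finite } p\text{-group } G
```

Taking the contrapositive, a pro-p group with d >= 1 minimal generators that can be presented with r <= d^2/4 relations cannot be finite. That is the shape of "criterion for a group to be infinite" being applied to the Galois groups above.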

Much more recently (2021), Hajir, Maire, and Ramakrishna figured out how to apply the Golod-Shafarevich theorem to a slightly different Galois group to produce an infinite tower of number fields with some even more surprising properties. This is used in the new proof. It requires very slightly modifying the construction of Hajir, Maire, and Ramakrishna to produce the fields needed in this proof, but the explanation of how to do this takes only a paragraph in the human-written summary. (The explanation is more laborious in the original AI write-up.)

The relation to Ellenberg-Venkatesh is more indirect. This is where "in retrospect" comes in, because this work was not cited in the original AI proof. It has to do with the next step of the proof: after you construct the number field, you need to find many elements of this field with the same norm, in order to produce many vectors of the same length. To do this, the proof uses a pigeonhole argument that takes small split primes of the field (constructed via Hajir, Maire, and Ramakrishna's argument) and uses them to construct many ideals. By the pigeonhole principle, you can guarantee that two of these ideals lie in the same class, and when two ideals lie in the same class, you get an element of the field. You can rig things so these elements all have the same norm. Ellenberg and Venkatesh had an argument which also used the pigeonhole principle to guarantee that two ideals lie in the same class, and so to produce elements of the field. They were working on a different problem, so their argument was slightly different, but similar.
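Here is a toy version of the pigeonhole step, in a setting far simpler than the fields in the proof. It uses the imaginary quadratic field Q(sqrt(-23)), which has class number 3, and represents ideal classes by reduced binary quadratic forms, which is the standard correspondence for imaginary quadratic fields. The field, the primes, and every function name below are illustrative choices of mine, not anything from the paper:

```python
from math import gcd

D = -23  # discriminant of the toy field Q(sqrt(-23)); an illustrative choice only


def reduce_form(a, b, c):
    """Return the unique reduced form equivalent to the positive definite form (a, b, c)."""
    while True:
        # Translate x -> x + k*y so that b lands in (-a, a]; the class is unchanged.
        k = (a - b) // (2 * a)
        b, c = b + 2 * a * k, a * k * k + b * k + c
        if a > c:
            # Swap x and y (with a sign change); again the class is unchanged.
            a, b, c = c, -b, a
            continue
        if a == c and b < 0:
            b = -b
        return (a, b, c)


def prime_form(p, d=D):
    """Form (p, b, c) for one prime ideal above a split prime p, found by brute force on b."""
    for b in range(2 * p):
        if (b * b - d) % (4 * p) == 0:
            return (p, b, (b * b - d) // (4 * p))
    raise ValueError(f"no prime ideal of norm {p} (p is inert)")


def class_number(d=D):
    """Count reduced primitive forms of discriminant d, i.e. the number of ideal classes."""
    h, a = 0, 1
    while 3 * a * a <= -d:
        for b in range(-a + 1, a + 1):
            if (b * b - d) % (4 * a):
                continue
            c = (b * b - d) // (4 * a)
            if c < a or (c == a and b < 0) or gcd(gcd(a, b), c) != 1:
                continue
            h += 1
        a += 1
    return h


def find_same_class(primes, d=D):
    """Pigeonhole: return the first pair of primes whose chosen prime ideals share a class."""
    seen = {}
    for p in primes:
        cls = reduce_form(*prime_form(p, d))
        if cls in seen:
            return seen[cls], p
        seen[cls] = p
    return None  # only possible when len(primes) <= class_number(d)


h = class_number()                # 3 classes for d = -23
split = [2, 3, 13, 29]            # four split primes: more ideals than classes
print(h, find_same_class(split))  # 3 (3, 13)
```

The chosen ideals above 3 and 13 land in the same class. So the product of one with the conjugate of the other is principal, and its generator is the "element of the field" the argument extracts. Indeed 4 + sqrt(-23) has norm 16 + 23 = 39 = 3 * 13.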

__0x01 · 2026-05-20T06:51:42 1779259902

> A problem with this is that in order to confirm the findings, you’ll need an expert human. But generally expert humans are busy doing other things.

The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.

LLM generated code will eventually contain UB.

EDIT: added "eventually"

flohofwoe · 2026-05-20T07:37:16 1779262636

It would already help a lot if the C and C++ standards started to clean up the list of Undefined Behaviour (e.g. there's a lot of nonsense UB currently in the C standard which could easily become Defined Behaviour, like the "file doesn't end in a new-line character" thing):

https://gist.github.com/Earnestly/7c903f481ff9d29a3dd1

jcranmer · 2026-05-20T14:04:38 1779285878

The C committee is cleaning up a lot of UB (check https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_lo... for paper titles like "slaying earthly demons").

But don't misunderstand the goal of that: C and C++ will never get rid of UB. The result of dereferencing an invalid pointer is UB, will always remain UB, and really cannot be anything other than UB.

layer8 · 2026-05-20T07:59:34 1779263974

The easy cases like the one you cite are also those that don't cause problems in practice. I'm not sure that would help all that much, other than to slightly reduce internet criticism.

talkin · 2026-05-20T08:27:38 1779265658

Fixing easy cases makes the list shorter, so enables more focus on harder cases.

And it also signals that you actually do want to improve; just a little bit of the boy scout rule goes a long way.

gpderetta · 2026-05-20T09:31:11 1779269471

The issue is that the list is infinite (anything not specified is UB), so actually removing any finite amount of UB from the list won't make it shorter.

(only slightly tongue-in-cheek, I do believe that removing silly things is worthwhile).

1718627440 · 2026-05-20T10:18:53 1779272333

The list of UB categories and rules is not infinite. The list of UB programs is, as is the list of all non UB programs.

gpderetta · 2026-05-20T11:36:04 1779276964

It is not obvious to me that the list of categories is not infinite (unless the final category is "everything else" of course)

1718627440 · 2026-05-20T12:55:02 1779281702

To be undefined behaviour, it must at least be valid syntax. The syntax is described in a finite document. Also it only gets executed by a finite machine, that has a finite number of finite descriptive documents.

flohofwoe · 2026-05-20T14:17:23 1779286643

The list of unspecified behaviour is infinite, but the list of undefined behaviour is well defined and finite ;)

thomashabets2 · 2026-05-20T09:25:32 1779269132

Author here.

> The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.

Yup. But the point of the article is that even expert humans cannot do this alone. And as I wrote, LLM+junior won't suffice either. We need LLM+senior experts.

And it's a problem that we have way more existing UB than expert capacity.

Now, will LLMs and experts both miss UB in some cases? Of course. There's no 100% solution. But LLMs, I claim, will find orders of magnitude more, with a low false-positive rate, than any expert. Even if these expert humans (like in the OpenBSD case for the two bugs I found, one of which was UB) are given more than three decades to do it.

I didn't even use the best model, a complex code target, or much time. I just wanted to choose a target that very good experts have most likely already audited.

eru · 2026-05-20T07:41:24 1779262884

Our LLM-powered coding assistants are pretty good at doing lots of busywork that doesn't require all that much smarts. So they can supervise running our UB checks, like Valgrind, and making the linters happy.

lelanthran · 2026-05-20T07:24:05 1779261845

> LLM generated code will eventually contain UB.

Yes.

Even in languages other than C (i.e. you will get behaviour that nothing in the input specified).

When LLMs generate code, all languages have UB.

eru · 2026-05-20T07:42:37 1779262957

That's a bit silly.

UB means literally no restrictions. So if your standard says 'you have to crash with an error message' that's already no longer UB.

lelanthran · 2026-05-20T08:06:12 1779264372

> So if you standard says 'you have to crash with an error message' that's already no longer UB.

Sure. For crashes. But when you instruct an LLM to do something, the output is probabilistic, so you may get behaviour that is unexpected and/or unwanted.

Like storing security tokens in code. Or nuking the production database.

eru · 2026-05-21T03:56:34 1779335794

If you fix the random seed you use for sampling, your LLM is perfectly deterministic.

And there's no requirement for C compilers' UB to be deterministic either.
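Here is a minimal sketch of the seed point. A made-up four-token vocabulary with fixed logits stands in for a model. A real model recomputes its logits at every step, and this assumes that computation is itself deterministic. The sampler draws from its own random.Random instance, so the same seed replays the same tokens:

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits, n, seed, temperature=1.0):
    """Draw n token indices from a fixed (hypothetical) next-token distribution."""
    rng = random.Random(seed)  # private generator: same seed -> same stream
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
run_a = sample_tokens(toy_logits, 20, seed=1234)
run_b = sample_tokens(toy_logits, 20, seed=1234)
print(run_a == run_b)  # True
```

With a different seed the draws will generally differ. The randomness comes entirely from the sampler's generator.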

__0x01 · 2026-03-27T04:41:51 1774586511

> Correctness: 1,778 test cases from the official jsonata-js test suite + 2,107 integration tests in the production wrapper.

The AI generated code can still introduce subtle bugs that lead to incorrect behaviour.

One example of this is the introduction of functions into the codebase (by AI) that have bugs but no corresponding tests.
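Here is one crude way to flag functions like that, sketched in Python. The module and test file are hypothetical and held in strings. The script lists every function the module defines and subtracts every name the tests mention. A mention is not the same as a good test, and coverage tools do this properly, so treat it as a first-pass heuristic only:

```python
import ast


def defined_functions(source):
    """Names of all functions defined anywhere in a module's source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def referenced_names(source):
    """Every bare name or attribute name used in the test source (import lines excluded)."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names


def untested_functions(module_source, test_source):
    """Functions the tests never mention: a crude proxy for 'no corresponding test'."""
    return sorted(defined_functions(module_source) - referenced_names(test_source))


# Hypothetical module "shop" and its tests, inlined as strings.
MODULE = '''
def parse_price(s):
    return float(s.strip("$"))

def apply_discount(price, pct):
    return price * (1 - pct / 100)

def round_to_cents(x):
    return int(x * 100) / 100   # truncates instead of rounding: a subtle bug
'''

TESTS = '''
from shop import parse_price, apply_discount

def test_parse_price():
    assert parse_price("$3.50") == 3.5

def test_discount():
    assert apply_discount(100, 10) == 90
'''

print(untested_functions(MODULE, TESTS))  # ['round_to_cents']
```

Here round_to_cents slips through. It truncates rather than rounds: 0.29 * 100 is 28.999999999999996 in floating point, so it returns 0.28. Nothing in the tests ever calls it.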

EDIT: correct quotation characters

sethammons · 2026-03-27T07:41:29 1774597289

AI will happily update tests to be wrong or miss the intention of the code and test the wrong things.

__0x01 · 2026-03-10T07:39:06 1773128346

Can a datacenter practicably upgrade to the next generation of GPU every year?

lflckfnbs · 2026-03-10T09:06:45 1773133605

If it wants to stay competitive, yes.

__0x01 · 2026-03-06T12:13:50 1772799230

What was the interesting problem that Swiss watchmakers could have focused on, instead of pursuing brand?

__0x01 · 2026-01-27T10:22:04 1769509324

Is "AI code review" a correct term?

A code review requires reasoning and understanding, things that to my knowledge a generative model cannot do.

Surely the most an AI code review ever could be is something that looks like a code review.

rat9988 · 2026-01-27T10:28:25 1769509705

Given we are more interested in the end than in the means, it is a good usage.

__0x01 · 2026-01-27T12:38:42 1769517522

Please can you elaborate? We are more interested in "the end" in what sense?

__0x01 · 2026-01-16T11:22:24 1768562544

> This is nothing but speculation

Did you read the paper?

terminalshort · 2026-01-16T11:29:18 1768562958

It's written in the future tense, so I can safely call it speculation. I've read the abstract which is all I need to decide the full text is not worth my time.

DannyBee · 2026-01-16T11:32:59 1768563179

Cool, then we can safely give your comments exactly the same treatment - since they are completely uninformed speculation about a paper you haven't read.

rpdillon · 2026-01-16T12:55:04 1768568104

Is he incorrect that the paper is speculating about future events? I don't think it's completely uninformed either. He said that he's read the abstract, which is supposed to give you an impression of the structure of the argument. Why don't you engage with the criticism?

somebehemoth · 2026-01-16T13:41:26 1768570886

There is no criticism. He did not read the paper.

rpdillon · 2026-01-16T15:48:48 1768578528

I read the entire paper, and his criticism is spot on. I even read through many of the references, which, in my spot checks, don't support the claims in the paper. Very disappointing work, IMHO.

somebehemoth · 2026-01-17T15:09:07 1768662547

Cool. Perhaps you should have criticized the paper and requested feedback instead of defending someone who did not read the paper!

rpdillon · 2026-01-17T15:20:37 1768663237

I did both! I'm not concerned with defending anyone, I'm interested in truth. His criticism was sound, and your comments contribute even less to the discussion than his. Very disappointing.

somebehemoth · 2026-01-17T17:08:36 1768669716

> Is he incorrect that the paper is speculating about future events? I don't think it's completely uninformed either.

Most people would say this is a defense of the person, or at least a defense of the person's choice to not read the full paper. It is no fun to debate with intellectual dishonesty.

rpdillon · 2026-01-18T00:39:15 1768696755

Anyone with experience reading research papers professionally will tell you that one of the responsibilities of a paper's abstract is to meaningfully convey the level of evidence and certainty that the paper is backed by. This paper did very well at that, by having the abstract indicate it's more of an essay/opinion piece than a more scientific piece. This is blindingly obvious, and was a simple observation that everyone for some reason dismissed, not on merit, but because the person who said it hadn't read the whole paper, which for a 40-page document is an incredibly high bar that is likely not met by 90% of the people commenting here.

Anyway, I'm tired of this now.

terminalshort · 2026-01-16T11:42:19 1768563739

And you must have read all 40 pages of it, right? Because if not you are a hypocrite. I claim that the Bible is the literal truth. Oh, you haven't read every word of the Bible? Your arguments against me are worthless!

DannyBee · 2026-01-16T14:55:16 1768575316

I did actually read all 40 pages of it. I frequently read law journal articles, along with lots of other types of journals and papers.

I also used to maintain up to date reading lists of various areas (compiler optimization, for example) because I would read so many of the papers.

Let me give you a piece of advice:

First, gather facts, then respond.

Here you start by sarcastically asserting I wouldn't have read it, but it would generally be better to ask if I read it (fact gathering), and then devise a response based on my answer. Your assertion is simply wrong, which makes the rest of it even sillier.

As for the strawman about the Bible: I'm kinda surprised you are really trying to equate not reading any part of something with not reading every part of something, and really trying to defend what you did here, instead of just owning up to it and moving on.

This speaks a lot more about you than anything else.

That said -

When you claim that everything in a book is the literal truth, you only have to find one part that is not the literal truth to prove the claim wrong. That may or may not require reading the entire thing to start (if it turns out your counter-claim is wrong, you at least have to read on and find another).

In the original comment, you'll note your claim was "This is nothing but speculation", i.e. all of the paper is speculation.

If we are being accurate, this would require you reading the entire thing to be able to say all of it is speculation. How could you know otherwise?

Even if we were being nice, and treat your claim colloquially as meaning "most of it is speculation", this would still require reading some of the paper, which you didn't do either.

Perhaps you should just quit while you are behind, and learn that when you screw up, the correct thing to do is say "yeah, I screwed up, I should have read it before saying that", instead of trying to double down on it.

Doubling down like this just makes you look worse.

As an aside, I was always an avid reader, and very bored in synagogue, so I have read every word of a number of books of the Hebrew Bible because it was more interesting than paying attention to the sermons.

rpdillon · 2026-01-16T15:50:44 1768578644

His criticism that the paper is speculation is spot on. Many of the references don't support the claims they are cited for. It's fascinating to me that you want to argue the poster's standing to make a criticism more than you want to actually discuss the content of the paper.

UncleMeat · 2026-01-16T17:19:40 1768583980

It's a particularly weird criticism given that Danny is a lawyer and has experience in the CS research community. He is especially well suited to address a criticism that the authors are trying to trick people into thinking their work is a scientific paper, which is plainly a ridiculous criticism.

rpdillon · 2026-01-16T18:01:46 1768586506

I'd love some clarity on that.

The linked page says this:

```
How AI Destroys Institutions

77 UC Law Journal (forthcoming 2026)

Boston Univ. School of Law Research Paper No. 5870623

40 Pages
Posted: 8 Dec 2025
Last revised: 13 Jan 2026
```

What exactly is this document? It reads like a heavily cited op-ed, but is coming out of a law school from a professor there and calls itself a "research paper". Very strange.

EDIT: I looked up UC Law Journal, and I think I was misled because I'm not familiar with the domain. They describe themselves as:

> Since 1949, UC Law Journal, formerly known as Hastings Law Journal, has published scholarly articles, essays, and student Notes on a broad range of legal topics. With roughly 100 members, UCLJ publishes six issues each year, reaching a large domestic and international audience. Each year, one issue is dedicated to essays and commentary from our annual symposium, which features speakers and panel discussions on an area of current interest and development in the law.

So this is congruent with the Journal's normal content (it's an essay), but having the document call itself a "research paper" conjured an inflated expectation about the rigor involved in the analysis, at least for me.

UncleMeat · 2026-01-16T23:13:40 1768605220

> So this is congruent with the Journal's normal content (it's an essay), but having the document call itself a "research paper" conjured an inflated expectation about the rigor involved in the analysis, at least for me.

Right. And I think it is weird that people immediately leapt to this being some sort of deception by the authors, and I think it was weird that, when a lawyer who has experience in both domains clarified this, people doubled down.

rpdillon · 2026-01-17T01:12:24 1768612344

Yep, I agree that jumping to the "deception" angle would be pretty far down on my list. I always admired the simplicity of HN's guideline to focus on curiosity, since it has far-reaching effects on the nature of the discourse.

terminalshort · 2026-01-16T18:43:46 1768589026

> Even if we were being nice, and treat your claim colloquially as meaning "most of it is speculation", this would still require reading some of the paper, which you didn't do either.

I did read some of it. The abstract. Which is there for the specific purpose of providing readers a summary to decide whether it is worth their time to read the whole thing.

And, yeah, obviously I didn't mean literally all, because that just isn't how people talk; e.g. the authors' names are not speculation. But the central premise of the paper, "How AI Destroys Institutions", is speculative unless they provide a list of institutions that have been destroyed by AI and prove that they have been. The institutions they list, "the rule of law, universities, and a free press," have not been destroyed by AI, so the central claim of the paper is speculative. And speculation on how new tech breakthroughs will play out is generally useless, the classic example being "I think there is a world market for maybe five computers," by the CEO of IBM.

Furthermore, their claim here:

> The real superpower of institutions is their ability to evolve and adapt within a hierarchy of authority and a framework for roles and rules while maintaining legitimacy in the knowledge produced and the actions taken. Purpose-driven institutions built around transparency, cooperation, and accountability empower individuals to take intellectual risks and challenge the status quo.

This just completely contradicts any experience I have ever had with such institutions. Especially "empower individuals to take intellectual risks and challenge the status quo". Yeah. If you believe that, then I've got a bridge to sell you. These guys are some serious Kool-Aid drinkers. Large institutions are where creativity and risk taking go to die. So yeah, not reading 40 pages by these guys.

You can tell a lot from a summary, and the entire premise that you have to read a huge paper to criticize is just bullshit in general.

__0x01 · 2026-01-15T00:33:03 1768437183

I also worry about a centralised service having access to confidential and private plaintext files of millions of users.

ordersofmag · 2026-01-15T02:11:22 1768443082

Heard of google drive?

__0x01 · 2026-01-12T02:10:06 1768183806

Often when I look closely at the output of LLM generated code, I see repetition, redundant logic and deeply hidden bugs.

Notwithstanding the above, to my understanding LLM services are currently being sold below cost.

If all of the above is true, at some point the degradation of quality in codebases that use these tools will be too expensive to ignore.

__0x01 · 2026-01-09T04:06:39 1767931599

These LLM tools appear to have an unprecedented amount of access to the file systems of their users.

Is this correct, and if so do we need to be concerned about user privacy and security?

fragmede · 2026-01-09T05:27:51 1767936471

We should be absolutely terrified about the amount of access these things have to users' systems. Of course there is advice to use a sandbox, but there are stupid people out there (I'm one of them) who disregard this advice because it's too cumbersome, so Claude is being run in yolo mode on the same machine that has access to bank accounts, insurance, a password manager and crypto private keys.