
To be clear, we don’t know that this tool is better at finding bugs than fuzzing. We just know that it’s finding bugs that fuzzing missed. It’s possible fuzzing also finds bugs that this AI would miss.



I would suggest watching Nicholas Carlini's talk, and Heather Adkins and Four Flynn's talk, from Unprompted:

https://youtu.be/1sd26pWhfmg?si=onOai_ocxkZeNWP0

https://youtu.be/B_7RpP90rUk?si=HkRBhw95DbbKX9lL

My takeaway is that fuzzing is not just complementary, it also gives a stronger AI a starting point. But AI is generally faster and better.


Thanks - these talks are mindblowing. Highly recommended.

Different methods find different things. Personally, I'd rather use a language that is memory safe plus a great static analyzer with abstract interpretation that can guarantee the absence of certain classes of bugs, at the expense of some false positives.

The problem is that these tools, such as Astrée, are incredibly expensive and therefore their market share is limited to some niches. Perhaps, with the advent of LLM-guided synthesis, a simple form of deductive proving, such as Hoare logic, may become mainstream in systems software.
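To make the deductive-proving idea concrete: a Hoare triple {P} C {Q} states that if precondition P holds before command C runs, postcondition Q holds afterward. Real tools discharge these obligations statically; the sketch below only approximates the idea with runtime assertions (a toy illustration, with names of my choosing, not any particular verifier's API):

```python
def divmod_checked(a: int, b: int) -> tuple[int, int]:
    # Precondition P: b > 0 and a >= 0
    assert b > 0 and a >= 0, "precondition violated"
    q, r = a // b, a % b
    # Postcondition Q: a == q*b + r and 0 <= r < b
    assert a == q * b + r and 0 <= r < b, "postcondition violated"
    return q, r
```

A static prover would verify that Q follows from P for every input, rather than checking it per call; an LLM's role in the mainstream scenario would be proposing the annotations, not replacing the prover.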


This line of reasoning makes no sense when the AI can just be given access to a fuzzer. I would guess that it probably did have access to a fuzzer to put together some of these vulnerabilities.

Carlini talked about that a fair amount in the context of pairing the two: e.g. many protocols are challenging for fuzzers because they have something like a checksum or signature but LLMs are good at coming up with harnesses for things like that. I’m sure that we’re going to see someone building an integrated fuzzer soon which tries to do things like figure out how to get a particular branch to follow an unexercised path.
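The checksum problem Carlini describes is easy to show in miniature: a blind fuzzer almost never produces a valid checksum by mutation, so it dies at the integrity check and never reaches the parser. A harness that fixes up the checksum before calling the target lets mutations through (a toy sketch with a made-up packet format, not any real target):

```python
import zlib

def parse_packet(data: bytes) -> bytes:
    # Toy target: last 4 bytes must be the big-endian CRC32 of the payload.
    payload, crc = data[:-4], data[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise ValueError("bad checksum")  # blind fuzzers rarely get past this
    return payload

def harness(fuzz_input: bytes) -> bytes:
    # Treat the fuzzer's bytes as the payload and recompute the CRC,
    # so every mutated input exercises the parsing logic behind the check.
    fixed = fuzz_input + zlib.crc32(fuzz_input).to_bytes(4, "big")
    return parse_packet(fixed)
```

Writing that fix-up logic requires understanding the format, which is exactly the step LLMs are good at automating.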

AI can initiate the fuzzing and optimize the fuzzing process.

This is obviously just cope (there's a long, strong-form argument for why LLM-agent vulnerability research is plausibly much more potent than fuzzing, but we don't have to reach it because you can dispose of the whole argument by noting that agents can build and drive fuzzers and triage their outputs), but what I'd really like to understand better is why? What's the impetus to come up with these weird rationalizations for why it's not a big deal that frontier models can identify bugs everyone else missed and then construct exploits for them?

I don't have an anti-AI stance. Maybe I should have spelled that out more clearly in my comment above. I'm as excited and terrified by this technology as everyone else. I think we're all in vicious agreement that we need defense-in-depth - including LLMs and fuzzing (and static analysis and so on).

An LLM can guide all of this work, but current models tend to slowly go off the rails if you don't keep a hand on the wheel. I suspect this new model will be the same. I've had Opus 4.6 write custom fuzzing tools from scratch, and I've gotten good results from that. But you just know people will prompt this new model with "make this software secure", and it'll forget fuzzing exists at all.


Good lord, why such a virulent response to something that seems like we should be considering?

As someone in cybersecurity for 10+ years my immediate assumption is why not both? I don’t think considering that they could both have their uses is “cope”.


Again: LLM agents already are both. But it's also remarkable, and worth digging into, that LLM agents haven't needed fuzzers to produce many (any? in Anthropic Red's case?) of the vulnerabilities they're discussing.

Do we know that? I'd love to see some of the ways security researchers are using LLMs. We have no idea if Claude was using fuzzing here, or just reading the files and spotting bugs directly in the source code.

A few weeks ago someone described their method for finding bugs in Linux. They prompted Claude with "Find the security bug in this program. Hint: it is probably in file X.", and they did that for every file in the repo.


> Since then, this weakness has been missed by every fuzzer and human who has reviewed the code, and points to the qualitative difference that advanced language models provide. [^1]

> At no point in time does the program take some easy-to-identify action that should be prohibited, and so tools like fuzzers can’t easily identify such weaknesses. [^2]

[^1]: https://red.anthropic.com/2026/mythos-preview/#:~:text=Since...

[^2]: https://red.anthropic.com/2026/mythos-preview/#:~:text=At%20...


Are you saying that LLMs can use fuzzers, or are you saying that they work like fuzzers? Because one of those is less… deterministic than the other.

Regardless and in the spirit of my original response my answer would be to give the LLM access to a fuzzer (plus other tools etc) but also have fuzzers in the pipeline. Partially because that increases the determinism in the mix and partially because why not? Layering is almost always better than not.

But again more than anything I’m focusing on the accusations of cope. People SHOULD have measured reactions to claims about any product. People SHOULD be asking questions like this. I know that the LLM debate is often “spicy” but man let’s just try to lower the temperature a bit yeah?


LLMs can use fuzzers and also LLMs can explore the semantic space of a program in ways fuzzers can't.

You said it yourself. It's cope. That's all it is and all it ever was.

https://en.wikipedia.org/wiki/AI_effect

Every time an AI does something new, there's a human saying "it's not really doing that something", "it's doing that something in a fake way" or "that something was never important in the first place".


Alright, except that’s not what I was saying. I was just pointing out that LLMs don’t replace fuzzing or static analysis. They complement those techniques. And yes, LLMs may drive those techniques directly, but they often don’t. At least not yet.



