Hacker News | pertymcpert's comments

> Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6

4.6 but close.


Right, but were they using the same methodology and harness? I suspect they're doing something different with the harness, e.g. with Mythos they pass in each file one at a time, whereas with 4.6 they let Claude Code run loose to find bugs. That would have a larger impact than the model itself.

Yes, the harness they used already existed and was in use beforehand; it wasn't developed for testing with Mythos.

From Mozilla post [1]:

"...After fixing the initial set of issues that Anthropic sent to us in February, we built our own harness atop our existing fuzzing infrastructure.

We began with small-scale experiments prompting the harness to look for sandbox escapes with Claude Opus 4.6. Even with this model, we identified an impressive amount of previously-unknown vulnerabilities which required complex reasoning over multiprocess browser engine code..."

So yeah, Anthropic and Mozilla are likely comparing "number of bugs found by Opus 4.6 during early experiments" vs "number of bugs found by Mythos during large-scale codebase scanning".

[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...


Universities...for a data center?

That's already factored into the cost of doing business for them.

These people don’t work for Apple or Anthropic.

Have to disagree as a father. The real benefit is the father and child who are now bonding. That doesn't mean the mother can't also bond, it just means it's not one-sided.


I got to spend a bit more than 2 years doing math homework 1:1 with my youngest. Now, she's moving up to honors & gets 100% without any help. I miss all that time we got to hang out, do homework, watch videos of cats, etc.


The mothers are now working, so they're bonding less. I think that's what he means, not that fathers are taking away mother-child bonding time.


Except if anyone bothered to read the damn article you'd see that the research showed the highly educated were more likely to have involved fathers. Those are not going to be forced as the person seems to imply.


No, that's not why.


I expect crickets to your response.


Skill issue in thinking.


I really don't understand how you don't understand how your site is completely misleading. Everyone here is telling you that including API reliability in with actual model performance is nonsense.


I agree that it's confusing. I have already implemented a reliability score, but it will only apply to new tests from now on.

I have already re-tested DeepSeek v4, so it doesn't have any API error issues.

API errors are quite rare. Most models tested have at most one failure caused by an API error, so fixing them won't change the rankings much: https://aibenchy.com/fail/api-error/

I will try to retest all models with API errors, so that the score is determined only by correct/wrong answers, and the reliability score will be an extra metric that just indicates how the API performs.

That being said, the reliability of the API is still a huge factor for production use-cases.
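To make the split concrete, here is a rough sketch of how the two numbers could be computed separately. The outcome labels ("correct", "wrong", "api_error") and the function name are made up for illustration; this is not the site's actual schema or scoring code.

```python
from collections import Counter


def score_results(outcomes):
    """Split benchmark outcomes into an accuracy score and a reliability score.

    outcomes: list of strings, each "correct", "wrong", or "api_error"
    (hypothetical labels, assumed for this sketch).
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    # Only tests where the API actually returned an answer count toward accuracy.
    answered = counts["correct"] + counts["wrong"]
    accuracy = counts["correct"] / answered if answered else None
    # Reliability is the share of tests that did not fail with an API error.
    reliability = answered / total if total else None
    return accuracy, reliability


outcomes = ["correct", "correct", "wrong", "api_error"]
accuracy, reliability = score_results(outcomes)
print(accuracy, reliability)
```

In this toy run the accuracy is 2/3 and the reliability is 3/4, whereas counting the API error as a wrong answer would give a single blended score of 2/4.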


Some API errors are actually not about reliability; they occur because that specific API doesn't support some common features (e.g. specific structured_output formats).


Carbon offsetting has nothing to do with river pollution.


Carbon offsetting is risky. You plant a tree and you don’t know if it will die. You create a swampy area to absorb CO2 and 10 years later it dries out due to global warming. Offsetting should be used only if there is no other way to reduce emissions in the first place. The same is true for sucking carbon out of the air and storing it somewhere… it’s expensive and it should not be the default. We need offsetting and carbon sequestration for the really unavoidable stuff.


Sucking carbon out of the air using fully renewable energy (solar/wind) is a great thing to do! ... once we've fully decarbonized all other energy use and we have extra, left-over renewable energy.


Cool, but nothing to do with this conversation.

