Hacker News | iLoveOncall's comments

I can't imagine getting such a famous IP as The Lord of the Rings and doing AI slop for it.

Given that many of the images look to be derived from Wizards of the Coast's Magic: The Gathering Universes Beyond Lord of the Rings set without attribution, I'd expect them to be getting a strongly-worded letter from Hasbro's lawyers any moment now (and for once, I'd support that).

Only if you didn't read the article.

They're saying they need to move on from it because the benchmark is flawed (without bringing in proof) and that's why they can't hit 100%.

It's not a "our models are so good that the benchmark is too easy" thing.


I feel like they're quite open about why they think the benchmark doesn't work anymore:

> We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.

> This means that improvements on SWE-bench Verified no longer reflect meaningful improvements in models’ real-world software development abilities. Instead, they increasingly reflect how much the model was exposed to the benchmark at training time.


> without bringing in proof

Did we read the same article?


How can you say “without bringing in proof” when there is literally proof in the article?

Run-rate revenue is not ARR. For all we know they could have a revenue of $100 and claim a run-rate revenue of $30B.

Given the fact that both Altman and Amodei are pathological liars, there's absolutely no reason to believe that Anthropic has $30B ARR.


> For all we know they could have a revenue of $100 and claim a run-rate revenue of $30B.

Can you explain how that’d work? What would the $30B figure be based on if they only have $100 in revenue?


They're pointing out that run-rate revenue is based on essentially sampling revenue over some limited time interval, then extrapolating from there assuming revenue always occurs at the same rate (or greater) over all similar intervals in the future. More specifically, they're pointing out that estimates of ARR derived from this kind of sampling are fundamentally prone to error and can be arbitrarily inflated based on how the time interval is sampled.

Of course, but the fact of the matter is that the same technique was used for the quarter prior to that, and there’s a 3x increase quarter over quarter.

As far as I understand, run-rate revenue is just a fancy way of saying "last month we had these sales, and if that continues for a year we will have an ARR of $30B", meaning it's not $30B yet, but the sales numbers indicate that we get there by continuing to sell at the current pace. But to have revenue of $100 and get $30B in ARR, I guess the period looked at needs to be seconds....

(Run Rate = Revenue in Period / # of Days in Period x 365)
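The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `annualized_run_rate` and the $2.5B-in-30-days input are hypothetical values chosen to land near the $30B figure, not numbers reported by any company.

```python
def annualized_run_rate(revenue_in_period: float, days_in_period: float) -> float:
    """Run Rate = Revenue in Period / # of Days in Period x 365."""
    if days_in_period <= 0:
        raise ValueError("days_in_period must be positive")
    return revenue_in_period / days_in_period * 365


# Hypothetical input: $2.5B of revenue observed over a 30-day period.
run_rate = annualized_run_rate(2.5e9, 30)
print(f"Annualized run rate: ${run_rate / 1e9:.1f}B")
```

Note that the result says nothing about revenue actually booked over the past twelve months; it only extrapolates the sampled period.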


Not even that. It's not based on actual sales in, for example, the past month. It's based on an expected continuous growth based on the growth of the past month (or whatever period you pick).

It's a forecast.


I can't say what all companies do, but my Google searches and ChatGPT do not agree with you on that. They stick to actual sales.

I have never heard of SaaS companies reporting ARR like this.

There are about 30 million seconds in a year. If they made $100 over the last hundred milliseconds, then that’s $30B annualized.

(That said, their numbers are much realer than that.)


If you make a hundred dollars in 0.1 seconds, you could say your annualized revenue is $100 / 0.1 * 60 * 60 * 24 * 365 = ~$30 billion.

That said, most people would use a monthly or quarterly period to estimate ARR. I'm not sure what Anthropic used. Probably monthly.
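To make the sensitivity to the sampling window concrete, here is a minimal sketch that annualizes the same $100 over three different windows. The helper name `annualize` and the choice of windows are assumptions for illustration only; nothing here reflects how Anthropic actually computes its figure.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000, i.e. "about 30 million"


def annualize(revenue: float, window_seconds: float) -> float:
    """Extrapolate revenue seen in a window to a full (365-day) year."""
    return revenue / window_seconds * SECONDS_PER_YEAR


# Hypothetical windows: the same $100 gives wildly different run rates.
windows = {
    "0.1 seconds": 0.1,
    "1 day": 24 * 60 * 60,
    "30 days": 30 * 24 * 60 * 60,
}

for label, seconds in windows.items():
    print(f"$100 over {label}: ${annualize(100, seconds):,.0f} per year")
```

With a 0.1-second window the result is roughly $31.5 billion; with a monthly window the same $100 annualizes to a little over $1,200, which is why the length of the period matters more than the formula.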


the fact!?

I don't follow Anthropic closely enough to know what claims its CEO has made, but it is factual that Altman is a pathological liar. You can observe this for yourself by reading and listening to the things he says and then comparing them to reality. We have years of evidence to look back on. The chasm between Altman's reality and everyone else's is so large and so well-known that it was one of the chief factors cited by the board when he was fired.

(I would then argue that he was re-hired specifically because others involved with OpenAI understood that it is literally his job to lie and that OpenAI would not be where it is today as a corporate behemoth rather than a research non-profit without a world-class liar marketing it, but that is merely conjecture.)


I mean.. kinda everything about Mythos for example? Anthropic has a good product, but they also pretty consistently say some stupid ass shit if you're being generous, and blatant lies if you aren't

Stand clear of the blast crater not everyone in tech bought the con…

Porn and memes. Obviously. This is all that Stable Diffusion has been used for since it was released.

WoW vanilla is being sold right now by Blizzard themselves, under a subscription model.

Oh yeah, I remember when they abandoned it for years, third party servers revived it, Blizzard realized they can make money off it and shut the third party servers down.

> IMHO there is a point where incremental model quality will hit diminishing returns.

You mean a couple of years ago?


Congratulations, you just won the trophy for most privileged comment on the web!

We all know this is actually Mythos but called Opus 4.7 to avoid disappointments, right?

I run Windows 11 as my main desktop (and use Mac at work and have a bunch of servers / NAS where I run Debian), and W11 is not painful at all.

I installed the Professional edition, disabled a few settings that I don't like the first time I installed it, and haven't had any issue or friction since then.

Meanwhile I'm constantly frustrated at macOS and obviously you can't do anything on Linux without running into some sort of trouble.


Some people don't mind ads on the radio, when they're watching a show, some people don't mind pop-ups, etc.

Now, if I search on win11 to start a program (which is what they want you to do), why does auto complete call out to the Internet? Users have had browsers for over two decades; who asked Microsoft to mix local search, application startup and web search?

As it turns out, I really hated on-call my whole career. I guess different personalities here, as well.


> if I search on win11 to start a program (which is what they want you to do), why does auto complete call out to the Internet?

This is how I launch most of my programs and it has literally never been an issue for me, it always does exactly what I expect, which is to launch the program I have installed locally.

I don't give a fuck if it makes an HTTP call or performs an incantation to the god of search in the background, as long as it just gives me my locally installed software instantly, which it does.

> As it turns out, I really hated on-call my whole career.

Read my bio.


Sometimes it's not instant though. Sometimes web search completes faster than the locally installed software, and because of muscle memory I accidentally opened a Bing page about scrcpy (where the first result is an unaffiliated web page instead of the official GitHub page!) instead of my locally installed scrcpy.

It's like when you're playing a video game and you accidentally press the wrong button and shoot a guy in the face instead of healing him.

The idea that you could accidentally open a webpage instead of launching a program is a UX embarrassment.


Nope. Today I keyboard-shortcut opened search and tried to search for device manager - no no, it started auto-completing on browser search (luckily I'd made it Brave at least). On Android you can easily turn that off.

I ended up having to go to Run and type the .msc of device manager - obviously spelled with a shortened version that had to be googled as somehow THAT didn't auto-complete.

Windows 11 needs to be exposed for what it is, a spyware operating system. I had booted into it just to test whether a hardware device was at fault or a USB cable was, then dual-booted back into Linux.


I just tried and device manager appeared right after I had typed "de" only.

I don't know what you guys do with your computers.

I have literally never been offered internet results. It actually doesn't even propose it as an option. I just tried typing "this is not an existing app" and it just shows that it couldn't find any result, and the internet is not even a source I can select.


It's very easy to calculate the actual cost given they list the exact tokens used. If I take the AWS Bedrock pricing for Opus 4.6 1M context (because Anthropic's APIs are subsidized and sold at a loss), here's what each costs:

Cache reads cost $0.31

Cache writes cost $105

Input tokens cost $0.04

Output tokens cost $28.75

The total spent in the session is $134.10, while the Pro Max 5x subscription is $100.

Even taking Anthropic's API pricing, we arrive at $80.58. Below the subscription price, but not by much.

It's just the end of the free tokens, nothing to see here. It's easy to feel like you're doing "moderate" or even "light" usage because you use so few input tokens, but those "agentic workflows" are simply not viable financially.
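The arithmetic above can be checked with a short sketch. It only re-adds the dollar amounts quoted in this comment; the underlying token counts and per-token prices are not given here, so they are not modeled, and the variable names are made up for illustration.

```python
from decimal import Decimal

# Dollar amounts as quoted in the comment (Bedrock pricing).
bedrock_costs = {
    "cache reads": Decimal("0.31"),
    "cache writes": Decimal("105"),
    "input tokens": Decimal("0.04"),
    "output tokens": Decimal("28.75"),
}

bedrock_total = sum(bedrock_costs.values())
subscription = Decimal("100")     # monthly subscription price quoted above
api_total = Decimal("80.58")      # total quoted above for Anthropic's API pricing

print(f"Bedrock total: ${bedrock_total}")
print(f"Over subscription by: ${bedrock_total - subscription}")
print(f"API total as share of subscription: {api_total / subscription:.0%}")

# Which line item dominates the bill?
largest = max(bedrock_costs, key=bedrock_costs.get)
share = bedrock_costs[largest] / bedrock_total
print(f"Largest item: {largest} ({share:.0%} of total)")
```

Cache writes account for most of the total, which fits the point that input-token counts alone understate what such a session costs.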

