Given that many of the images look to be derived from Wizards of the Coast's Magic: The Gathering Universes Beyond Lord of the Rings set without attribution, I'd expect them to be getting a strongly-worded letter from Hasbro's lawyers any moment now (and for once, I'd support that).
I feel like they're quite open about why they think the benchmark doesn't work anymore:
> We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.
> This means that improvements on SWE-bench Verified no longer reflect meaningful improvements in models’ real-world software development abilities. Instead, they increasingly reflect how much the model was exposed to the benchmark at training time.
They're pointing out that run-rate revenue is based on essentially sampling revenue over some limited time interval, then extrapolating from there assuming revenue always occurs at the same rate (or greater) over all similar intervals in the future. More specifically, they're pointing out that estimates of ARR derived from this kind of sampling are fundamentally prone to error and can be arbitrarily inflated based on how the time interval is sampled.
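To make the sampling point concrete, here is a minimal sketch with made-up numbers (the `daily_revenue` series and the `annualize_window` helper are hypothetical, not anyone's reported figures). The same revenue history gives very different run rates depending on which window gets annualized:

```python
DAYS_PER_YEAR = 365

def annualize_window(daily_revenue, window_days):
    """Annualize the revenue seen in the most recent `window_days` days."""
    window = daily_revenue[-window_days:]
    return sum(window) / window_days * DAYS_PER_YEAR

# Hypothetical series: 29 ordinary days followed by one unusually large day.
daily_revenue = [1.0] * 29 + [10.0]

run_rate_30d = annualize_window(daily_revenue, 30)  # whole month as the sample
run_rate_1d = annualize_window(daily_revenue, 1)    # only the spike day

print(f"30-day window: {run_rate_30d:.1f}")
print(f" 1-day window: {run_rate_1d:.1f}")
```

Nothing about the underlying business changes between the two calls; only the choice of window does.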
Of course, but the fact of the matter is that the same technique was used for the quarter prior to that, and there’s a 3x increase quarter over quarter.
As far as I understand, run-rate revenue is just a fancy way of saying "last month we had these sales, and if that continues for a year we will have an ARR of 30B", meaning it's not 30B yet, but the sales numbers indicate that we'll get there by continuing to sell at the current pace. But to have revenue of $100 and get $30B in ARR, I guess the period looked at needs to be seconds....
(Run Rate = Revenue in Period / # of Days in Period x 365)
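A quick sketch of that formula, plus its inverse to check the "$100 of revenue, $30B ARR" guess above (the function names are mine, and a 365-day year is assumed as in the formula):

```python
DAYS_PER_YEAR = 365

def run_rate(revenue_in_period, days_in_period):
    """Annualize revenue observed over a period, per the formula above."""
    return revenue_in_period / days_in_period * DAYS_PER_YEAR

def days_needed(revenue_in_period, target_run_rate):
    """Invert the formula: how long a period yields the target run rate?"""
    return revenue_in_period * DAYS_PER_YEAR / target_run_rate

# $100 of revenue extrapolated to a $30B run rate.
period_days = days_needed(100, 30e9)
period_seconds = period_days * 24 * 60 * 60

print(f"{period_seconds:.3f} seconds")
```

So the period would have to be about a tenth of a second, which is even shorter than "seconds".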
Not even that. It's not based on actual sales in, for example, the past month. It's based on an expected continuous growth based on the growth of the past month (or whatever period you pick).
I don't follow Anthropic closely enough to know what claims its CEO has made, but it is factual that Altman is a pathological liar. You can observe this for yourself by reading and listening to the things he says and then comparing them to reality. We have years of evidence to look back on. The chasm between Altman's reality and everyone else's is so large and so well-known that it was one of the chief factors cited by the board when he was fired.
(I would then argue that he was re-hired specifically because others involved with OpenAI understood that it is literally his job to lie and that OpenAI would not be where it is today as a corporate behemoth rather than a research non-profit without a world-class liar marketing it, but that is merely conjecture.)
I mean.. kinda everything about Mythos for example? Anthropic has a good product, but they also pretty consistently say some stupid ass shit if you're being generous, and blatant lies if you aren't
Oh yeah, I remember when they abandoned it for years, third party servers revived it, Blizzard realized they can make money off it and shut the third party servers down.
I run Windows 11 as my main desktop (and use a Mac at work, plus a bunch of servers and a NAS running Debian), and W11 is not painful at all.
I installed the Professional edition, disabled a few settings that I don't like the first time I installed it, and haven't had any issue or friction since then.
Meanwhile I'm constantly frustrated with macOS, and obviously you can't do anything on Linux without running into some sort of trouble.
Some people don't mind ads on the radio, when they're watching a show, some people don't mind pop-ups, etc.
Now, if I search on win11 to start a program (which is what they want you to do), why does autocomplete call out to the Internet? Users have had browsers for over two decades. Who asked Microsoft to mix local search, application startup and web search?
As it turns out, I really hated on-call my whole career. I guess different personalities here, as well.
> if I search on win11 to start a program (which is what they want you to do), why does auto complete call out to the Internet?
This is how I launch most of my programs and it has literally never been an issue for me, it always does exactly what I expect, which is to launch the program I have installed locally.
I don't give a fuck if it makes an HTTP call or performs an incantation to the god of search in the background, as long as it just gives me my locally installed software instantly, which it does.
> As it turns out, I really hated on-call my whole career.
Sometimes it's not instant though.. sometimes the web search completes faster than the lookup of locally installed software, and because of muscle memory I accidentally opened a Bing page about scrcpy (where the first result is an unaffiliated web page instead of the official GitHub page!) instead of my locally installed scrcpy.
Nope. Today I opened search with the keyboard shortcut and tried to search for Device Manager - no no, it started auto-completing to a browser search (luckily I'd at least made the browser Brave). On Android you can easily turn that off.
I ended up having to go to Run and type the .msc name of Device Manager - obviously an abbreviated name that had to be googled, as somehow THAT didn't auto-complete.
Windows 11 needs to be exposed for what it is, a spyware operating system. I had booted into it just to test whether a hardware device was at fault or a USB cable was, then dual-booted back into Linux.
I just tried and device manager appeared right after I had typed "de" only.
I don't know what you guys do with your computers.
I have literally never been offered internet results. It doesn't even propose them as an option. I just tried typing "this is not an existing app" and it just shows that it couldn't find any result, and internet is not even a source I can select.
It's very easy to calculate the actual cost given they list the exact tokens used. If I take the AWS Bedrock pricing for Opus 4.6 1M context (because Anthropic's APIs are subsidized and sold at a loss), here's what each costs:
Cache reads cost $0.31
Cache writes cost $105
Input tokens cost $0.04
Output tokens cost $28.75
The total spent in the session is $134.10, while the Pro Max 5x subscription is $100.
Even taking Anthropic's API pricing, we arrive at $80.58. Below the subscription price, but not by much.
It's just the end of the free tokens, nothing to see here. It's easy to feel like you're doing "moderate" or even "light" usage because you use so few input tokens, but those "agentic workflows" are simply not financially viable.
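For anyone who wants to re-run the arithmetic, a small sketch using only the dollar figures quoted in this comment (the dictionary keys are just labels; the underlying token counts and per-token prices are not reproduced here):

```python
# Per-category costs as quoted above (AWS Bedrock pricing, USD).
costs = {
    "cache_reads": 0.31,
    "cache_writes": 105.00,
    "input_tokens": 0.04,
    "output_tokens": 28.75,
}
subscription = 100.00  # monthly subscription price quoted above

total = round(sum(costs.values()), 2)
over_subscription = round(total - subscription, 2)
cache_write_share = costs["cache_writes"] / total

print(f"total: ${total:.2f}")
print(f"over subscription: ${over_subscription:.2f}")
print(f"cache writes share: {cache_write_share:.0%}")
```

Cache writes account for roughly 78% of the total, which is why the session is expensive despite the tiny input-token line.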