Hacker News | greenknight's comments

The thing is, it doesn't need to beat 4.7; it just needs to do somewhat well against it.

This is free... as in you can download it, run it on your systems and finetune it to be the way you want it to be.


> you can download it, run it on your systems

In theory, sure, but as others have pointed out, you need to spend half a million on GPUs just to get enough VRAM to fit a single instance of the model. And you’d better make sure your use case makes full 24/7 use of all that rapidly-depreciating hardware you just spent all your money on, otherwise your actual cost per token will be much higher than you think.

In practice you will get better value from just buying tokens from a third party whose business is hosting open weight models as efficiently as possible and who make full use of their hardware. Even with the small margin they charge on top you will still come out ahead.


There are a lot of companies who would gladly drop half a million on a GPU to have private inference that Anthropic or OpenAI can’t use to steal their data.

And that GPU wouldn’t run just one instance; the models are highly parallelizable. It would likely support 10-15 users at once, and if a company oversubscribed 10:1, that GPU supports ~100 seats. Amortized over a couple of years, the costs are competitive.
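To put rough numbers on that amortization claim (every figure below is made up for illustration, not a quote):

```python
# Amortizing one GPU server over oversubscribed seats; illustrative only.
hardware_cost = 500_000        # upfront, USD (assumed)
amortization_years = 3
concurrent_slots = 12          # batched inference slots (assumed)
oversubscription = 10          # not everyone prompts at once
seats = concurrent_slots * oversubscription

monthly_cost = hardware_cost / (amortization_years * 12)
cost_per_seat = monthly_cost / seats
print(seats, round(cost_per_seat))
```

That lands in the same ballpark as a paid per-seat cloud subscription, which is the point; the answer is very sensitive to the utilization assumptions.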


> There are a lot of companies who would gladly drop half a million on a GPU to have private inference that Anthropic or OpenAI can’t use to steal their data.

Obviously, and certainly companies do run their own models because they place some value on data sovereignty for regulatory or compliance or other reasons. (Although the framing that Anthropic or OpenAI might "steal their data" is a bit alarmist - plenty of companies, including some with _highly_ sensitive data, have contracts with Anthropic or OpenAI that say they can't train future models on the data they send them and are perfectly happy to send data to Claude. You may think they're stupid to do that, but that's just your opinion.)

> the models are highly parallelizable. It would likely support 10-15 users at once.

Yes, I know that; I understand LLM internals pretty well. One instance of the model in the sense of one set of weights loaded across X number of GPUs; of course you can then run batch inference on those weights, up to the limits of GPU bandwidth and compute.

But are those 100 users on your own GPUs using them evenly across the 24 hours of the day, or only during 9-5 in some timezone? If the latter, you're leaving your expensive hardware idle for two-thirds of the day, and the third-party providers hosting open-weight models will still beat you on cost, even before other factors like them buying their GPUs cheaper than you did. Do the math if you don't believe me.
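A toy model of the utilization argument; every number below is an assumption, but the ratio is what matters:

```python
# How utilization changes your effective cost per token; illustrative only.
monthly_hardware_cost = 14_000      # amortized GPUs + power + ops (assumed)
peak_tokens_per_hour = 2_000_000    # what the box serves flat out (assumed)

def cost_per_million_tokens(hours_used_per_day):
    tokens_per_month = peak_tokens_per_hour * hours_used_per_day * 30
    return monthly_hardware_cost / (tokens_per_month / 1_000_000)

busy_9_to_5 = cost_per_million_tokens(8)   # hardware idle 2/3 of the day
busy_24_7 = cost_per_million_tokens(24)
print(round(busy_9_to_5, 2), round(busy_24_7, 2))
```

The 9-5 shop pays exactly 3x per token compared to a provider who keeps the same hardware saturated around the clock.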


There's stuff like SOC controls and enterprise contracts with enforceable penalties if clauses are breached. ZDR is a thing.

The most significant value of open-source models comes from being able to fine-tune: with a good dataset and limited scope, a finetune can be crazily worth it.


Sure, but that’s an incredibly short term viewpoint.

Do you think a lot of people have “systems” to run a 1.6T model?

To me, the important thing isn't that I can run it, it's that I can pay someone else to run it. I'm finding Opus 4.7 seems weirdly broken compared to 4.6: it just doesn't understand my code and breaks it whenever I ask it to do anything.

Now, at the moment, I can still use 4.6, but eventually Anthropic are going to remove it, and when it's gone it will be gone forever. I'm planning on trying DeepSeek v4, because even if it's not quite as good, I know it will be available forever; I'll always be able to find someone to run it.


Yep, it's wild how little emphasis there is on control and replicability in these posts.

Already these models are useful for a myriad of use cases. It's really not that important whether a model can 1-shot a particular problem or draw a cuter pelican on a bike. Past a certain degree of quality, process and reliability matter much more for anything other than completely hands-off usage, which is not something you're really going to do in business.

The fact that my tool may be gone tomorrow (and this has actually happened before), with no guarantee of a proper substitute... that's a lot more of a concern than an extra point in some benchmark.


No, but businesses do. Being able to run quality LLMs without your business, or business's private information, being held at the mercy of another corp has a lot of value.

What type of system is needed to self host this? How much would it cost?

One GB200 NVL72 from Nvidia would do it. $2-3 million, or so. If you're a corporation, say Walmart or PayPal, that's not out of the question.

If you want to go budget corporate, 7 x H200 is just barely going to run it, but all in, $300k ought to do it.


How many users can you serve with that?

For the H200, between 150-700. The GB200 gets you something like 2-10k users.
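For anyone wondering where figures like that come from, here's the rough seats-from-throughput arithmetic; every input below is an assumption, not a measurement:

```python
# Seats scale with aggregate batched throughput over per-user token demand.
cluster_tokens_per_sec = 4_000        # aggregate batched decode throughput (assumed)
tokens_per_request = 500              # typical response length (assumed)
requests_per_user_per_hour = 10       # light interactive usage (assumed)

tokens_per_user_per_sec = tokens_per_request * requests_per_user_per_hour / 3600
seats = cluster_tokens_per_sec / tokens_per_user_per_sec
print(int(seats))
```

Because interactive users are idle most of the time, a cluster's seat count comes out far higher than its concurrent batch size.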

Whoa. How on earth can one system serve 2000 potentially concurrent users?

Depends on how many users you have and what "production grade" means for you, but around $500k gets you an 8x B200 machine.

Depends on how fast you want it to be. I’m guessing a couple of $10k Mac Studio boxes could run it, but probably not fast enough to enjoy using it.

$20K worth of RTX 6000 Blackwell cards should let you run the Flash version of the model.

Not really - on-prem LLM hosting is extremely labor- and capital-intensive.

But can be, and is, done. I work for a bootstrapped startup that hosts a DeepSeek v3 retrain on our own GPUs. We are highly profitable. We're certainly not the only ones in the space, as I'm personally aware of several other startups hosting their own GLM or DeepSeek models.

Why a retrain? What are you using the model for?

Completely agree, not suggesting it needs to, just genuinely curious. Love that it can be run locally though. Open-source LLMs have been punching back pretty hard against proprietary ones in the cloud lately in terms of performance.

What's the hardware cost of running it?

I was curious, and some [intrepid soul](https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...) did an analysis. Assuming you do everything perfectly and take full advantage of the model's MoE sparsity, it would take:

- To run at full precision: "16–24 H100s", giving us ~$400-600k upfront, or $8-12/h from [us-east-1](https://intuitionlabs.ai/articles/h100-rental-prices-cloud-c...).

- To run with "heavy quantization" (16 bits -> 8): "8xH100", giving us $200K upfront and $4/h.

- To run truly "locally"--i.e. in a house instead of a data center--you'd need four 4090s, one of the most powerful consumer GPUs available. Even that would clock in around $15k for the cards alone and ~$0.22/h for the electricity (in the US).
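As a sanity check, a weights-only sizing sketch that roughly reproduces those GPU counts, assuming 80 GB H100s and the 671B parameter figure quoted elsewhere in the thread (real deployments need extra headroom for KV cache and activations):

```python
import math

# Weights-only VRAM sizing; all inputs are assumptions for illustration.
def gpus_needed(params_billion, bytes_per_param, gpu_gb=80):
    weights_gb = params_billion * bytes_per_param
    return math.ceil(weights_gb / gpu_gb)

bf16_gpus = gpus_needed(671, 2)   # BF16: 2 bytes per parameter
int8_gpus = gpus_needed(671, 1)   # 8-bit quantization: 1 byte per parameter
print(bf16_gpus, int8_gpus)
```

Which is why halving the precision roughly halves the GPU count, and why MoE sparsity matters so much: only the active experts need to be touched per token, even though all the weights must fit somewhere.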

Truly an insane industry. This is a good reminder of why datacenter capex since 2023 has eclipsed the Manhattan Project, the Apollo program, and the US interstate system combined...


All these numbers are peanuts to a mid-sized company. A place I worked at used to spend a couple million just on a support contract for a NetApp.

10 years from now that hardware will be on eBay for any geek with a couple thousand dollars and enough power to run it.


That article is a total hallucination.

"671B total / 37B active"

"Full precision (BF16)"

And they claim they ran this non-existent model on vLLM and SGLang over a month and a half ago.

It's clickbait keyword slop filled in with V3 specs. Most of the web is slop like this now. Sigh.


Probably like 100 USD/hour

"if you have to ask..."

... if you have 800 GB of VRAM free.

I remember reading about some new frameworks that let Macs stream the weights of huge models live from fast SSDs and produce quality output, albeit slowly. Apart from that... good luck finding that much available VRAM, haha.

On the flip side, I had some pain in my chest... RUQ (right upper quadrant, for the medical folk).

On the way to the hospital, ChatGPT was pretty confident it was an issue with my gallbladder due to me having a fatty meal for lunch (but it was delicious).

After an extended wait to be seen, they didn't ask about anything like that. At the end, when they asked if there was anything else to add, I mentioned the ChatGPT / gallbladder theory... and was discharged 5 minutes later with a suspected gallbladder issue, as they couldn't do anything that night.

Over the next few weeks, I got test after test after test to try and figure out what was going on. MRI, CT, ultrasound, etc. They all came back negative for the gallbladder.

ChatGPT was persistent. It said to get a HIDA scan, a more specialised scan. My GP was a bit reluctant but agreed. Got it, and was diagnosed with a hyperkinetic gallbladder. It's still not officially recognised as a condition, though mostly accepted. So much so that my surgeon initially said it wasn't a thing (then, after doing some research, said it is a thing)... and a gastroenterologist also said it wasn't a thing.

Had it taken out a few weeks ago, and it was chronically inflamed, which means the removal was the correct path to go down.

It just sucks that your wife was on the other end of things.


This reminds me of another recent comment in some other post, about doctors not diagnosing "hard to diagnose" things.

There are probably ("good") reasons for this. But your own persistence, and today the help of AI, can potentially help you. The problem with it is the same problem as previously: "charlatans". Just that today the charlatan and the savior are both one and the same: The AI.

I do recognize that most people probably can't tell one from the other. In both cases ;)

You'll find this in my post history a few times now, but essentially: I was lethargic all the time and got migraine-type headaches "randomly" a lot, with the feeling I'd need to puke. One time I had to stop driving because it got so bad. I was suddenly no longer able to tolerate alcohol either.

I went to multiple doctors and was sent to specialists, who all told me that they could maaaaaybe do test XYZ, but essentially: it wasn't a thing, I was crazy.

Through a lot of online research I "figured out" (and that's an overstatement) that it was something about the gut microbiome. Something to do with histamine. I tried a bunch of things; for example, I suspected it might be a DAO (diamine oxidase) insufficiency. I tried a bunch of probiotics, both the "heals all your stuff" kind and the "you need to take a single strain or it won't work" kind. Including "just take Actimel". Actimel gave me headaches! Turns out one of the (prominent) strains in there makes histamine. Guess what: alcohol, some kinds especially, contains histamine, and your "hangover" is also essentially histamine (made worse by the dehydration). And guess what else: some foods, especially some I love, contain or break down into histamines.

So I figured that somehow it's all about histamines and how my current gut microbiome does not deal well with excess histamines (through whichever source). None of the doctors I went to believed this to be a "thing" nor did they want to do anything about it. Then I found a pro-biotic that actually helped. If you really want to check what I am taking, check the history. I'm not a marketing machine. What I do believe is that one particular bacterium helped, because it's the one thing that wasn't in any of the other ones I took: Bacillus subtilis.

A soil based bacterium, which in the olden times, you'd have gotten from slightly not well enough cleaned cabbage or whatever vegetable du jour you were eating. Essentially: if your toddler stuffs his face with a handful of dirt, that's one thing they'd be getting and it's for the better! I'm saying this, because the rest of the formulation was essentially the same as the others I tried.

I took three pills per day, breakfast, lunch and dinner. I felt like shit for two weeks, even getting headaches again. I stuck with it. After about two weeks I started feeling better. I think that's when my gut microbiome got "turned around". I was no longer lethargic and I could eat blue cheese and lasagna three days in a row with two glasses of red wine and not get a headache any longer! Those are all foods that contain or make lots of histamine. I still take one per day and I have no more issues.

But you gotta get to this, somehow, through all of the bullshit people that try to sell you their "miracle cure" stuff. And it's just as hard as trying to suss out where the AI is bullshitting you.

There was exactly one doctor in my life who I would consider good in that regard. I had already figured the above out by that time, but I was doing keto and it got all of my blood markers, except cholesterol, back into the normal range. She literally "googled" keto with me a few times, did a blood test to confirm that I was in ketosis, and in general was just awesome about this. She was notoriously difficult to book and ran later than any other doctor for scheduled appointments, but she took her time, and even that would not really have been enough to suss out the stuff I figured out through my own research, if you ask me. While doctors are the "half-gods in white", I think there's just way too much stuff and way too little time for them. It's like all the bugs at your place of work: now imagine you had exactly one doctor across a multitude of companies. Of course they only figure out the "common" ones...


One challenge that may sound obvious... is that super rare stuff gets seen super rarely, even by specialists.

In practice it means you often have to escalate from GP to local specialist to an even more narrow specialist, all the way to one of the regional big-city specialists who almost exclusively get the weird cases.

This is because every hop is an increasingly narrow area of speciality.

Instead of just "cancer doctor", it's the "GI cancer doctor", then it's "GI cancer doctor of this particular organ", then it's "an entire department of cancer doctors who work exclusively on this organ and will review the case together", etc.


It's horses, not zebras, until it's actually a zebra and your life depends on it. I think those sorts of guidelines are useful in the general case, but many medical issues quickly move beyond the general case and need closer examination. Not sure how you do that effectively without wasting tons of money on folks with indigestion.


Interesting to read, thank you very much. Are you still eating ketogenic? Bacillus subtilis seems to metabolize glucose, so are yours still alive? And did you try other probiotics beforehand? I have HIT and eat a mostly carnivore diet of mostly fresh/unfermented meat.


I no longer do keto, no. I also started keto after I had already gotten somewhat better from the probiotics, though not by much. I'm not sure where you read that subtilis can only live off glucose. I'm having a hard time finding primary sources that actually discuss this, but handily, Google's "AI mode" also "answered" my search query: it states it primarily thrives on glucose and sugars but can also break down and live off proteins and fats.

FWIW, as I understand it, many probiotics aren't going to colonize on their own and "stick around" for a prolonged period after you stop taking them, even under good circumstances, but don't quote me on that. And in the past we would've gotten many of them through our diet one way or another, just not via a probiotic but naturally.

I tried multiple probiotics, both blends of multiple types and things like a "Saccharomyces boulardii"-only preparation. I don't recall all the exact ones I tried, though.


after reading your comment, my perception is mixed


If it was inflamed would your GGT level be high?


Love the AI-generated image: "Extra-Large (>72pt) Font". Made me chuckle. Definitely a feature of the app.

But it looks really cool! I'll be trialing it this week.


Send me an email (see my profile) if you want to try the custom program creation. I’m starting the beta this week.


AMD does do semi-custom work.

What's to stop Sony from saying they don't want UDNA 1, they want an iteration of RDNA 4?

For all we know, it IS RDNA 5... it just won't be available to the public.


And their half step/semi-custom work can find their way back to APUs. RDNA 3.5 (the version marketed as such) is in the Zen 5 APUs with Mobile oriented improvements. It wouldn’t surprise me if a future APU gets RDNA 5. GCN had this sort of APU/Console relationship as well.


Also, the Steam Deck (before the OLED version) and the Magic Leap 2 shared a custom chip, with some vision-processing parts fused off for the Steam Deck.


This is in Melbourne, where most homes are sold via auction (because of the limited supply)... lots of people are forgoing building inspections because of it. Wouldn't be surprised if he didn't do one.


It could be sold "as-is", intended for knockdown/demolition and replacement, or removal. Hence the 20% minimum open-space requirement for new homes.


Claude Code Pro ($17 per month) now supports it, just ends earlier


Pro does not support Claude Code, the agent. The docs say it does, but it wouldn't work yesterday when I actually tried it.


It absolutely does, as of 18 hours ago or so. The docs were out of date with respect to reality for at least a few hours.


Hah ok. I then just happened to try it out during their feature rollout.


Claude Code was added to the Pro tier in the last day or two; they've been working out some kinks with it


The other thing is that this is supposed to sit between 3D rendering and compositing with regard to VFX.

The 3D render, in an ideal world, is super smooth without imperfections.

The compositing would take the denoised 3D render and add other imperfections such as film grain, bloom, and other post effects.
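A toy sketch of that ordering in NumPy; `add_grain` and `add_bloom` here are made-up stand-ins for real compositing nodes, just to show where each effect sits in the chain:

```python
import numpy as np

rng = np.random.default_rng(0)
render = np.full((4, 4), 0.5)     # stand-in for a clean, denoised frame

def add_grain(img, strength=0.05):
    # film grain: per-pixel noise layered on after rendering
    return np.clip(img + rng.normal(0.0, strength, img.shape), 0.0, 1.0)

def add_bloom(img, threshold=0.8, amount=0.3):
    # bloom: bleed the bright areas back over the image
    highlights = np.where(img > threshold, img, 0.0)
    return np.clip(img + amount * highlights, 0.0, 1.0)

final = add_bloom(add_grain(render))   # render -> grain -> bloom
```

The point being that the renderer's job is the clean signal; the imperfections are deliberate, controllable, and added last.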


I have never had Blender take 1:52 to open... I'll open it now cold. 5 seconds.


If you're wondering about the source of the benchmark, refer to the following, page 36:

https://www.vulkan.org/user/pages/09.events/vulkanised-2025/...


Perhaps he means opening a large scene with lots of textures or shaders or something.


That's exactly what it is. From the presentation link posted in another comment:

Test: Start Blender, open scene, final viewport 3000 objects, 550 materials, 200 images


In that case, I'd seriously question the 6-second Vulkan startup time, since the bottlenecks would be IO speed and memory availability, not processor speed. Certainly not GPU speed. Assuming assets with, say, 8k textures and millions of verts per mesh, all compounded by the scene containing, say, thousands of such meshes.

Time in Vulkan or OGL would barely register against IO and memory when loading such scenes.


But isn't one advantage of Vulkan over OpenGL that it's more standard to upload data such as large meshes and textures from multiple threads?


But you do that in OGL in any case. Most software dealing with that large an asset base does so using the old-fashioned shared-context style of threading.

It's all going to come down to how fast your IO is and how fast your memory is. Six seconds to load all that data, be it via Vulkan-style threaded submissions or OGL-style threading via shared contexts and syncing, is not likely to happen on most laptops. Even workstations may struggle to read in and generate the necessary data. Most of the reading and generation stage happens entirely outside the GPU. All of the reading, and most of the generation. Both go into main memory first.

Again, generated data will be much faster than read-in assets, but most of the assets would be getting read in. That's IO before you even make a single Vulkan or OGL call. No way around it.
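To put rough numbers on the IO point (scene size and disk bandwidths below are assumptions): even before a single graphics API call, just reading the scene can eat most of a 6-second budget.

```python
# Time to read a big scene off disk, before any GPU work; illustrative only.
scene_gb = 20                  # 8k textures plus heavy meshes (assumed)
nvme_gb_per_sec = 3.5          # decent NVMe drive (assumed)
sata_gb_per_sec = 0.55         # SATA SSD (assumed)

nvme_seconds = scene_gb / nvme_gb_per_sec
sata_seconds = scene_gb / sata_gb_per_sec
print(round(nvme_seconds, 1), round(sata_seconds, 1))
```

On NVMe the raw read alone is ~5.7 s; on SATA it's over half a minute, and neither figure changes whether the uploads afterwards go through Vulkan or OGL.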


Blender 4.2.0 opens to splash screen in about 500ms on my laptop. I have no idea what that number could mean.


Wonder if they pulled that from some biased telemetry.


It's probably a random text generator. Blender opens for me instantly, as it always has, since the first time I ran it on a 75 MHz Pentium.


Correct... but from my understanding, OpenXR isn't reliant on OpenGL? It supports Vulkan, DirectX, and Metal -- https://github.com/godotengine/godot/pull/98872


FFmpeg is another. Any online video platform probably uses it.

