Hacker News | eventualcomp's comments

Where is $50k coming from again?

That’s less than the monthly salary of 10 software engineers, and assuming they pay API prices, probably earns itself back in about a year.

Having said that, I don’t think it’s all that tempting for companies at all, considering this whole market is developing rapidly and it’s nearly impossible to predict where we’ll be at in a year or two.


The hardware requirements aren't evolving and the local models have only been improving.

It's not like you'd lose capabilities, if anything this solution just gets better with time.


If the newer models require more/better hardware then you’ll lose capabilities.

I think you’re better off renting GPU instances and running all the software on those. It’ll be cheaper than Anthropic and OpenRouter but slightly more expensive than electricity and depreciation of hardware.


The newer models don't require more/better hardware. There's a small army of local LLM enthusiasts who are running LLMs using 3090s and H100s because they have lots of memory. Them being old isn't really that big of an issue, as the compute power needed is relatively low, all things considered.

The number of parameters needed for these open weight models has mostly stabilized so the actual memory requirements aren't likely to change all that much.


Correct. The main bottleneck with LLM inference is, and has always been, memory bandwidth.

TPS ≈ your memory bandwidth in GB/s / active weights in GB.

That’s it for decode. That’s all.
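A back-of-the-envelope sketch of that decode bound, with memory bandwidth as the numerator and the active weights read per token as the denominator. The function name and the example numbers are made up for illustration, and this ignores KV cache reads, batching, and any overhead, so treat the result as an upper bound rather than a benchmark.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Rough upper bound on decode speed.

    Assumes every generated token streams all active weights from memory once,
    so throughput is bandwidth divided by the bytes read per token.
    """
    if active_weights_gb <= 0:
        raise ValueError("active_weights_gb must be positive")
    return bandwidth_gb_s / active_weights_gb


# Hypothetical numbers, for illustration only: 1000 GB/s of memory bandwidth
# and 20 GB of active weights.
example_tps = decode_tokens_per_second(bandwidth_gb_s=1000.0, active_weights_gb=20.0)
print(f"{example_tps:.0f} tokens/s upper bound")
```

Halving the active weights (smaller quant, fewer active experts) doubles the bound, which is why quantization matters so much for local setups.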


As in who pays for it or how did I arrive at that number?

For who pays for it, obviously the employer would.

For "how did I arrive at this number": ballpark estimate from what I know about part cost. Most of that money will go towards AI cards. About $5k is for the motherboard, CPU, power supply, etc. The other $45k would be for as much RAM and as big/expensive NVIDIA cards as you can get your hands on. The B300 has 288GB of VRAM in it. Probably what you'd be after.


$50K seems low if you want to run, say, GLM 5.2 4bit fast enough for a team of devs.

You need something like 6x RTX Pro 6000 at $11800 each plus a nice server (add $10000) = $80800 and then quite a bit of electricity.
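The arithmetic behind that total, as a tiny sketch. The prices and counts are the ones quoted in this comment; the function and variable names are just for illustration, and electricity is left out because no figure was given.

```python
GPU_UNIT_PRICE_USD = 11_800  # RTX Pro 6000 price quoted above
GPU_COUNT = 6
SERVER_PRICE_USD = 10_000    # "a nice server"


def build_cost(gpu_count: int, gpu_unit_price: int, server_price: int) -> int:
    """Up-front hardware cost only; running costs are not included."""
    return gpu_count * gpu_unit_price + server_price


total = build_cost(GPU_COUNT, GPU_UNIT_PRICE_USD, SERVER_PRICE_USD)
print(f"${total:,}")  # $80,800
```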


You don't need all of the model in VRAM. 1 or 2 RTX Pro 6000s will do. $50K will get you there very nicely, and on a 1600 watt PSU if you go for the MAX-Q versions. (The same wattage PSU I'm typing this on, and have been using over the last 5 years.)

If you want decent performance (more than say 20 tokens/s) for your dev team, you absolutely do need all of the model in VRAM.

I've heard of musicians with very strong senses of perfect pitch flocking to flute or oboe, because anything not keyed in C (perfect pitch equiv) results in too much cognitive dissonance. Clarinets are keyed in Bb (you play a C, out comes a Bb), horns in F (you play a C, out comes an F), trumpets in Eb (this should be clear), and so on...

Like motion sickness with musical tones - you see one thing on the page, you have a sense for what "note" you're playing, but out comes something else.

I have perfect pitch but it's not really useful, except for noticing that my instrument is getting sharper. But that doesn't matter since you have to be in tune with the rest of the band/orchestra.


>Clarinets are keyed in Bb (you play a C, out comes a Bb), horns in F (you play a C, out comes an F), trumpets in Eb (this should be clear), and so on...

In reality, you put your fingers in the position for a C on that specific instrument and you get a C. The name "transposing instrument" is misleading; the instrument itself does not transpose. It's purely a notation convention, intended to give you a consistent mapping between notation and fingering so it's easier to switch between instruments. If you only play one instrument there's no need for it. And even if you do, it's not strictly necessary, e.g. recorders are commonly available in both C and F and are conventionally not notated transposed. Professional players routinely switch between them for different pieces.

I expect it would be possible to train an image-processing LLM to OCR sheet music so it can be automatically transposed and re-engraved for compatibility with absolute pitch.


> In reality, you put your fingers in the position for a C on that specific instrument and you get a C.

OK, my fault for poor communication. Let me try strongly typing this.

Clarinet: you play a finger-C, out comes a soundwave-Bb. Flute: you play a finger-C, out comes a soundwave-C. And finger-C is polymorphic on the instrument, or something.
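If it helps, here is roughly what I mean, as a minimal sketch. It only tracks pitch classes (octaves are ignored), the names are made up, and the offsets assume a Bb clarinet sounding a major second below written pitch and a horn in F sounding a perfect fifth below.

```python
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Semitone offset from the written ("finger") note to the sounding note.
# Assumed values for the common versions of each instrument.
TRANSPOSITION_SEMITONES = {
    "flute": 0,
    "clarinet_in_bb": -2,
    "horn_in_f": -7,
}


def sounding_pitch(written: str, instrument: str) -> str:
    """Map a written pitch class to the pitch class that actually sounds."""
    index = NOTE_NAMES.index(written)
    offset = TRANSPOSITION_SEMITONES[instrument]
    return NOTE_NAMES[(index + offset) % 12]


for name in TRANSPOSITION_SEMITONES:
    print(f"{name}: finger-C -> soundwave-{sounding_pitch('C', name)}")
```

Same input, different output depending on the instrument, which is the "polymorphic" part.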

Aside from that, I don't disagree with you.

One consideration is that with most instruments, being keyed the way they are, if you immediately transpose via LLM some of those instruments will have almost all their notes in unexpected ledger lines.

Which could have (en)grave implications.


Reminds me of this youtube video: https://m.youtube.com/watch?v=jkdWzvMOPuo

I liked the comments explaining why this worked.


> Nothing will be able to stop you from pulling the plug.

Who is "you"? And what do you envision is the "plug" to pull? And when does an intelligence become a superintelligence - do "we" know when to pull the "plug"?

For example, a superintelligence may be born in a datacenter. Would you expect politicians to be aligned with shutting down a (privately owned) datacenter in which they may have a heavy stake? What if critical systems are also running on the same infrastructure? Will it be easy to cherry-pick the superintelligence nodes to shut off?

IMO this take is dismissive of the entrenched systems that make it hard to pull the plug. It's a hard problem and we need to think about it more.


(Self-reply due to inability to edit) so just checked your user history - why do you keep posting provocative takes?

The fact that a datacenter is evaporating X gallons of water in a period implies that a datacenter is ingesting X gallons of water (if less, the datacenter dries out, if more the datacenter floods) - meaning X gallons are now locked out of the water cycle. Meaning it rains back down and gets slurped back up.

This is under the happy assumption that all used water evaporates into a cloud directly above the source region, which rains back directly.


How much water is contained inside the datacenter at any given moment? That's how much water is taken from circulation by this datacenter. Is it enough to worry?

There is also a required buffer of clean water: Y gallons on hand at the datacenter. You can see the other replies in this parent comment demonstrating the tight ongoing humidity requirements, how clean water is sprayed onto the actual hardware to cool it off, and more. Evidently this can't be done by setting up a giant funnel above the datacenter to collect rainwater.

Given those considerations I expect Y to be pretty large.


> Perhaps most importantly, it does so using a tiny fraction of the CPU time, saving energy and keeping our datacenters (and planet) a little cooler.

But then:

> A decoder backend on AWS (SQS + Lambda + DynamoDB + S3) reassembles objects from incoming encoded packets delivered via Proxylity UDP Gateway.

:( those microservice invocations will burn up the DC more.

The real sell looks like offloading s3 upload latency.


After having used it a while I tend to agree — the snappy send time is such a nice change.

Maybe at some point S3 will have a native implementation of something like this.


Praise be the accountability sink. https://news.ycombinator.com/item?id=41891694


To use an analogy, to add to everybody else: it's like rings on a tree stump. The innermost part of the stump is the oldest; the outer the youngest. Earth is on one of those in-between rings, neither the oldest nor the newest - doesn't matter which of the in-betweens, to be honest.

Suppose now that you're an ant on the middle ring of that tree stump. No matter which way you're looking from Earth's middle-ring, either the rings will get gradually older and then younger with increasing distance (if you're looking towards the center-ish), or the rings will get strictly younger (if you're looking away from the center-ish).

This analogy obviously breaks down if you delve into details but that should give a better intuition to what's going on.


> Would love feedback from people working on long-running agents, training loops, eval harnesses, or similar workflows.

I have not required a service for this kind of optimization at work. Though work gives me unbounded access to Claude 4.x-1m (substitute x with whatever is available). So I often ask it to do this kind of task.

I found that when I just specify, sometimes the AI will optimize to the point that it breaks other existing functional requirements in the same codebase. So I have to steer it with invariants. This is where the bulk of my effort is - monitoring to make sure that the agent didn't suddenly scramble the infra or delete valid usecases.

1. How do you address that [paperclip problem](https://en.wikipedia.org/wiki/Instrumental_convergence) in Remoroo? Can we define invariants?

2. Why is there a whole orchestration system? Was there some limitation that prompted this architecture, e.g. did workers die frequently? Looks like Temporal/AWS SWF with the brain/worker/control architecture. The existence of `q (quit): Kills the Worker, aborts the run. The run is marked FAILED.` makes me think there's only one worker...so why...? It'd make more sense if the brain wanted to dispatch multiple hypotheses to multiple workers to test in parallel (e.g. if optimizing SQL, try these different joins all at once, discard queries running for X minutes after the first complete one).
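To make the parallel-dispatch idea concrete, here is a minimal sketch using plain `asyncio`. Nothing in it comes from Remoroo: `run_hypothesis`, `first_to_finish`, and the candidate names are hypothetical, the sleep stands in for actually running a query, and the losers are cancelled immediately instead of after a grace period of X minutes.

```python
import asyncio


async def run_hypothesis(name: str, seconds: float) -> str:
    """Stand-in for a worker testing one candidate (e.g. one join strategy)."""
    await asyncio.sleep(seconds)
    return name


async def first_to_finish(candidates: dict[str, float]) -> str:
    """Run all candidates concurrently, keep the first result, cancel the rest."""
    tasks = [
        asyncio.create_task(run_hypothesis(name, seconds))
        for name, seconds in candidates.items()
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let the cancelled tasks unwind so nothing is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


# Simulated run times in seconds, for illustration only.
winner = asyncio.run(
    first_to_finish({"hash_join": 0.05, "merge_join": 0.3, "nested_loop": 0.6})
)
print(winner)
```

With more than one worker, a quit key that kills "the Worker" would also need to say which one, which is part of why the single-worker setup puzzles me.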


When I saw the title, I actually didn't have much emotion beyond curiosity. But then after checking this comment, it piqued my interest, made me step back and really consider the ramifications of how we got here. And then yes, I became depressed also.

Anyway, I got value out of it. Comments don't have to increase net factual information to be meaningful, because we are all capable of reflection.

