Hacker News

Can you give a few penciled numbers?


You can rent an H100 GPU for $4/hour. [1]

That's roughly 300k tokens generated in that hour (at ~85 tokens/sec).

OpenAI charges about $6 for that many output tokens.

Those are pessimistic assumptions.

[1] https://lambda.ai/instances
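The pencil math above can be sketched out; the ~$20 per million output tokens is inferred from "300k tokens" costing "$6" and is an assumption, not a quoted price:

```python
# Back-of-the-envelope: rented H100 vs. API-equivalent revenue.
GPU_COST_PER_HOUR = 4.00   # Lambda on-demand H100, $/hour
TOKENS_PER_SEC = 85        # pessimistic single-GPU throughput (from below)
API_PRICE_PER_M = 20.00    # implied $/million output tokens (assumption)

tokens_per_hour = TOKENS_PER_SEC * 3600               # 306,000
revenue_per_hour = tokens_per_hour / 1e6 * API_PRICE_PER_M

print(f"{tokens_per_hour:,} tokens/hour")
print(f"${revenue_per_hour:.2f} revenue vs ${GPU_COST_PER_HOUR:.2f} cost")
```

Which gives roughly the $6-vs-$4 gap the comment describes.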


Can you keep that GPU 100% saturated at least 16 hours per day every day of the week?

If not, you aren't breaking even.
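The break-even point follows directly from the two hourly figures above (a sketch using the thread's $4 cost and $6 saturated revenue):

```python
# What fraction of the day must the GPU stay saturated to cover rent?
COST_PER_HOUR = 4.00             # rental cost, $/hour
REVENUE_PER_SATURATED_HOUR = 6.00  # API-equivalent revenue when fully busy

breakeven_utilization = COST_PER_HOUR / REVENUE_PER_SATURATED_HOUR
hours_per_day = breakeven_utilization * 24

print(f"{breakeven_utilization:.0%} utilization")    # 67% utilization
print(f"= {hours_per_day:.0f} saturated hours/day")  # = 16 saturated hours/day
```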


Note this is also assuming you

(1) Rent your GPUs.

(2) Pay list price, no volume breaks.

(3) Get only 85 tokens/sec. Realistically, frontier models would attain 200+ tokens/second amortized.

Inference is extremely profitable at scale.
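A quick sensitivity check on point (3), holding the $4/hour rental and the implied ~$20/M token rate fixed (both carried over from the figures above):

```python
# Hourly margin at the pessimistic (85 tok/s) and claimed frontier
# (200+ tok/s amortized) throughput figures.
GPU_COST = 4.00      # $/hour rental
PRICE_PER_M = 20.00  # implied $/million tokens (assumption)

for tok_per_sec in (85, 200):
    revenue = tok_per_sec * 3600 / 1e6 * PRICE_PER_M
    margin = revenue - GPU_COST
    print(f"{tok_per_sec} tok/s -> ${revenue:.2f}/hr revenue, ${margin:+.2f}/hr margin")
```

At 200 tok/s the margin is several times the rental cost, which is the "extremely profitable at scale" claim in numbers.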


Assuming an 80GB H100, and you serve an MoE model close to the size of that 80GB of VRAM, you're going to see around 10k tokens/second fully batched and saturated. An example here might be Mixtral 8x7B.

You're generating about 36 million tokens/hour. The cost of Mixtral 8x7B on OpenRouter is $0.54/M input tokens and $0.54/M output tokens.

You're looking at potentially a $38.88/hour return on that H100 GPU. This is probably the best-case scenario.

In reality, inference providers will use multiple GPUs together to run bigger, smarter models for a higher price.
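The $38.88 figure works out if the 36M tokens/hour are billed at the combined input-plus-output rate ($0.54 + $0.54 = $1.08/M), which is an assumption about how the comment counts tokens:

```python
# Batched-throughput revenue for the Mixtral 8x7B scenario above.
TOK_PER_SEC_BATCHED = 10_000
COMBINED_PRICE_PER_M = 0.54 + 0.54  # OpenRouter input + output, $/M

tokens_per_hour = TOK_PER_SEC_BATCHED * 3600   # 36,000,000
revenue_per_hour = tokens_per_hour / 1e6 * COMBINED_PRICE_PER_M
print(f"${revenue_per_hour:.2f}/hour")         # $38.88/hour
```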


The $3.99 rate is for 8x instances, with a minimum two-week commitment. Good luck getting a 70% average utilization during that time. Useful when you're running a training round and can properly gauge demand; not so great when you're offering an API.
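A sketch of what that commitment costs versus what 70% average utilization would earn, using the thread's $6/hour saturated-GPU revenue figure (the pairing of these two numbers is my assumption):

```python
# Minimum two-week 8x commitment at the reserved rate, vs. revenue
# at 70% average utilization.
RATE = 3.99                # $/GPU/hour, 8x reserved
GPUS = 8
HOURS = 24 * 14            # two-week minimum commitment
REVENUE_SATURATED = 6.00   # $/GPU/hour when fully busy (from above)
UTILIZATION = 0.70

cost = RATE * GPUS * HOURS
revenue = REVENUE_SATURATED * GPUS * HOURS * UTILIZATION
print(f"cost ${cost:,.2f} vs revenue ${revenue:,.2f}")
# cost $10,725.12 vs revenue $11,289.60
```

Even at 70% average utilization the commitment only barely clears its cost, which is the "good luck" point.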


Is it not a good penciled number? It helps set the directional tone that at least the inference cost is being covered.


It says the numbers are theoretically possible. Requiring 66% utilization to break even, when 100% utilization will piss off customers by forcing them into a queue, means it's a balancing act.

“Technically correct. The best kind of correct.” So inference may technically be _capable_ of being profitable, but I have questions about it being profitable in _practice_.



