Given my experience hosting these models at scale and optimizing load, I don't think the margins are nearly as high as 75% if the models are as big as people often claim.
I don't know why DeepSeek is so cheap, but realistic pricing should be around their initial price, which was 4x. At that price you get a healthy 25-50% margin depending on occupancy, given that DeepSeek V4 is a very sparse MoE model.
GLM 5.2, for example, doesn't have more than 30-50% margins, and that's assuming old GPU pricing. With current inflated GPU pricing I am certain the margins must be lower.
Of course you can host for cheaper with quantization, and if you have very consistent capacity/utilization, which is not the norm with AI workloads.
Overall, for large models like GPT 5.5 or Opus there must be healthier margins of around 50-70%, assuming GPU pricing didn't increase for these companies. Even if it did, a 30-40% margin should be possible, even in the worst case where every GPU they have saw a jump in pricing.
For smaller models it's hard to say. I would guess 20%, but these models might be much smaller than I suspect, in which case it might be double that.
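To make the "margin based on occupancy" point concrete, here is a rough sketch of the arithmetic. Every number in it (node cost, peak throughput, price) is made up for illustration and is not any provider's real figure; `gross_margin` is a hypothetical helper, and it ignores everything except GPU rental cost.

```python
def gross_margin(price_per_mtok, node_cost_per_hour, peak_mtok_per_hour, occupancy):
    """Gross margin as a fraction of revenue for one serving node.

    occupancy is the fraction of peak throughput that is actually sold.
    Only GPU cost is counted; staff, networking, etc. are ignored.
    """
    if not 0 < occupancy <= 1:
        raise ValueError("occupancy must be in (0, 1]")
    revenue_per_hour = price_per_mtok * peak_mtok_per_hour * occupancy
    return (revenue_per_hour - node_cost_per_hour) / revenue_per_hour


# Hypothetical numbers, chosen only to show the shape of the curve.
NODE_COST = 20.0   # $ per hour for the GPU node
PEAK = 40.0        # million tokens per hour at full load
PRICE = 1.0        # $ per million tokens

for occ in (0.5, 0.7, 0.9, 1.0):
    margin = gross_margin(PRICE, NODE_COST, PEAK, occ)
    print(f"occupancy {occ:.0%}: margin {margin:.0%}")

# Same node with the price cut to a quarter.
print(f"quarter price, full load: {gross_margin(PRICE / 4, NODE_COST, PEAK, 1.0):.0%}")
```

With these invented inputs the margin goes from 0% at 50% occupancy to 50% at full occupancy, and cutting the price to a quarter turns it negative even at full load. The real numbers will differ, but the sensitivity to occupancy is the point.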
Note the issue is that tokens from less intelligent models don't scale down linearly in memory usage, which is the biggest pain point of serving models. Context sizes have fucked us all.
Also, anyone claiming OAI makes lower margins on its APIs might be wrong, given they are on a much lower context size. 1M context is definitely a lot more expensive to serve, especially with smaller models like Sonnet.
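A back-of-the-envelope sketch of why context hurts: the usual KV cache estimate for plain grouped-query attention is 2 (keys and values) x layers x KV heads x head dim x tokens x bytes per element. The two configs below are hypothetical, not any real model, and the estimate assumes no MLA, sliding window, or cache quantization.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size for ONE sequence, assuming standard GQA attention.

    The leading 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to fp16/bf16 cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical configs: the "small" model has half the layers of the
# "large" one, even if its parameter count were many times smaller.
small = dict(n_layers=48, n_kv_heads=8, head_dim=128)
large = dict(n_layers=96, n_kv_heads=8, head_dim=128)

for name, cfg in (("small", small), ("large", large)):
    for ctx in (128_000, 1_000_000):
        gib = kv_cache_bytes(context_len=ctx, **cfg) / GIB
        print(f"{name} model, {ctx:>9,} tokens: {gib:6.1f} GiB per sequence")
```

In this made-up setup the small model's cache per token is only half the large model's, no matter how much cheaper its weights are, and a 1M-token request needs roughly 7.8x the cache of a 128k one. That memory is per concurrent sequence, so it directly eats into how many requests a node can batch.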
Not true, Polymarket and Kalshi have a lot of system bets, basically bets where the other side is the market, and on top of that they take a cut of the money gambled.
There are other things as well: they control which side wins, so they could leverage trade or even swing trade internally with a few bot accounts. I'm not saying that they do, but I don't see a ToS that says they can't.
Polymarket can market-make, but so can you (i.e. be on both sides of the order book). You can't do that in a casino, for instance.
They don't control which side wins (UMA voters do). You should read up on prediction markets and sports gambling in general. Sports gambling platforms with a "house" usually derive their odds from PvP markets. Polymarket is similar to a classic order book market.
1. Kalshi doesn't use UMA.
2. Yeah, spend more money to get UMA votes to decide controversial decisions. That's how Polymarket works.
I have a lot of experience with similar systems, and that's what we call rigging the election.
Imagine a handful of anonymous UMA wallets dominate Polymarket’s contested resolutions, and reporters have linked some deciding wallets to positions in the very markets they judged.
> Polymarket can market-make, but so can you (i.e. be on both sides of the order book). You can't do that in a casino, for instance.
That's like saying Elon can be a trillionaire, but so can you. I mean sure, anyone can, but how lucky do you think you have to be for your market to actually get any meaningful traction and for you to make any money? And even when you do, you are basically suggesting you should insider trade or rig bets to make money... I am sure that might be a flex for some, but man, I don't think that's the dunk you were looking for.
In the end the only folks making money are the markets, but who am I to stop the blind. Best of luck, hopefully you don't regret it. I only hope people don't get addicted or robbed by this stuff. Rich people can have their own fun, I don't really care.
1050$ I am very sad about the price. Like orders of magnitude sad.
I would have hoped for ~600$, or with the economic realities maybe 800$, but 1000$+ just feels like too much. Doesn't Valve have multi-billion dollar muscle? Couldn't they make it a tad cheaper...
I guess we can only blame the current market conditions at the end of the day.
While we have you here, I would like to say thanks. I disagree with you on a bunch of stuff in terms of opinions, perhaps less than I would expect, but I appreciate people who can and do put effort into supporting things they care about, and into building things one can just feel good about rather than always chasing a financial end game.
I am personally just tired of it, and it's brilliant to see someone thriving outside the zone of working for work's sake. Even if the realities are different.
I wish we had more people like you, whom we could all disagree with but mutually respect.
Cheers to a better future. (hope it wasn't too much waxing poetic but I feel like we are just too damn trapped in this tech bubble to value good moments these days)
So burn more tokens to save more tokens, so that we can spend more on X tokens but save on Y tokens?
Now the question is which X tokens and which Y tokens? And since the output is non-deterministic, how do you validate this?
LLMs aren't random, and that reinforces something people are too dumb to realize: randomness can be normally distributed, but LLMs have no reason to be normally distributed or to follow any sort of curve of understanding.
They are non-deterministic but biased, so their output might just be worse with a T' transformation for the class of problems A is solving but work great for B, or vice versa.
You can't reproducibly test LLMs, and that allows all sorts of benchmarks to exist which can make any model look as good or bad as we want. Enlightening stuff.
Not much different from sociological or psychological sciences where with enough bias in data you can prove anything.
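A toy simulation of the validation problem, as a sketch only: each task passes with its own probability, so a single benchmark run is one noisy sample, and a change like T' can help B's tasks while hurting A's. The pass probabilities are invented, and `pass_rate` and `repeated_runs` are hypothetical helpers; a seeded `random.Random` stands in for the model.

```python
import random
import statistics


def pass_rate(probs, rng):
    """Fraction of tasks passed in one run; task i passes with probability probs[i]."""
    return sum(rng.random() < p for p in probs) / len(probs)


def repeated_runs(probs, n_runs, seed=0):
    """Score the same task set n_runs times with a seeded generator."""
    rng = random.Random(seed)
    return [pass_rate(probs, rng) for _ in range(n_runs)]


# Invented per-task pass probabilities for two users' workloads,
# before and after applying some transformation T' to the prompts.
baseline = {"A's tasks": [0.8] * 50, "B's tasks": [0.5] * 50}
with_t = {"A's tasks": [0.6] * 50, "B's tasks": [0.7] * 50}

for label, setup in (("baseline", baseline), ("with T'", with_t)):
    for who, probs in setup.items():
        scores = repeated_runs(probs, n_runs=20)
        print(
            f"{label:9s} {who}: mean {statistics.mean(scores):.2f}, "
            f"stdev {statistics.stdev(scores):.2f}, "
            f"min {min(scores):.2f}, max {max(scores):.2f}"
        )
```

The spread between min and max across runs is what a single-run benchmark hides, and which task mix you pick decides whether T' looks like a win or a loss. Repeating runs only narrows the noise; it doesn't fix a task set that is biased toward A or B.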
When breaking the law is the norm, the line between small and big violations gets blurred easily. If you didn't care when law X, one you didn't care about, was being violated, why should the company think you will care when they break law Y instead?
If you can't understand at least 70% of those lines of Git, you really should be using a Git web app or something.
Or maybe try jj. Either is better than using Git's raw CLI as someone who doesn't really know or care how it works.
The Git CLI is very much made with rough edges and is generally expected to be in the hands of an advanced user. These days lots of commands have been made simpler, but the Git CLI is still very raw.
I am in the market for a car within a year or two, and I promise it won't be one from Volkswagen. If a company supports OSS platforms in cars and is available in APAC, I will buy from them even if it costs 2x for the same specs (preferably a hybrid, but an EV works too I guess).