amunozo's comments

These people only care about American lives, and only pretend to care when China or any other country they don't like attacks anybody.

Plenty of people around the world know about the authoritarian aspects of the US far better than Americans do, since they suffer the consequences.

Which ones would you like to mention?

Iran, Gaza, Cuba, Iraq, Afghanistan, Yemen, Lebanon... These people not only suffer under their own tyrannical governments, they also have to suffer the military actions of the US and its allies.

The fact that you just rattled off a list of terror states like it was nothing is so damn funny to me

You know that there are regular people living in these terror states who have to suffer not only their terror states but also the US? It's not that I feel pity for the terror states, but for the regular people. It's a very easy distinction that for some reason (racism?) people struggle to make.

It's a two-step system: the tyrannical government committed war actions against the US and its allies, the US and its allies responded, and the people suffer.

A hyper-presidentialist state that allows one administration (and realistically one person) to start a war against another nation without authorization from Congress.

This happened a few weeks ago, actually.


Regular US administrations committed war crimes across half the world for decades. But apparently only what they do inside the US matters.

Why do we ignore all the human rights abuses the US performs abroad? Iraq, Afghanistan, now Iran, Gaza and Lebanon through Israel, support for Saudi Arabia (which would not exist without the US), El Salvador... And domestically its treatment of immigrants is also horrible.

That should be at least comparable to (if not worse than) what China is doing.


Yes, El Salvador is so evil for imprisoning dangerous criminals and protecting innocent lives.

El Salvador is blessed to have evil criminals taken off the streets. It took putting away thousands of those you defend for a whole country to be free to enjoy tranquility and security. I was born there, and I know better than you who call us evil.

I am not saying that imprisoning criminals is a bad thing, but the conditions under which this has been done, and how they are treated in prison, violate human rights by any measure.

The person you responded to is agreeing with you.

Parent was being sarcastic

This is how China tried to justify its genocide against the Uyghurs. Was the outrage against that just politically motivated? Or do Americans only care about ethnic cleansing when they're not the ones doing it?

They also don't care when done by their allies.

Not for imprisoning them, but for imprisoning them in draconian conditions, without proper trials, etc. Have you seen those prisons, for fuck's sake?

Competition with the Soviet Union gave workers around the world better conditions, plus advances in science and technology... (and the risk of mutual destruction ;)), even if the USSR wasn't good.

For those who rely on open source models but don't want to stop using frontier models, how do you manage it? Do you pay for any of the Chinese subscription plans? Do you pay for the API directly? After the GPT 5.5 release, however good it is, I am a bit tired of this price hiking and the reduced quotas every week. I am currently unemployed and cannot afford more expensive plans at the moment.

I have the $20 ChatGPT subscription. I stopped the $20 Anthropic subscription since the limit ran out too fast. That's my frontier model(s).

For OSS models, I have a z.ai yearly subscription from during the promo. But it's a lot more expensive now. The model is good imo; you just need to find the right provider. There are a lot of alternatives now, e.g. I saw some good reviews of Ollama cloud.


I am thinking about getting a one-year student promotion before defending my PhD.

I've been on Kimi K2.5 on openrouter for a couple of months for anything I can't run locally. Really is dirt cheap for how good it is. Haven't assessed K2.6 yet but the price is higher so it needs to be more efficient, not just more capable.

But more broadly: openrouter solves the problem of making a broad range of models available with a single payment endpoint, so you can just switch around as much as you like.
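To make the "single payment endpoint" point concrete: OpenRouter speaks the OpenAI chat-completions protocol, so switching models is essentially just changing one string in the request. A minimal sketch below; the model slugs are illustrative, and the actual POST (commented out) would need an API key.

```python
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload; only the model slug changes per provider."""
    return {
        "model": model,  # illustrative slugs, check openrouter.ai for the real ones
        "messages": [{"role": "user", "content": prompt}],
    }

# Same payload shape, different model string -- one billing endpoint for all:
for model in ("moonshotai/kimi-k2", "z-ai/glm-4.6"):
    payload = build_request(model, "Refactor this function.")
    body = json.dumps(payload)
    # requests.post(OPENROUTER_URL, data=body,
    #               headers={"Authorization": f"Bearer {API_KEY}"})
```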


How do you find the token speed of OpenRouter with Kimi?

I have tasks that used to take ~3-5min with Sonnet 4.6. With OpenRouter Kimi, the same task takes 10+ min. It's also just obviously slower in opencode sessions. The results are good, and I love the lower cost, but the speed can be frustrating.


Have you considered... not subscribing? You can ask the top models via chats for specific stuff, and then set up some free CLI like mistral.

If you're trying to make a buck while unemployed, sure get a subscription. Otherwise learn how to work again without AI, just focus on the interesting stuff.


I just want to try to make something useful out of my time; that's why I'm subscribed to Codex at the moment. 20€ is affordable, not really a problem. But yes, maybe I would be doing myself a favor by unsubscribing and going back to the old ways to learn properly.

I'm "working" on some open source stuff with minimal AI. But I will probably cave in at some point and get a subscription again, the moment I need to spin up a mountain of garbage, fast.

At home I currently use MiniMax via OpenRouter - it’s pretty good and very cheap. They have a subscription plan, but I’m not ready to commit to it yet.

Another way to keep the ability to try out new models is to buy a reseller subscription like Cursor’s.


I tried OpenRouter, but I feel the money flies even with these models; it's not comparable to a subscription. But yes, it's very good for trying things out. Maybe I should test other models alongside GPT 5.5 to see which one fits me.

I'm also unemployed. So far the models that I've used the most are Kimi and GLM. I haven't done that much agentic coding though, I've mostly used them for studying math and general conversations and I'm generally happy with their performance.

Gemini has a free tier for the API, but yeah, just use the chat.

For DeepSeek you can use their API and if you ran it constantly you'd still be under what OpenAI or Anthropic charge for a coding plan.

I had Claude make me a quick tool to combine my Claude Code token usage (via the ccusage util) with OpenRouter pricing from the models API.

I'm on the Max x5 plan, and any of the 'good' models like Kimi 2.6, GLM, or DeepSeek would have cost 3-5x in per-token billing for what I used on my Claude plan over the last three months.

So unless my Claude fudged the maths to make itself look better, seems like I'm getting a good deal
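The maths behind that kind of tool is simple enough to check by hand. A hedged sketch of the core calculation, with made-up per-million-token prices (not real quotes from any provider) and a hypothetical helper name:

```python
# Compare subscription usage against per-token API billing.
def api_cost_usd(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """What the same traffic would cost at per-token rates (prices per 1M tokens)."""
    return (input_tokens / 1e6) * price_in_per_m + \
           (output_tokens / 1e6) * price_out_per_m

# e.g. 300M input + 40M output tokens over three months (ccusage totals),
# at a hypothetical $0.60/M input and $2.50/M output:
cost = api_cost_usd(300_000_000, 40_000_000, 0.60, 2.50)  # $280.00
# Compare `cost` against three months of the flat subscription fee.
```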


I am not so sure; credits fly when using any model through the API if I use it as much as I use Codex.

I was thinking the same. How can other providers offer third-party open source models of roughly similar quality, like this one, Kimi K2.6, or GLM 5.1, for ten times less? How can GPT 5.5 suddenly cost twice as much as GPT 5.4 while being faster? I don't believe it's a bigger, more expensive model to run; they're just starting to raise prices because they can and their product is good (which is honest as long as they're transparent about it). Honestly, the claim that subscriptions cost the company 20 times more than we're paying is just a PR move to justify the price hike.

I'm pretty sure OpenAI and Anthropic are overpricing their token-billed API usage mainly as an incentive to commit to their subscriptions instead.

Anthropic recently dropped all inclusive usage from new enterprise subscriptions: your seat sub gets you a seat with no usage, and all usage is then charged at API rates. It's like the worst of both worlds!

What's the point then? Special conditions for data retention/non-training policies?

SSO tax is a large part of it, plus controls around the plug-in marketplace, enforcement of config, and observability of spend. But it's all pretty weak really for $20 a month.

And Microsoft is going the same route, moving Copilot Cowork over to a utilization-based billing model, which is very unusual for their per-seat products (I'm actually not sure I can remember that ever happening before).


The target audience for the APIs is third party apps which are not compatible with the subscriptions.

True. I missed that.

More Protestant than Christian.

I totally agree. The feeling you get from running these things locally is different, as if you could feel the magic up close.

A bit skeptical about a 27B model being comparable to Opus...

For at least a year now, it has been clear that data quality and fine-tuning are the main sources of improvement for medium-sized models. Size != quality for specialized, narrow use cases such as coding.

It’s not a surprise that models are leapfrogging each other when the engineers are able to incorporate better code examples and reasoning traces, which in turn bring higher quality outputs.


If all you're looking at is benchmarks that might be true, but those are way too easy to game. Try using this model alongside Opus for some work in Rust/C++ and it'll be night and day. You really can't compare a model that's got trillions of parameters to a 27B one.

> ...and it'll be night and day.

That's just, like, your opinion, man.

> You really can't compare a model that's got trillions of parameters to a 27B one.

Parameter count doesn't matter much when coding. You don't need in-depth general knowledge or multilingual support in a coding model.


I often do need in-depth general knowledge in my coding model so that I don't have to explain domain specific logic to it every time and so that it can have some sense of good UX.

You should try it out. I'm incredibly impressed with Qwen 3.5 27B for systems programming work. I use Opus and Sonnet at work and Qwen 3.x at home for fun and barely notice a difference given that systems programming work needs careful guidance for any model currently. I don't try to one shot landing pages or whatever.

Is it available for API use? I don't have a laptop capable of running it.

Are you using the same agent/harness/whatever for both Claude and Qwen, or something different for each one?

I use Pi at home and Claude Code at work (no choice). I use bone stock Pi. No extensions.

From what I understand, ~30b is enough "intelligence" to make coding/reasoning etc. work, in general. Above ~30b, it's less about intelligence, and more about memorization. Larger models fail less and one-shot more often because they can memorize more APIs (documentation, examples, etc). Also from my experience, if a task is ambiguous, Sonnet has a better "intuition" of what my intent is. Probably also because of memorization, it has "access" to more repositories in its compressed knowledge to infer my intent more accurately.

Some of these benchmarks are supposedly easy to game. Which ones should we pay attention to?

SWE-REbench should not be gameable. They collect new issues from live repos, and if you check 1-2 months after a model was released, you can get an idea. But even that would be "benchmaxxxable", which is an overloaded term that can mean many things; the most vanilla interpretation is that with RL you can get a model to follow a certain task pretty well, but it'll get "stuck" on that task type, or "stubborn" when asked similar but sufficiently different tasks. So for swe-rebench that would be "it fixes bugs in these types of repos, under this harness, but ask it to do something else in a repo and you might not get the same results". In a nutshell.

well, your own, unleaked ones, representing your real workloads.

if you can't afford to do that, look at a lot of them; e.g. on artificialanalysis.ai they merge multiple benchmarks across weighted categories and build an Intelligence Score, Coding Score, and Agentic Score.
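The merging step those aggregators do is, at its simplest, a weighted average over per-benchmark scores. A minimal sketch with illustrative weights and scores (not artificialanalysis's actual methodology, which I haven't verified):

```python
# Weighted aggregation of per-benchmark scores into a single category score.
def merged_score(scores: dict, weights: dict) -> float:
    """Weighted average; weights need not sum to 1, we normalize here."""
    total_w = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_w

# Hypothetical benchmark results for one model:
scores  = {"coding": 62.0, "math": 71.0, "agentic": 55.0}
weights = {"coding": 0.4, "math": 0.3, "agentic": 0.3}
overall = merged_score(scores, weights)  # 62.6
```

Note that the choice of weights dominates the ranking, which is one reason two leaderboards can order the same models differently.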


ARC-AGI 2

GLM 5 scores 5% on the semi-private set, compared to SOTA models which hover around 80%.


None. Try them out with your own typical tasks to see the performance.

You should be skeptical. Benchmark racing is the current meta game in open weight LLMs.

Every release is accompanied by claims of being as good as Sonnet or Opus, but when I try them (even hosted full weights) they’re far from it.

Impressive for the size, though!


A small model can be made to be "comparable to Opus" in some narrow domains, and that's what they've done here.

But when actually employed to write code they will fall over when they leave that specific domain.

Basically they might have skill but lack wisdom. Certainly at this size they will lack anywhere close to the same contextual knowledge.

Still these things could be useful in the context of more specialized tooling, or in a harness that heavily prompts in the right direction, or as a subagent for a "wiser" larger model that directs all the planning and reviews results.


you'd be surprised how good small models have gotten. Size of the model isn't all that matters.

My experience with qwen-3.6:35B-A3B reinforces this; gonna give this one a spin when Unsloth has quants available.

Gemini Flash was just as good as Pro for most tasks with good prompts, tools, and context. Gemma 4 was nearly as good as Flash, and Qwen 3.6 appears to be even better.


> when unsloth has quants available

https://huggingface.co/unsloth/Qwen3.6-27B-GGUF


That was quick (compared to the 1T Kimi-2.6, not surprising)

Haha :) We had some issues with Kimi-2.6 since it was int4 and we were investigating how to handle it :)

Appreciate what y'all do! We were slacking about how many HGX-B300 it would take to run Kimi and it looks like we could actually fit 2-3 Kimis on a single HGX.

> Size of the model isnt all that matters.

What matters is the motion in the tokens


Plus you can control thinking time a lot more, so when Anthropic lobotomizes Opus on you...

Opus 4.5 mind you, but I’m not too surprised given how good 3.5 was and how good the qwopus fine tune was. The model was shown to benefit heavily from further RL.
