Hacker News | numeri's comments

11/20 for qwen/qwen3.5-flash-02-23 in Claude Code, with effort set to low.

No, that's what the headline implies, but the body of the article doesn't support it at all. It's (currently, and with no indication of intent to change this) two separate branches of their business.


but Taalas had to quantize Llama 3.1 8B to death to get it to fit. It can't produce coherent non-English text at all.


and if I were to guess, the latest generation of models (Claude Opus 4.6, GPT-5.3-codex, etc.) differs from Opus 4.5 and GPT-5.2 primarily in the addition of deeper, more difficult tasks (most likely agentic and coding-based, like Terminal Bench) to their RLVR training.

I could be completely off, as my intuition here is fully based on public research papers, but it seems to explain the current state of things fairly well.


No, Python or units[1] is always a better choice if I'm near a computer (and I nearly always am these days, unfortunately, I suppose). I do have three wonderful slide rules, though.

[1]: https://www.gnu.org/software/units/


Introducing a solid zero-knowledge age verification option is the opposite direction from ending anonymity on the Internet, which other parts of the same governments are also working on.

So yeah, I'll gladly trust and cheer on the part working in the right direction.


I'll just throw in support for gaming on Linux – it's a pretty nice experience these days! I still occasionally (once every 5–8 months?) have an update cause a short-lived bug, but it's a very justifiable trade-off to avoid Windows these days.


This is written by someone who's not an AI researcher, working with tiny models on toy datasets. It's at the level of a motivated undergraduate student in their first NLP course, but not much more.


If one can easily reach parity with a motivated undergrad by leveraging LLMs, I'll still consider it impressive.

While the 5-minute model will never be useful in itself, it lays the groundwork for amateurs and small groups to get into developing small models. There's at the moment another HN headline hyping up a tiny model that scores impressively on the ARC-AGI benchmarks, so it's clearly not a dead end to explore "household-affordable" models.

Though an approach that doesn't lean on the author's $200/month OpenAI subscription would've been more interesting to follow.


You can also reach research parity by downloading a GitHub repository. Is that impressive too?


Downloading a file is not equivalent to having high-level, abstracted control over running software.

And if it is then I'm a farmer because I bought potatoes from the store.


One sign would be occasionally changing course in response to overwhelming employee feedback. If that never or almost never happens, the feedback is being ignored, not taken constructively and not followed.


This isn't right – calibration (informally, the degree to which certainty in the model's logits correlates with its chance of getting an answer correct) is well studied in LLMs of all sizes. LLMs are not (generally) well calibrated.
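To make that informal definition concrete, here's a minimal sketch of expected calibration error (ECE), one standard way calibration is measured: predictions are binned by stated confidence, and each bin's average confidence is compared to its empirical accuracy. The function name and toy inputs are illustrative, not from any particular paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the weighted average
    of |empirical accuracy - average confidence| over the bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = correct[mask].mean()        # fraction answered correctly in this bin
        avg_conf = confidences[mask].mean()  # average stated confidence in this bin
        ece += (mask.sum() / n) * abs(acc - avg_conf)
    return ece

# A well-calibrated model scores near 0; an overconfident one scores higher.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))
```

A perfectly calibrated model (confidence always matching its hit rate) gives an ECE of 0; a model that answers with full confidence and is wrong gives an ECE of 1.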

