Hacker News | numeri's comments

11/20 for qwen/qwen3.5-flash-02-23 in Claude Code, with effort set to low.

No, that's what the headline implies, but the body of the article doesn't support it at all. It's (currently, and with no indication of intent to change this) two separate branches of their business.


but Taalas had to quantize Llama 3.1 8B to death to get it to fit. It can't produce coherent non-English text at all.


and if I were to guess, the latest generation of models (Claude Opus 4.6, GPT-5.3-codex, etc.) differs from Opus 4.5 and GPT-5.2 primarily in the addition of deeper, more difficult tasks (most likely agentic and coding-based, like Terminal Bench) to their RLVR training.

I could be completely off, as my intuition here is fully based on public research papers, but it seems to explain the current state of things fairly well.


No, Python or units[1] is always a better choice if I'm near a computer (and I nearly always am these days, unfortunately, I suppose). I do have three wonderful slide rules, though.

[1]: https://www.gnu.org/software/units/


Introducing a solid zero-knowledge age verification option is the opposite direction from ending anonymity on the Internet, which other parts of the same governments are also working on.

So yeah, I'll gladly trust and cheer on the part working in the right direction.


I'll just throw in support for gaming on Linux – it's a pretty nice experience these days! I still occasionally (once every 5–8 months?) have an update cause a short-lived bug, but it's a very justifiable trade-off to avoid Windows these days.


This is written by someone who's not an AI researcher, working with tiny models on toy datasets. It's at the level of a motivated undergraduate student in their first NLP course, but not much more.


If one can easily reach parity with a motivated undergrad by leveraging LLMs, I'll still consider it impressive.

While the 5-minute model will never be useful in itself, it lays the groundwork for amateurs and small groups to get into developing small models. There's at the moment another HN headline hyping up a tiny model that scores impressively on the ARC-AGI benchmarks, so it's clearly not a dead end to explore "household-affordable" models.

Though an approach that doesn't lean on the author's $200/month OpenAI subscription would've been more interesting to follow.


You can also reach research parity by downloading a GitHub repository. Is that impressive too?


Downloading a file is not equivalent to having high-level, abstracted control over running software.

And if it is then I'm a farmer because I bought potatoes from the store.


One sign would be occasionally changing course in response to overwhelming employee feedback. If that never or almost never happens, the feedback is being ignored, not taken constructively and not followed.


This isn't right – calibration (informally, the degree to which certainty in the model's logits correlates with its chance of getting an answer correct) is well studied in LLMs of all sizes. LLMs are not (generally) well calibrated.
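To make that informal definition concrete, here's a minimal sketch of expected calibration error (ECE), one standard way calibration is measured: predictions are binned by stated confidence, and each bin's average confidence is compared to its empirical accuracy. The function name and toy inputs are illustrative, not from any particular paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the weighted average
    of |empirical accuracy - average confidence| over the bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = correct[mask].mean()        # fraction answered correctly in this bin
        avg_conf = confidences[mask].mean()  # average stated confidence in this bin
        ece += (mask.sum() / n) * abs(acc - avg_conf)
    return ece

# A well-calibrated model scores near 0; an overconfident one scores higher.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))
```

A perfectly calibrated model (confidence always matching its hit rate) gives an ECE of 0; a model that answers with full confidence and is wrong gives an ECE of 1.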

