LFM2.5-350M: No Size Left Behind (liquid.ai)
3 points by jbarrow 69 days ago | 1 comment


Very cool to see a company pushing what's possible with (relatively) tiny models! A 350M-parameter model trained on 28T tokens that, judging from the benchmarks, is competitive with Qwen3.5-0.8B.
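
For a sense of scale, here is a quick back-of-the-envelope sketch of the tokens-per-parameter ratio, using only the two figures quoted above (the helper name is mine, not something from the post):

```python
# Figures quoted in the comment above; nothing else is assumed.
PARAMS = 350_000_000                 # 350M parameters
TRAIN_TOKENS = 28_000_000_000_000    # 28T training tokens


def tokens_per_param(tokens: int, n_params: int) -> float:
    """Training tokens seen per model parameter (hypothetical helper)."""
    return tokens / n_params


ratio = tokens_per_param(TRAIN_TOKENS, PARAMS)
print(f"{ratio:,.0f} tokens per parameter")  # prints "80,000 tokens per parameter"
```

That works out to roughly 80,000 training tokens for every parameter.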

Comparing the architecture to Qwen3.5, it seems:

- fewer, wider layers

- mixing full attention and convolutions, instead of the full + linear attention of Qwen3.5

- the vocab is about 1/4 the size



