
flockonus · 2026-06-16T18:34:01 1781634841

> not so interesting to compare to

Absolutely disagree here: something that is considered good practice is very interesting to compare to!

eska · 2026-06-16T20:09:42 1781640582

I mean that mostly in the sense that there is huge variance in idiomatic code. So your optimized C/Rust code might be 100-1000x faster than two idiomatic versions of writing that code

flockonus · 2026-05-20T20:11:23 1779307883

For coding tasks, 27B is reported to be much more effective, although you can probably only run 4-bit or 5-bit quants with this much memory.

Recommend https://www.reddit.com/r/LocalLLaMA/ as a great source for this type of discussion.

milch · 2026-05-21T16:19:03 1779380343

I played around with local LLMs on my M4 Max 64GB this weekend and this is exactly what I found. I put Opus 4.7 "head to head" on the same task as Qwen 3.6 and a few other local models. The 35B did not perform well IME - it needed a lot of handholding and even then the final result did not work until a few more tweaks, while Claude one-shot the task. The 27B was much better and also one-shot the task, but took about 55 min as opposed to about 15 min for Claude. The 27B is probably something that I could happily run for many use cases if I had some faster hardware... the main problem there seems to be that at larger context sizes, prompt decoding can take several minutes.

gcr · 2026-05-23T14:13:57 1779545637

This matches my experience too. The little a3b model is quite capable for its size class, as is the 27B model, but it’s still an order of magnitude less effective than Claude on the “effectiveness / time” curve

flockonus · 2026-05-20T20:06:16 1779307576

Curious about the other way around: how many tokens per second does a productive developer produce, averaged over a day?

sedatk · 2026-05-20T20:18:00 1779308280

Much more, given that you need to incorporate the dev's thought process too.

flockonus · 2026-05-20T22:12:35 1779315155

Interesting point, I didn't consider the thought process as tokens.

amelius · 2026-05-20T22:22:37 1779315757

I'm not sure if it's much more though.

flockonus · 2026-05-21T18:29:25 1779388165

Oh, while coding I sure think far more "tokens" than I output to the editor, at least 4:1.

Now, whether those should be counted as part of the process, or only the output, is a harder question.

flockonus · 2026-05-01T22:47:36 1777675656

Curious to see to what intensity the Moon will "ring like a bell" at this one.

ref: https://books.google.ie/books?id=6QAAAAAAMBAJ&pg=PA56&lpg=PA...

flockonus · 2026-04-28T00:50:11 1777337411

How this will age:

>I do not and will not use the internet, in any form, for any purpose.

andyfilms1 · 2026-04-28T01:31:43 1777339903

Oh no! Anyway

sambapa · 2026-04-28T02:58:07 1777345087

I mean... Right now it sounds pretty good?

flockonus · 2026-04-27T19:15:45 1777317345

> How to check if your voice is being misused

I love that the answer here is basically.. - you don't -

But maybe mitigate at unreasonable personal costs.

How about services simply stop taking public information as proof of identity?

flockonus · 2026-04-24T23:10:10 1777072210

That may be true for OpenAI, less so for Anthropic, which has much better margins. Both of these companies' CEOs have said as much in public.

No doubt Google currently has the better business. But the same argument could have been made about Instagram or WhatsApp before Facebook (now Meta) acquired them.

flockonus · 2026-04-22T23:29:12 1776900552

The bigger the [dense] model, the longer inference tends to take; it seems pretty linear.

In that sense, how long would you need to wait to get, say, ~20 tk/s? Maybe never.

(save a significant firmware update / translation layer)

u8080 · 2026-04-23T10:48:30 1776941310

For 1T at Q4: each generated token requires reading ~500GB of memory. So you'll need something like ~10TB/s of memory bandwidth for 20 t/s. That is 8x5090 territory for speed and 16x5090 territory for size. HBM4 will bring us close to something really possible in a home lab, but it will cost a fortune for early adopters.

Speculative decoding/DFlash will help with it, but YMMV.

Edit: Missed the part that this is an A32B MoE, which drastically reduces the amount of reads needed per token. Seems 20 t/s should be doable with 1TB/s memory (like a 3090).
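
For anyone who wants to redo the back-of-the-envelope numbers above, here is a rough sketch. It assumes decoding is purely memory-bandwidth bound and that every active weight is read exactly once per generated token; it ignores KV cache reads, activations, and any overhead. The function name `required_bandwidth_gb_s` is made up for illustration.

```python
def required_bandwidth_gb_s(active_params_billions, bits_per_weight, tokens_per_second):
    """Estimate memory bandwidth (GB/s) needed for a target decode speed.

    Assumption: each generated token reads every active weight once,
    and nothing else (no KV cache, no activations, no overhead).
    """
    # Billions of params * bits per weight / 8 bits per byte = GB read per token.
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return gb_read_per_token * tokens_per_second


# Dense 1T model at 4 bits per weight, 20 tokens/s.
dense = required_bandwidth_gb_s(1000, 4, 20)

# MoE with 32B active parameters at 4 bits per weight, 20 tokens/s.
moe = required_bandwidth_gb_s(32, 4, 20)

print(f"dense 1T Q4:  {dense:.0f} GB/s")
print(f"A32B MoE Q4:  {moe:.0f} GB/s")
```

Under those assumptions the dense case works out to about 10 TB/s and the A32B case to about 320 GB/s, which is why ~1TB/s of bandwidth looks sufficient for the MoE. The full set of weights still has to fit in memory somewhere, so the size requirement does not shrink the same way.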

flockonus · 2026-04-16T05:03:05 1776315785

While they do make this argument, realistically anyone sending their prompt/data to an external server should assume there will be some level of retention.

More specifically, anyone using Darkbloom with commercial intent should really only send non-sensitive data (no tokens, customer data, ...). I'd say only classification tasks, image generation, etc.

joelthelion · 2026-04-16T10:35:11 1776335711

There's a difference between trusting Anthropic and trusting random mac owners.

flockonus · 2026-04-17T16:45:46 1776444346

I know where my answer lies on that, but I don't claim it to be an objective truth.

For example, OpenAI has been caught sharing data with government agencies.

flockonus · 2026-03-29T06:34:11 1774766051

My motivation was quite different, and I'd like to encourage more people to consider the same.

Oftentimes narcissistic, power-grabbing (and often technically incompetent) engineers become managers, as was the case on a previous team I worked on, and it was quite penalizing for the whole team.

I realized that I can either be the one managing and try to do good, or be at the mercy of another manager; I chose the former.

bob1029 · 2026-03-29T09:06:27 1774775187

This is what taught me to sublimate my own ego. Overcoming the wickedness of others with patient, meditative calm can be an incredible experience. It just takes longer than a business day to play out. You've gotta think across much grander time scales. 3 steps ahead, at minimum, at all times. Burn these people out of your team. Take charge and stay focused on the customer. It often takes non technical people a little bit longer to lock onto complex problems and downstream consequences. It's taken me nearly 2 years to deal with one bad hire. All I can fantasize about is being in a position to never hire that kind of person again. The destruction some people can cause in a business is unthinkable to those who haven't seen it yet. I didn't believe these people existed until it was way too late.

I still prefer to solve technology problems, but I see a bigger and more important mission out there. Keeping the team happy and aligned on the customer is much more rewarding overall. I'd rather have 5% dev time in paradise than 95% dev time in hell.