chermi · 2026-06-11T18:58:03 1781204283

Most mathematicians don't take pride in their results having no applications. That's just not true. Maybe some quirky pure logicians or something. But otherwise 90%+* of mathematicians I know would be at least satisfied if not thrilled for their work to be used by others.

*Completely made up statistic.

chermi · 2026-06-10T17:42:17 1781113337

This is why he needs a downvote button.

chermi · 2026-05-26T16:53:48 1779814428

Wouldn't that just accelerate collapse? How much do you trust the outputs of the LLM to provide trustworthy and valuable new information? I mean, I understand distillation works. But that's much more structured and thoughtful than my sessions, at least.

jack_pp · 2026-05-26T17:03:00 1779814980

We can trust the feedback we give it based on the output it provides.

ambicapter · 2026-05-26T17:16:37 1779815797

What kind of feedback are you giving? What's the reward function?

jack_pp · 2026-05-26T19:05:52 1779822352

Right now, no feedback, since I don't run this system, but our workflows could change to accommodate it.

rahen · 2026-05-26T17:05:45 1779815145

I was thinking of curated replay buffers, which would act like "dreams". To prevent collapse, the offline dataset would mix the new mid-term data with a baseline of anchor data (the original training distribution) so the model doesn't drift.

Also, we wouldn't train on the whole session. A separate critic module, like a reward model, would filter the KV cache to extract the high-value information, like a garbage collector before the LoRA.

That's just an idea though. Right now most research focuses on changing the architecture itself (TITAN, HOPE...) instead.
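
A minimal sketch of the replay-buffer idea above, under loud assumptions: `build_replay_batch`, `toy_critic`, and every parameter name are hypothetical, the "critic" is a plain scoring function rather than a trained reward model, and the sketch filters text snippets instead of an actual KV cache. It only shows the mixing step: keep the session items the critic rates highly, then fill the rest of the batch with anchor data so the batch never consists of new data alone.

```python
import random
from typing import Callable, List, Sequence


def build_replay_batch(
    session_items: Sequence[str],
    anchor_items: Sequence[str],
    critic: Callable[[str], float],
    batch_size: int = 8,
    anchor_fraction: float = 0.5,
    min_score: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Mix critic-filtered session data with anchor data (hypothetical helper)."""
    rng = random.Random(seed)
    # "Garbage collection": drop session items the critic scores as low-value.
    kept = [item for item in session_items if critic(item) >= min_score]
    # Reserve a share of the batch for anchor data from the original distribution.
    n_anchor = round(batch_size * anchor_fraction)
    n_new = min(batch_size - n_anchor, len(kept))
    # If too few session items survive the filter, top up with anchor data.
    n_anchor = batch_size - n_new
    batch = rng.sample(kept, n_new) + rng.choices(list(anchor_items), k=n_anchor)
    rng.shuffle(batch)
    return batch


def toy_critic(text: str) -> float:
    """Stand-in critic: treats anything tagged 'fact' as high-value."""
    return 1.0 if "fact" in text else 0.0


session_data = ["fact: A", "chit-chat", "fact: B", "um"]
anchor_data = ["base 1", "base 2", "base 3"]
batch = build_replay_batch(session_data, anchor_data, toy_critic, batch_size=4)
print(batch)
```

The resulting batch would then feed whatever offline fine-tuning step is used (a LoRA update in the idea above); that step is out of scope here.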

chermi · 2026-05-22T17:15:49 1779470149

Ummm, why not both?

chermi · 2026-05-21T16:07:30 1779379650

I am very skeptical that Musk is 10-20% interest. I would guess closer to 5%.

chermi · 2026-05-19T18:11:09 1779214269

I've accidentally clicked AI Mode probably 3+ times a week recently, so those are some real good metrics ;)

chermi · 2026-05-17T16:25:28 1779035128

If only there was a way to think beyond direct substitution.

chermi · 2026-05-17T16:19:03 1779034743

More predictive power is always a good goal, full stop. This is orthogonal to whether the model producing the prediction helps with "understanding" directly. Predictability encodes understanding in a strict information theoretic sense, regardless of our ability as humans to access that understanding.
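
One way to make the information-theoretic link concrete, as a rough sketch rather than a proof: under an ideal code, a symbol with predicted probability p costs -log2(p) bits, so a model that predicts the data better also compresses it into fewer bits. The helper name `code_length_bits` and the toy data are made up for illustration.

```python
import math
from typing import Dict, Sequence


def code_length_bits(data: Sequence[str], model: Dict[str, float]) -> float:
    """Ideal code length of `data` in bits under `model`: the sum of -log2 p(symbol)."""
    return sum(-math.log2(model[symbol]) for symbol in data)


data = list("aaab" * 4)              # 12 'a' symbols and 4 'b' symbols
uniform = {"a": 0.5, "b": 0.5}       # a model that knows nothing about the data
fitted = {"a": 0.75, "b": 0.25}      # a model matching the true frequencies

bits_uniform = code_length_bits(data, uniform)
bits_fitted = code_length_bits(data, fitted)
print(bits_uniform, bits_fitted)     # the fitted model needs fewer bits
```

This says nothing about whether a human can read the regularity back out of the better model, which is the point the sentence above sets aside.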

zigzag312 · 2026-05-17T17:54:03 1779040443

It's not arguing that predictive power is bad. Just that people often mistakenly believe some phenomenon is understood more deeply than it really is, because a model can fit data and generate accurate predictions.

ordu · 2026-05-17T17:47:17 1779040037

> More predictive power is always a good goal

But in some cases it is not good enough. If you look for a better explanation and choose gradient descent as your strategy, then you'll eventually reach a local optimum, but you won't reach a different explanation.

Arguably, it is hard to look for a better explanation if the current one doesn't have a track record of failed predictions. One possible way out of this situation is to search for predictions that fail.

But what I want to say is that explanations are not just for prediction. They are needed to build a mental model that can then drive the research. And a new model can (theoretically) be built from first principles. I can't find clean examples of this, though. If we look at Einstein, for example, he started with a failure to predict. But what he came up with at first was Special Relativity, which failed utterly with gravity. Einstein spent like 10 years rewriting gravity to make it work with SR? The failed predictions of his shiny new theory didn't stop him, and that is considered a good thing.

> Predictability encodes understanding in a strict information theoretic sense, regardless of our ability as humans to access that understanding.

But it doesn't necessarily imply the possibility of moving forward. I'm not sure the analogy with compressed data is a good one, but you don't work with compressed data: you unpack it, and maybe unpack it some more, and convert it to a format that is very inefficient in terms of disk space.

A compressed theory is good for applying as is, but to refine it you should probably prefer something else.

chermi · 2026-05-17T16:04:37 1779033877

Per frontier token. You're not calculating the cost of a fixed-quality asset here. Old hardware running non-frontier models will be very valuable. In fact, we have two direct examples: older server GPUs actually appreciating, and the very obvious fact that not everyone always uses MAX FULL EFFORT BEST MODEL no matter what.

vb-8448 · 2026-05-17T20:05:57 1779048357

Not consumer hardware ... and if we're talking about local LLMs, we cannot assume most of us can put a rack in our basement.

Already today it is not possible to run DeepSeek V4 Pro locally, and I cannot imagine that in 2 years we will be able to.

chermi · 2026-05-14T16:26:36 1778775996

What? Go volunteer at a botanical garden or something.