
Well, that sucks. Replacing the harness with something task-specific has proven very powerful in my use cases.

But one correction: Claude is very happy to let you use whatever you want for a harness ... as long as you're on a pay-as-you-go plan. So it's not blocked. It's just not allowed on the $20 per month plan.

First, harnesses can give access to company-internal tools (like the ticket queue). You could do this with MCP, but it's much harder and slower, and MCP kind of resists it (if you want a bot to solve a ticket, why not start with a full overview of the ticket in the very first request to your model? That can't easily be done with MCP).

Second, harnesses can direct the whole process. A trivial example is that you can improve performance in a very simple way: ask "are you sure?", showing the model what it intends to do, BEFORE doing it. Improves performance by 10%, right there. Give a model the chance to look at what it's doing and change its mind before committing. Then ask a human the same question, with a nice yes/no button. Try that with MCP.
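The "are you sure?" step can be sketched as a small wrapper in the harness. This is a minimal sketch, not Claude's API: `call_model` is a hypothetical stand-in for whatever LLM call the harness makes, stubbed here so the control flow runs on its own.

```python
def call_model(prompt: str) -> str:
    # Stub: a real harness would call an LLM API here. We pretend
    # the model revises an obviously dangerous proposed action.
    if "rm -rf /\n" in prompt:
        return "REVISE: rm -rf ./build"
    return "OK"

def confirm_action(proposed_action: str, ask_human=None) -> str:
    """Show the model its own proposed action before committing it."""
    verdict = call_model(
        f"You are about to run:\n{proposed_action}\n"
        "Are you sure? Reply OK to proceed, or REVISE: <new action>."
    )
    if verdict.startswith("REVISE:"):
        proposed_action = verdict[len("REVISE:"):].strip()
    # Optionally put a human in the loop with a yes/no prompt.
    if ask_human is not None and not ask_human(proposed_action):
        return ""  # human rejected; do nothing
    return proposed_action

print(confirm_action("rm -rf /"))  # model revises to the safer action
print(confirm_action("ls", ask_human=lambda action: True))
```

The key point is that this check lives in the harness's loop, outside the model, which is exactly the kind of control MCP doesn't give you.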

Of course you quickly find a million places to change the process, and then you can go and meta-change the process. Like asking an AI what steps should be followed first, then doing those steps, most of which are AI invocations with parts of the ticket (say, examine the customer database, extract what's relevant to this problem, ...). Limiting context is very powerful, and not just because it gets you cheaper requests. Get an AI to make relevant context for a particular step before actually doing that step ...
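The plan-then-execute idea above can be sketched like this. Again a hedged sketch: `llm` is a hypothetical stand-in, stubbed so the structure is runnable; the point is that each step gets a small, purpose-built context rather than the whole conversation.

```python
def llm(prompt: str) -> str:
    # Stub responses keyed on the kind of request a real model would see.
    if prompt.startswith("PLAN"):
        return "extract_customer_context\npropose_fix"
    if prompt.startswith("extract_customer_context"):
        return "customer is on plan X; last error: timeout"
    return "fix: raise the timeout for plan X"

def solve_ticket(ticket: str) -> str:
    # 1. Meta-step: ask the model which steps to follow for this ticket.
    steps = llm(f"PLAN: list the steps to resolve:\n{ticket}").splitlines()
    context = ticket
    # 2. Each step runs as a fresh invocation with only the distilled
    #    context from the previous step, keeping requests small.
    for step in steps:
        context = llm(f"{step}: using only this context:\n{context}")
    return context

print(solve_ticket("Ticket #123: API calls time out"))
```

Each invocation here is cheap because it carries only what the step needs, which is the context-limiting point.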




> A trivial example is that you can improve performance in a very simple way: ask "are you sure?" showing the model what it intends to do, BEFORE doing it. Improves performance by 10%

Put it into the "are you sure?" loop and you'll see the model just keep oscillating for eternity. If you ask the model to question its output, it will take that as an instruction that it must find fault, even when the output is correct.


Not in my experience. I mean, it happens. But models can check if their own function calls are reasonable. And that doesn't require dropping the context cache, so it's a lot less expensive than you would probably initially think.
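One simple way to get the self-check without the eternal oscillation is to cap the review rounds and accept as soon as the model stands by its answer. A sketch, with `review` as a hypothetical stubbed model call; in a real harness the review question is appended to the same conversation, so the cached prompt prefix is reused rather than rebuilt.

```python
def review(call: str) -> str:
    # Stub: flags one specific malformed call once, otherwise approves
    # by returning the call unchanged.
    return "get_ticket(id=123)" if call == "get_ticket(id='123')" else call

def checked_call(call: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        revised = review(call)
        if revised == call:   # model stands by its call: accept it
            return call
        call = revised        # otherwise adopt the revision and re-check
    return call               # round cap reached: accept the latest version

print(checked_call("get_ticket(id='123')"))
```

The `max_rounds` cap and the accept-if-unchanged rule are what bound the loop, addressing the oscillation concern in the parent comment.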


