
sigbottle · 2026-06-17T12:49:23 1781700563

It's crazy how they're the ones with all the power and control at this point and they're still playing victim.

sigbottle · 2026-06-15T14:27:43 1781533663

The mafia game aspect is something I had not thought of. Have science fiction / dystopian novels focused on propaganda like that? Novels seem to have cartoonishly evil societies but the real world has stuff like this.

sigbottle · 2026-06-13T03:09:51 1781320191

That's annoying. I shelled out a pretty penny specifically to try out Fable, but if I'm only going to get to use it for 2 days...

sigbottle · 2026-06-10T15:43:01 1781106181

Just tried it. Fable is extremely strong. The fact that we can't point to any concrete architectural upgrade is worrying - that means "it just gets bigger" is kind of viable.

To be clear, the jump from Opus to Fable was like the jump from pre-o3 -> o3 for me. Very sharp improvement, not incremental. But that could be explained by absurdly long thinking times.

It one-shotted a task that Opus burned hundreds of dollars on and got nowhere with: a very tricky semantic refactor, and it got it right. Granted, again, Opus and I had fleshed out the semantics 3 months prior, but Opus couldn't execute on the vision. Fable could.

Then I discussed some philosophy with it, and it was both pleasant and accurate (GPT constantly "corrected" you for the sake of correction, without clarification, and was still often just wrong; it was like it refused to think critically about philosophy). It actually helped resolve some deep but subtle misconceptions I had around representationalism. When talking with GPT, I felt like I was talking with someone who was either sycophantic or convinced that "anything that is not absolute truth is relativism". Fable actually discussed.

It's both exciting and kind of depressing. I can definitely see why people are getting hyped about AGI again. All the other models were extremely strong technically, but I felt like they couldn't match the developer's tacit state. Fable definitely did, and to me that's a basic quality for being considered "usefully intelligent".

Shame that it's going away in 2 weeks and probably going to be nerfed if/when it's re-released.

keybored · 2026-06-10T23:44:17 1781135057

Worrying? Depressing? Why are people who are clearly enthusiasts (since they are testing the capabilities on release) always using these words? Is this a genuine interest, something that is pleasurable, or a morbid curiosity to test the bleeding edge of Humanity’s Doom? Bizarre.

sigbottle · 2026-06-11T11:50:58 1781178658

It would be amazing in a perfect and just world. This technology is revolutionary. I'm very interested in LLMs because I'm personally interested in how one thinks better and comes up with better ideas; I think LLMs might elucidate some of the structure behind that.

But technological serfdom is waiting just around the corner. Well, to be fair, I think that societal forces would've pushed us to it anyways, no AI needed, but AI is a visceral, immediate, fast-moving instantiation of it.

keybored · 2026-06-11T13:29:25 1781184565

Telling and expected.

sigbottle · 2026-06-10T03:28:02 1781062082

Does anyone know what the architecture of Fable is? Is it harnesses? Did they solve persistent learning? What did they do?

sothatsit · 2026-06-10T06:21:43 1781072503

Seems to just be a bigger model.

moffkalast · 2026-06-10T14:37:28 1781102248

"Good ol' scaling, nothing beats that."

sigbottle · 2026-06-09T20:20:41 1781036441

I've only ever known about these through compilers, very cool.

On one project, through a variety of circumstances, dead code elimination was straight up not working and we couldn't figure out why at the time, but we still wanted to show the theoretical improvement of some approach. (We did spend a whole week chasing down the root cause afterwards; maybe worth it in hindsight...)

We were doing it by hand at one point, but someone suggested using CReduce for shrinking the code. Definitely was an interesting test-iterate loop...
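For anyone who hasn't used a test-case reducer: the loop is "try a smaller input, keep it if it's still interesting, repeat." CReduce itself is much more sophisticated (it runs many C-aware transformation passes and calls a user-supplied interestingness script whose exit status 0 means "still interesting"). The sketch below only illustrates that core test-iterate loop with a greedy, line-based reduction in the spirit of delta debugging. The names `reduce_lines` and `script_predicate` are hypothetical, not part of CReduce.

```python
import os
import subprocess
import tempfile
from typing import Callable, List


def reduce_lines(lines: List[str], is_interesting: Callable[[List[str]], bool]) -> List[str]:
    """Greedily shrink `lines` while `is_interesting` stays True.

    Hypothetical helper: tries deleting chunks of lines, starting at half the
    input and halving the chunk size whenever a full pass makes no progress.
    Stops once no single-line deletion keeps the input interesting.
    """
    assert is_interesting(lines), "the starting input must be interesting"
    chunk = max(len(lines) // 2, 1)
    while True:
        progress = False
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]  # drop lines[i:i+chunk]
            if is_interesting(candidate):
                lines = candidate   # keep the smaller input
                progress = True     # stay at i: new lines have shifted into place
            else:
                i += chunk          # this chunk is needed; move past it
        if not progress:
            if chunk == 1:
                return lines        # no single line can be removed any more
            chunk = max(chunk // 2, 1)


def script_predicate(script: str, filename: str = "test.c") -> Callable[[List[str]], bool]:
    """Hypothetical wrapper for a CReduce-style interestingness script.

    Writes the candidate to `filename` in a fresh temp dir, runs the script
    there, and treats exit status 0 as "still interesting".
    """
    script = os.path.abspath(script)  # the script runs with cwd set to the temp dir

    def check(lines: List[str]) -> bool:
        with tempfile.TemporaryDirectory() as tmp:
            with open(os.path.join(tmp, filename), "w") as f:
                f.write("\n".join(lines) + "\n")
            return subprocess.run([script], cwd=tmp).returncode == 0

    return check
```

Usage would look like `reduce_lines(open("test.c").read().splitlines(), script_predicate("./interesting.sh"))`, where `interesting.sh` is your own (hypothetical) check, for example "compiles, and the dead code still isn't eliminated." Line granularity is a deliberate simplification; real reducers also delete tokens, expressions, and declarations, which is why they get much smaller results.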

sigbottle · 2026-06-09T18:01:28 1781028088

In my experience Codex is just smarter. I think that shows both in anecdotes and in how OpenAI has always been at the front of programming competitions and math problems.

But Claude models seem to be better at long-term or more ambiguous problems.

I'm curious what the primary benefit is here. Are there secret improvements in training? There hasn't been much change in fundamental model architecture, I don't think. What about harnesses? I wonder what's pushing the AI forward. It seems like harnesses have been the main driver ever since CoT.

Spartan-S63 · 2026-06-09T18:19:03 1781029143

I find that OpenAI's agentic tools and models are better for building human-maintainable software. Meanwhile, Anthropic seems to be cosplaying Apple while missing out on all the exceptional engineering required to create something that polished. Their admission of predominantly using Claude with little human oversight, and their stealth mode, is an indictment of a poor engineering culture, from what I can surmise.

someguyiguess · 2026-06-09T18:39:47 1781030387

Serious question: what is the secret to getting Codex to write decent code? I am on Windows. Maybe that is the issue, but I can't seem to get Codex to function anywhere near the level that I was previously able to get with even Claude Sonnet. Does Codex just not work well with Windows yet?

penetrarthur · 2026-06-09T19:41:31 1781034091

I got Codex to write near-perfect code with a somewhat strict agents.md and coding standards (a separate .md file referenced from agents.md). My .md files have examples and a long list of do's and don'ts I've accumulated over the last 6 months or so, totaling 300-400 lines. I plan every feature with it until I am satisfied with the general approach it wants to take, and then it one-shots it in 95% of cases. The planning takes anywhere from 5 to 30 minutes. The actual execution has gotten stupidly fast; most of the time it's faster than making a cup of coffee.

acmecorps · 2026-06-09T20:08:20 1781035700

would you mind sharing your *.md files, for someone who is new at this?

fyrabanks · 2026-06-10T00:07:51 1781050071

"don't make any mistakes" /s

sroussey · 2026-06-09T20:19:54 1781036394

Have you tried using superpower skills?

someguyiguess · 2026-06-09T18:38:05 1781030285

I've had the exact opposite experience. For various reasons, I've had to move from Claude to Codex, and the rate at which it burns tokens for the same output I would get from Claude is ridiculous. I'm probably burning tokens at least twice as fast as I was when using Opus 4.5 for coding tasks, and I'm still finding that just manually coding is easier than trying to get Codex to write functional code.

greenavocado · 2026-06-09T18:27:02 1781029622

How smart a model is varies hour over hour, tracked over here: https://aistupidlevel.info/

sigbottle · 2026-06-07T14:45:59 1780843559

This is such an abstract principle that the principle itself cannot be refuted. The plan sounds fine on paper. "Just iterate bro". But it entirely depends on what rational agents you put into the system. Obviously, if I sub in a 5 year old child everywhere, this loop breaks. Humans and AI, sometimes one is better than the other at certain things, we're still learning.

The only way to test this is to test it out, in real life. Sometimes people see results, sometimes people don't. Note that yes, I am including the entire iteration process - even after iterating, people still don't see results with AI.

I have had both positive and negative experiences with AI, over multi-week projects. But apparently on Hacker News, anything positive about AI is proof that AI is superhuman and taking over, and all follies about AI are lies by stupid humans who secretly have psychological dispositions to fear AI. Sometimes the AI genuinely isn't good enough. Are we not allowed to say that now? We might not know why, but it's just the truth.

The other solution is to formally analyze the entire space of possible actions the agent can take a priori. Then yes, you can definitively say whether the principle breaks. Can you, though? Can you give a formal specification for the space of possible actions for AI and show that your loop never breaks, or breaks less often than with humans, or meets any other sensible criterion? If not, then you can't just give an abstract principle and start making inferences from it.

sigbottle · 2026-05-29T17:15:02 1780074902

Well no, the idea is a tradeoff between interfaces and telemetry.

OK, the agents don't click the same way humans do. Once you learn that, what about mouse-hover telemetry, time spent, etc.? One of the most extreme options is to force biometrics: a lot of telemetry that breaks the interface a lot, but hey, you have assurance.

And none of these tradeoffs require understanding the deep processes of the human mind. It's just that the map is not the territory: how do you game the map harder and harder, and how do the mapmakers respond to that?

catsrus · 2026-05-29T17:18:24 1780075104

Did you look at the paper? They specifically look at mini tasks involving cognitive processes (e.g. what dictates the strategy people use to solve tasks).

CamperBob2 · 2026-05-29T17:43:12 1780076592

LLMs can solve original math problems at the IMO level and beyond, and you might be talking to one now. I don't think they are going to have problems with any CAPTCHA short of separate device attestation.

Whatever mechanism the paper proposes, rest assured it can be trained on.

sigbottle · 2026-05-26T16:11:00 1779811860

Sad times ahead.