Hacker News

The true test would be seeing whether the behavior changes depending on the presence of reasoning.


The words "thinking" and "reasoning" used here are imprecise. It's just generating text like always: if the text comes after "ai-thoughts:" then it's "thinking," and if it comes after "ai-response:" then it's "responding," not "thinking." Either way, it is always a big ol' model choosing the most likely next token, potentially with some random sampling.
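The point above can be made concrete with a toy sketch: the "model" is just a function from a prefix to next-token probabilities, and whether that prefix is labeled "ai-thoughts:" or "ai-response:" changes nothing about the sampling mechanism. The logits table below is entirely made up for illustration; no real model is involved.

```python
import math
import random

# Hypothetical per-prefix logits, standing in for a real model's output.
# The prefix label is just more conditioning text -- the mechanism is
# identical for "thoughts" and "responses".
LOGITS = {
    "ai-thoughts:": {"hmm": 2.0, "the": 1.0, "cheat": 0.5},
    "ai-response:": {"the": 2.0, "answer": 1.5, "is": 0.5},
}

def sample_next_token(prefix, temperature=1.0, rng=random):
    """Pick the next token: softmax over logits, then random sampling.

    Temperature near 0 approaches greedy decoding (most likely token);
    higher temperatures add more randomness.
    """
    logits = LOGITS[prefix]
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    probs = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(probs.values())
    r = rng.random() * total
    for tok, p in probs.items():
        r -= p
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases

print(sample_next_token("ai-thoughts:", temperature=0.01))  # near-greedy
```

At very low temperature this is effectively greedy decoding; at higher temperatures the same code yields the "random sampling" the comment mentions, regardless of which label precedes the text.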


That is what was observed: o1-family models performed the "cheat"; non-reasoning models didn't.





