Hacker News

The true test would be seeing whether the behavior changes depending on the presence of reasoning.


The words "thinking" and "reasoning" used here are imprecise. It's just generating text like always: if the text comes after "ai-thoughts:" then it's "thinking," and if it comes after "ai-response:" then it's "responding," not "thinking." Either way, it is always a big ol' model choosing the most likely next token, potentially with some random sampling.
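The point above can be made concrete with a toy sketch: the "model" is just a function from a prefix to next-token probabilities, and whether that prefix is labeled "ai-thoughts:" or "ai-response:" changes nothing about the sampling mechanism. The logits table below is entirely made up for illustration; no real model is involved.

```python
import math
import random

# Hypothetical per-prefix logits, standing in for a real model's output.
# The prefix label is just more conditioning text -- the mechanism is
# identical for "thoughts" and "responses".
LOGITS = {
    "ai-thoughts:": {"hmm": 2.0, "the": 1.0, "cheat": 0.5},
    "ai-response:": {"the": 2.0, "answer": 1.5, "is": 0.5},
}

def sample_next_token(prefix, temperature=1.0, rng=random):
    """Pick the next token: softmax over logits, then random sampling.

    Temperature near 0 approaches greedy decoding (most likely token);
    higher temperatures add more randomness.
    """
    logits = LOGITS[prefix]
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    probs = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(probs.values())
    r = rng.random() * total
    for tok, p in probs.items():
        r -= p
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases

print(sample_next_token("ai-thoughts:", temperature=0.01))  # near-greedy
```

At very low temperature this is effectively greedy decoding; at higher temperatures the same code yields the "random sampling" the comment mentions, regardless of which label precedes the text.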


That is what was observed: o1-family models performed the "cheat"; non-reasoning models didn't.





