Hacker News | bisonbear's submissions
1. I benchmarked Opus 4.8 vs. GPT-5.5 on 2 open source repos (stet.sh)
3 points by bisonbear 10 days ago
2. I used autoresearch to improve my AGENTS.md, measured against real tasks (stet.sh)
8 points by bisonbear 16 days ago | 7 comments
3. A brief investigation into the GPT-5.5 regression claims (stet.sh)
1 point by bisonbear 24 days ago
4. The Opus 4.7 reasoning curve - Medium is the best default? (stet.sh)
1 point by bisonbear 31 days ago
5. GPT-5.5 low vs. medium vs. high vs. xhigh: the reasoning curve on 26 real tasks (stet.sh)
2 points by bisonbear 36 days ago
6. GPT-5.5 vs. GPT-5.4 vs. Opus 4.7 on 56 real coding tasks from 2 open source repos (stet.sh)
4 points by bisonbear 43 days ago
7. I ran Opus 4.7 vs. Old Opus 4.6 vs. New Opus 4.6 on 28 Zod tasks (stet.sh)
2 points by bisonbear 56 days ago
8. Coding evals are broken. CI is green while AI code quality goes unmeasured (stet.sh)
1 point by bisonbear 59 days ago
9. Agents.md is the highest-leverage code you're not testing (stet.sh)
1 point by bisonbear 64 days ago
10. Your AI coding benchmark is hiding a 2x quality gap (stet.sh)
3 points by bisonbear 3 months ago
11. Things I Learned at the Claude Code NYC Meetup (benr.build)
2 points by bisonbear 4 months ago
12. Claude vs. Codex in the Messy Middle (benr.build)
1 point by bisonbear 5 months ago
13. Spacetime as a Neural Network (benr.build)
11 points by bisonbear 5 months ago | 5 comments
14. One agent isn't enough (benr.build)
18 points by bisonbear 6 months ago | 2 comments
15. Context Engineering: The New Skill for Working with AI Agents (benr.build)
1 point by bisonbear 7 months ago
16. The New Math of Building with AI (benr.build)
2 points by bisonbear 7 months ago
