bisonbear's submissions

1.		I benchmarked Opus 4.8 vs. GPT 5.5 on 2 open source repos (stet.sh)
		3 points by bisonbear 10 days ago
2.		I used autoresearch to improve my AGENTS.md, measured against real tasks (stet.sh)
		8 points by bisonbear 16 days ago | 7 comments
3.		A brief investigation into the GPT-5.5 regression claims (stet.sh)
		1 point by bisonbear 24 days ago
4.		The Opus 4.7 reasoning curve - Medium is the best default? (stet.sh)
		1 point by bisonbear 31 days ago
5.		GPT-5.5 low vs. medium vs. high vs. xhigh: the reasoning curve on 26 real tasks (stet.sh)
		2 points by bisonbear 36 days ago
6.		GPT-5.5 vs. GPT-5.4 vs. Opus 4.7 on 56 real coding tasks from 2 open source repos (stet.sh)
		4 points by bisonbear 43 days ago
7.		I ran Opus 4.7 vs. Old Opus 4.6 vs. New Opus 4.6 on 28 Zod tasks (stet.sh)
		2 points by bisonbear 56 days ago
8.		Coding evals are broken. CI is green while AI code quality goes unmeasured (stet.sh)
		1 point by bisonbear 59 days ago
9.		Agents.md is the highest-leverage code you're not testing (stet.sh)
		1 point by bisonbear 64 days ago
10.		Your AI coding benchmark is hiding a 2x quality gap (stet.sh)
		3 points by bisonbear 3 months ago
11.		Things I Learned at the Claude Code NYC Meetup (benr.build)
		2 points by bisonbear 4 months ago
12.		Claude vs. Codex in the Messy Middle (benr.build)
		1 point by bisonbear 5 months ago
13.		Spacetime as a Neural Network (benr.build)
		11 points by bisonbear 5 months ago | 5 comments
14.		One agent isn't enough (benr.build)
		18 points by bisonbear 6 months ago | 2 comments
15.		Context Engineering: The New Skill for Working with AI Agents (benr.build)
		1 point by bisonbear 7 months ago
16.		The New Math of Building with AI (benr.build)
		2 points by bisonbear 7 months ago