Show HN: Score your GitHub repo for AI coding agents (twill.ai)
7 points by danoandco 35 days ago | hide | past | favorite | 5 comments


OpenAI published an article and demo for scoring how well AI agents can work in a codebase (https://openai.com/index/harness-engineering/, https://www.youtube.com/watch?v=rhsSqr0jdFw). We turned it into a free tool anyone can use.

Paste any public GitHub repo (or connect a private one) and get a live score across seven dimensions: bootstrap setup, task entry points, test harnesses, lint gates, agent docs, structured documentation, and decision records. It clones the repo, runs static analysis, and scores each dimension 0-3 with evidence pulled from actual files. Takes about 60 seconds.
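To make the rubric concrete: seven dimensions scored 0-3 gives the 21-point max mentioned by a commenter below. A minimal sketch of that aggregation (the dimension names come from the post; the scoring and clamping logic here is assumed, not twill.ai's actual implementation):

```python
# Hypothetical sketch of the scorecard math: seven dimensions, each 0-3,
# summed against a 21-point maximum. Dimension names are from the post.
DIMENSIONS = [
    "bootstrap_setup", "task_entry_points", "test_harnesses",
    "lint_gates", "agent_docs", "structured_documentation",
    "decision_records",
]

def total_score(scores: dict[str, int]) -> tuple[int, int]:
    """Sum per-dimension scores (each clamped to 0-3); return (total, max)."""
    total = sum(min(max(scores.get(d, 0), 0), 3) for d in DIMENSIONS)
    return total, 3 * len(DIMENSIONS)
```

For example, a repo scoring 3 everywhere except agent docs and decision records would land around 15-16/21, matching the pattern described below.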

Some repos we scored:

PostHog: https://twill.ai/score/fd033516-628b-4c7c-8db6-d84e3f2737ba

Supabase: https://twill.ai/score/b2825715-6c3d-4de1-a21b-fc5d9b17103b

Codex: https://twill.ai/score/d7372d95-0501-4ad3-ae90-8f112ccafee0

The pattern we keep seeing: most repos lose points on agent-specific docs and decision records. Everything else tends to be decent.

We built this scorecard as a free tool because agent performance is bounded by repo structure, not just model quality.

Would love to hear what scores people get, and whether the rubric is missing anything.


Not sure this works very well. I got 10/21 with 3 quick wins:

- Start recording major architecture and workflow decisions in ADRs.

I'm not sure all decisions need to be in ADRs. I think AGENTS.md can summarise a lot of this, and as long as you keep it up to date, that sounds more efficient than keeping every record. Some records are fair, though, to show how you make decisions.

- Add at least one linter and formatter with explicit repo-level commands.

I have this set up with Go, but it wasn't detected.

- Group repo docs under `docs/` or add an index that links the important pages.

What happened to comments in code? :)


Thanks for running it and the feedback!

For the ADR vs AGENTS.md question: CLIs usually load AGENTS.md with a tag saying "this context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant to your task." That's Claude Code, for instance. An ADR, by contrast, is something agents would not question.

Go linting: that's weird, I'll take a look.

Docs vs comments: great point, but I think they serve two purposes. One is global (specs, design docs, etc.) and one is local (how a method works, or the reason for a specific workaround).


Not sure about the decision records. Seems ideal, but no one does that in practice.


True. I think the key thing is explaining somewhere in the repo *why* something was done, like the rationale for choosing service X over Y, for instance.

Maybe that record is just the git log, and the agent just needs to access the git log.

We'll see how that matures over time.
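If the git log is the record, an agent could mine it directly. A hypothetical sketch (the `find_rationale` helper and the keyword convention are assumptions, not an existing tool), shelling out to `git log --grep`:

```python
import subprocess

def find_rationale(repo_path: str, keyword: str) -> list[str]:
    # Hypothetical helper: search commit messages for rationale,
    # e.g. keyword="chose" to surface commits explaining decisions.
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--grep={keyword}", "--format=%h %s"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()
```

This only works, of course, if commit messages actually carry the "why" rather than just the "what".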



