Hacker News | elpakal's comments

At a kill s@@s hackathon at work, I was able to build something that uses a Node image, installs Claude Code, runs a /review-like command, posts inline comments on the PR, and deletes old comments when rerunning.

OCR seems cool, but overkill, and I'm definitely not using Code Rabbit after their CEO was on here acting snobbish a while back.

Point being, AI code review in Git** itself isn't hard to do and can add a lot of value quickly.
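
The "delete old comments when rerunning" step is the part that tends to need a little logic, so here is a minimal sketch of one way it could work. It is not the hackathon implementation: the hidden marker string, the shape of the comment dictionaries, and the names `tag_comment` and `plan_comment_cleanup` are all hypothetical, and the calls that actually fetch and delete comments on the hosting platform are left out.

```python
# Hypothetical marker the bot embeds in every comment it posts, so a later
# run can recognize its own comments. Not taken from any real tool.
BOT_MARKER = "<!-- ai-review-bot -->"


def tag_comment(body: str) -> str:
    """Append the hidden marker to a review comment body before posting."""
    return f"{body}\n\n{BOT_MARKER}"


def plan_comment_cleanup(existing_comments: list[dict]) -> list[int]:
    """Return the ids of comments left by a previous bot run.

    Each comment is assumed to be a dict with an integer "id" and a string
    "body". Comments without the marker (written by humans) are never selected.
    """
    return [
        c["id"]
        for c in existing_comments
        if BOT_MARKER in c.get("body", "")
    ]


if __name__ == "__main__":
    comments = [
        {"id": 1, "body": tag_comment("Possible nil dereference here.")},
        {"id": 2, "body": "LGTM, thanks!"},
        {"id": 3, "body": tag_comment("This loop never terminates.")},
    ]
    # Only the bot's own comments are scheduled for deletion.
    print(plan_comment_cleanup(comments))
```

Keying the cleanup on a marker rather than on the comment author is a design choice for the sketch: it keeps the selection logic testable without credentials and avoids deleting anything a human wrote from the same account.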


Nothing against coderabbit or SaaS specifically, but this was one of the reasons I stopped using it https://kudelskisecurity.com/research/how-we-exploited-coder...

It's very easy to build a basic code review tool. It's hard to build one that developers won't ask you to turn off because of false positives, or one that won't miss your next escaped bug.

I think if all the tool does is run a Claude Code-level /review skill (which all developers should definitely run before they even open a PR), then isn't this a bit of review theater? Just a guardrail for those developers who don't run a /review-triage-fix skill in /loop before they take the PR out of draft?

I wonder how many PRs in the world got to production where several developers commented on each other's code, and none of them read anything, just used their gh cli / MCP to post / answer comments / fix issues on their behalf.

There is going to be exponential growth in generated code, and you can't escape AI code review, but there is also no real difference between having Claude Code write the code and review itself locally and having it communicate with itself via the slow and downtime-prone medium of "PR comments".

tl;dr - without any human in the loop reviewing the AI code review, or skimming to see what the AI code review missed, there is no real reason to use a "code review"; you can just run it as part of the CI/CD and hope the AI won't miss anything (according to my LinkedIn feed, there are people out there who really think this way...).


Yes! Where it gets really interesting is the scenario in which every developer has their own unique review skill/workflow, so the reviews end up being different from what you'd get running it yourself, but still nobody is reading them.

I think that in most cases you either agree on a PR comment or you don't. But it has to leave a mark in the PR. This is how we do reviews; ignoring a PR comment is one of the worst offenses one can make. I don't let it go.

How snobbish was the CEO acting?

even worse on mobile

Now can LinkedIn please label posts



Apple's Foundation models seem great on paper until you see the 4k context window. (though I know we are still early in this chapter).


dotIPA, an iOS app build size inspector that runs locally on macOS [$4.99]

Track app size growth over time, inspect contents, spot duplication and size bloat, and more.

https://apps.apple.com/us/app/dotipa/id6742254881


I thought about using claw but it felt like overkill, and I wonder if an AI browser (Atlas, etc.) would do the trick.


For sure it was overkill/not the most efficient approach - really I was more just curious if it would work. The answer was "kind of", but even that is pretty amazing. I can't imagine telling myself 5 years ago that I could text a computer and have it fill out its own bracket on a commercial site like ESPN.


It's going to be cool when you put in your todo list in the morning that you need to fill out your ESPN bracket, and by lunch your agent will have 3 different versions ready for your review.


Really cool idea. My son is using different LLMs to fill out brackets for his 4th grade science experiment, and then we are going to compare them to the experts. I like your idea of Strategy/Inspiration prompting; we had to tell them that "upsets happen" because all the favorites were picked on the first pass.

Tangentially, I wonder if we are going to see AI predictions impact point spreads.


I know multiple people who are building arbitrage models with their agents. I bet it makes the markets pretty efficient.


> This is a known limitation with small LLMs (0.6B-1.2B) doing tool calling.

To me this is the nut to crack, wrt tool calling and locally running inference. This seems like a really cool project and I'm going to dig around a little later, but if it's hallucinating on something as basic as this, that makes me think it's more at the POC stage right now (to echo other sentiment here).


That's a fair read. Tool calling reliability with sub-4B models is genuinely the hardest unsolved problem in on-device AI right now.

The inference engine (MetalRT) is production-grade, the pipeline architecture is solid, but the models at this size are still the weak link for complex tool routing. Larger model support (where tool calling is much more reliable) is next on the roadmap. Please stay tuned!
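
One common mitigation while small models remain unreliable at tool routing is to validate every model-emitted call before executing it. The sketch below shows that idea only. It is not the project's API: the `TOOLS` registry, the tool names, and the assumed JSON shape `{"name": ..., "arguments": {...}}` are hypothetical. It does not make the model more accurate; it just keeps a hallucinated tool name or malformed argument from being executed.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "get_weather": {"city": str},
    "set_timer": {"seconds": int},
}


def validate_tool_call(raw: str) -> tuple[bool, str]:
    """Check a model-emitted tool call before executing anything.

    `raw` is assumed to be JSON shaped like {"name": ..., "arguments": {...}}.
    Returns (ok, reason).
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "call is not an object"

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return False, f"unknown tool: {name!r}"

    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be an object"

    schema = TOOLS[name]
    missing = [k for k in schema if k not in args]
    if missing:
        return False, f"missing arguments: {missing}"
    extra = [k for k in args if k not in schema]
    if extra:
        return False, f"unexpected arguments: {extra}"

    for key, expected in schema.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(args[key], bool) or not isinstance(args[key], expected):
            return False, f"argument {key!r} should be {expected.__name__}"
    return True, "ok"


if __name__ == "__main__":
    print(validate_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
    print(validate_tool_call('{"name": "open_pod_bay_doors", "arguments": {}}'))
```

On a failed check, a caller could re-prompt the model with the reason string or fall back to a plain text answer; which is better depends on latency budget and is outside this sketch.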


Sorry, I scrolled through some of the rest of the comments on this thread and can’t stay tuned.


> file-based state that persists between agent invocations

Can you expand on this with a practical example?


It needs a canonical source of truth, something isolated agents can't provide easily. There are tools out there like specularis that help you do that and keep specs in sync.


One example: I let the agent distill the essence of all previous discussions into a spec.md file, check it for completeness, and remove all previous context before continuing.
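
For a concrete picture of file-based state that survives between invocations, here is a minimal sketch built around a spec.md file like the one mentioned above. The function names are hypothetical, and the agent's summarization step is replaced by a plain append of one decision per run, so this shows only the load, update, and persist cycle, not how the spec content gets produced.

```python
import os
import tempfile
from pathlib import Path


def load_spec(path: Path) -> str:
    """Return the saved spec, or an empty string on the first invocation."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def save_spec(path: Path, text: str) -> None:
    """Write to a temp file, then rename it over the target.

    Readers see either the old spec or the new one, not a partial write.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(text)
    os.replace(tmp_name, path)  # rename within the same directory


def run_invocation(path: Path, new_decision: str) -> str:
    """One agent invocation: read prior state, add to it, persist it.

    The append stands in for the agent's own summarization (an assumption).
    """
    spec = load_spec(path)
    spec += f"- {new_decision}\n"
    save_spec(path, spec)
    return spec


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        spec_path = Path(workdir) / "spec.md"
        run_invocation(spec_path, "Use a single config file")
        # A second, independent invocation starts from the file, not from chat history.
        print(run_invocation(spec_path, "Cover the parser with tests"))
```

Because each invocation starts by reading the file, the conversation context can be discarded between runs without losing earlier decisions.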




thanks

