Hacker News | p_stuart82's comments

tbh the ~1-3% PPL hit from Q4_K_M stopped being the bottleneck a while ago. the bottleneck is the 48 hours of guessing llama.cpp flags and chasing chat template bugs before the ecosystem catches up. you are doing unpaid QA.

Just wait a week for model bugs to be worked out. This is well-known advice and a common practice within r/localllama. The flags are not hard at all if you're using llama.cpp regularly. If you're new to the ecosystem, that's closer to a one-time effort with irregular updates than it is to something you have to re-learn for every model.
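For what it's worth, the flag set people keep rediscovering really is small. A typical llama-server invocation looks something like this (the model path and values are placeholders; the right -ngl and -c for your hardware will vary):

```shell
# typical llama-server invocation (model path and numbers are placeholders;
# -ngl = layers to offload to GPU, -c = context length,
# --jinja = use the chat template embedded in the GGUF)
llama-server -m ./models/model-Q4_K_M.gguf -c 8192 -ngl 99 --jinja --port 8080
```

Once you've settled on that once, new models are mostly just a new `-m` path, which is why it feels like a one-time cost.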

for software engineering? not because of the typing.

the signal is every time a human has to grab the wheel. that's a label for what the agent still misses.
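A toy sketch of that framing, assuming a hypothetical event log (every name here is made up, not any real product's API): each takeover gets recorded as a (rejected agent action, preferred human action) pair.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Intervention:
    # one "human grabbed the wheel" event -- hypothetical schema
    task_id: str
    agent_action: str
    human_correction: str

@dataclass
class SessionLog:
    events: List[Intervention] = field(default_factory=list)

    def record(self, task_id: str, agent_action: str, human_correction: str) -> None:
        self.events.append(Intervention(task_id, agent_action, human_correction))

    def labels(self) -> List[Tuple[str, str]]:
        # each takeover is an implicit (rejected, preferred) pair for later training
        return [(e.agent_action, e.human_correction) for e in self.events]

log = SessionLog()
log.record("T1", "rm -rf build/", "git clean -fdx build/")
print(log.labels())  # -> [('rm -rf build/', 'git clean -fdx build/')]
```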


exactly. people paid the premium so somebody else's OAuth screwup wouldn't become their Sunday. and here we are.

IMO nobody was paying for magic compute. they're paying to not touch ten years of glue.

if agents eat that glue, the moat gets thin fast.


> agents eat that glue

No wonder they hallucinate :)


It's when they sniff the glue that things get wild.

It's been like 90% glue since perl took over.


Don’t forget the staples and tape too. LLMs have a weakness for paperclips; hope we don’t end up on that path.

There's an "Operation Paperclip" joke hidden in here somewhere.

Yeah, at my last job there was a single outdated external wiki server left sitting in DO for those kinds of reasons, long after everything internal had moved (if not twice). If it hadn't become such a security risk, it would never have been migrated.

The problem is that a lot of this glue is proprietary by design at the various cloud services. I realize there are open source and alternative abstractions for a lot of the same services, but there’s still quite a bit of glue if you’re on AWS, for example, and looking to move to bare metal.

But maybe I’m just thinking of the current capabilities of agents, and if we fast forward a couple years, even removing these abstractions or migrating will be very low friction.


But you can run most of the glue on your own dedicated instances.

I run k8s on a bunch of dedicated servers that are super cheap, and I have all the bells and whistles - just tell your coding agent to do it. You can literally design the thing you would never build yourself, and it works brilliantly.

Postgres running on dedicated hardware, replicated and with WAL backups - easy, just tell codebuff (my harness of choice) to do it. Then any number of firewalls, load balancers, bastion servers, etc. If you can imagine it, codebuff will implement it.
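The WAL-backup piece really is mostly boilerplate config. A minimal postgresql.conf sketch, assuming a placeholder /mnt/backup mount for the archive target:

```
# postgresql.conf -- minimal WAL archiving sketch (/mnt/backup is a placeholder)
wal_level = replica          # enough WAL detail for replication and PITR
archive_mode = on
archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'
max_wal_senders = 5          # allow streaming replicas to connect
```

The firewalls and bastions are the same story: a handful of well-documented settings, which is exactly why an agent can grind through them.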


IMO it doesn't flatten design into one thing, it splits it: cheap obvious work at scale, and a much smaller premium tier for real authorship. the middle is what actually gets crushed.

this is true of AI in general

the awkward part isn't just about reading sensitive files.

search, listings, direct reads, browser and computer use all sit behind different boundaries.

hard to tell what any given approval actually buys or exposes.


caveman stops being a style tool and starts being self-defense. once the prompt comes in up to 1.35x fatter, they've basically moved visibility and control entirely into their black box.

yeah they took "i pick the budget" and turned it into "trust us".

I keep saying that even if there's no current malfeasance, incentives where the model ultimately determines token use, and token use determines the provider's revenue, will absolutely overcome any safeguards or good intentions given long enough.

This might be true, but right now everybody is like "please let me spend more by making you think longer." The datacenter incentives from Anthropic this month are "please don't melt our GPUs anymore" though.

separating the codebase and leaving 'cal.diy' for hobbyists is pretty much the classic open-core path. the community phase is over and they need to protect their enterprise revenue.

blaming AI scanners is just really convenient PR cover for a normal license change.


yeah the desktop app forgets it's the desktop app. claude code feels local right up until the api starts coughing up 500s. same thing, just in a terminal instead of a window.
