tempest_'s comments | Hacker News

The other thing is that nowadays you scale way, way further vertically before you scale horizontally (assuming you are not using a cloud provider).

Everyone is hung up on making their shit "scalable" like it's a systems design interview at google in 2010.

Nowadays you can get a box with 600+ cores and 4TB of RAM. That is going to cover the needs of a very, very large percentage of enterprises.


I tried gemini-cli.

While the model was "ok", everything else was trash.

Constant 429s or 502s for "reasons".

10 different ways to try and pay for the stupid thing and none of them clear.

My favourite was that, as a paying customer, I could not get it to use the latest model. Sometimes it would, but most times it would dump me to 2.5.

All of my experience is exactly the opposite of what the gp comment is saying.

The gemini-cli repo is a gong show too: https://github.com/google-gemini/gemini-cli


If you don't pay for it, you don't get much in the way of quota.

Earlier on (okay, until recently), Gemini CLI's quota management didn't work very well.

Antigravity tends to have better quota management behavior.


That is what was infuriating.

It was paid for through code assist enterprise and had all the flags enabled for the "preview" models. Still, the only way to get gemini 3+ was to open and close the application 5 to 10 times, and sometimes you would get 3 for a bit and then get dumped back to 2.5, and no matter what you did it would not use 3.

I tossed it after spending like 3 hours total messing around in the google cloud console and trying a bunch of shit from the github issues. The other offerings don't waste my time (or waste less of it anyway). If they want me to beta test their shit they shouldn't charge for it.


The same way all big companies do: start changing what "search" means and including things under "search" that you might not think belong there.

This is already how email works in the corporate world.

A writes an email to B with chatgpt.

B sees a big blob of text and summarizes the email with chatgpt.

Adding an LLM in the middle is just the next step.


It's like one of those memes about the worst possible date picker, except for a communication system.

I would like one for the VRAM, but I am sure they will be unobtainable after the initial stock sells out, as I assume they were produced before the RAM prices went up.

They do say that hearing loss in old age can speed degradation, or maybe it is just correlated.

It definitely speeds the effects of dementia and similar conditions, because your brain insists on filling in what you didn't hear, and it tends to be wildly negative, at least in my two experiences of having gone through it.

> Mental atrophy due to less learning/thinking, isolation, loss of meaning and purpose happens first.

Except early onset Alzheimer's happens, and it also happens to plenty of people for whom none of those are true.


I mentioned this could be a possible falsification of the idea. It's also possible there are multiple causes and the modality I mentioned is a cause for some. I'm not sure. There are definitely cases where isolation contributes to cognitive decline.

Exactly. My mom lost her job because of early onset. She was very social, read tons of books, etc. Now, I'm happy she at least still knows who I am, but she can't put a sentence together.

Example: Claude Shannon

I feel like big / old companies thrive on process and are bogged down in bureaucracy.

Sure, there is a process to get a library approved, and that abstraction makes you feel better, but the guy whose job it is to approve it is not going to spend an entire day reviewing a lib. The abstraction hides what is essentially an "LGTM"; it's just that it takes a week for someone to check it off their outlook todos.

Maybe your experience is different.


I use CC, and I understand what caching means.

I have no idea how that works with an LLM implementation, nor do I actually know what they are caching in this context.


They are caching internal LLM state, which runs to tens of GB per session. It's called a KV cache (because what gets cached are the K and V matrices), and it is fundamental to how LLM inference works; it's not some Anthropic-specific design decision. See my other comment for more detail and a reference.
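
As a rough single-head sketch of the idea (numpy, made-up names and sizes, not Anthropic's actual stack): each decoding step computes K and V only for the new token, appends them to the cache, and attends over everything cached so far.

    import numpy as np

    d = 64                       # head dimension (illustrative)
    Wq = np.random.randn(d, d)   # stand-ins for trained projection weights
    Wk = np.random.randn(d, d)
    Wv = np.random.randn(d, d)

    K_cache = np.zeros((0, d))   # one row per token seen so far
    V_cache = np.zeros((0, d))

    def decode_step(x):
        """Attention for one new token embedding x of shape (d,)."""
        global K_cache, V_cache
        q = x @ Wq
        # Only the new token's K and V are computed; earlier rows are reused.
        K_cache = np.vstack([K_cache, x @ Wk])
        V_cache = np.vstack([V_cache, x @ Wv])
        scores = K_cache @ q / np.sqrt(d)       # score against every cached position
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V_cache                # weighted sum of cached values

    for _ in range(3):                          # each step reuses the growing cache
        _ = decode_step(np.random.randn(d))

At real model sizes (many layers and heads, long contexts) those cached K/V rows are what add up to the tens of GB per session, which is why throwing the cache away and recomputing it is so expensive.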

CC can explain it clearly, which is how I learned how the inference stack works.

A lot of people are provided their access through work.

They don't actually pay the bill or see it.

