Hacker News | past | comments | ask | show | jobs | submit | rpdillon's comments

Excellent breakdown. The title seems misleading since we don't even know it's AI. I very much dislike when folks state speculation as fact.

> Amazon created an AI agent to look at every account and, instead of flagging them for any potential violations, had them canceled outright.


“What cannot be known hollows the mind. Fill it not with guesswork.”

I've seen a lot of cases where someone flagrantly guilty of abuse complained loudly that some big tech company unfairly canceled their account. I can't say that's what's going on here, and I've also seen plenty of cases where the cancellation was unfair and there was no due process.

It's not Constitution-free, but CBP does claim a lot of extra rights within that zone. It covers about two-thirds of the U.S. population.

https://www.aclu.org/know-your-rights/border-zone


It's like when you're playing a video game and you accidentally press the wrong button and shoot a guy in the face instead of healing him.

The idea that you could accidentally open a webpage instead of launching a program is a UX embarrassment.


Right, but when humans are writing the code, they've learned to put constant downward pressure on the complexity of the system to help mitigate this effect. I don't get the sense that agents have gotten there yet.

Big business LLMs even have the opposite incentive, to churn as many tokens as possible.

At best, tokens are a rough proxy for 'thinking'. I wouldn't mind if it burned 100k tokens to output a one-line change that fixes a bug.

The problem is treating code generated per token spent as the thing to maximize. That model of "efficiency" is fundamentally broken.


Debugging would suffer as well, I assume. There's Kernighan's old adage: if you write the cleverest code you can, you are by definition not clever enough to debug it.

There's nothing really stopping agents from writing the cleverest code they can. So my question is, when production goes down, who's debugging it? You don't have 10 days.


Rather than lying, I think of it more as financial dead reckoning.

I wonder what percentage of services run on the Internet exceed a few hundred transactions per second.

I’ve seen multimillion dollar “enterprise” projects get nowhere close to that. Of course, they all run on scalable, cloud-native infrastructure costing at least a few grand a month.

> a few grand a month.

A negligible cost for a successful tech business that also works when your requirements exceed the capabilities of a single VPS.


I agree. But these are projects that barely get any requests.

I think the better question to ask is what services peak at a few hundred transactions per second?
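To put "a few hundred transactions per second" in perspective, here's a quick back-of-envelope sketch (assuming a hypothetical steady 300 TPS; real traffic is bursty, so peak capacity needs are higher):

```python
# Back-of-envelope: what does "a few hundred TPS" add up to over a day?
# 300 TPS is an assumed illustrative figure, not a measured one.
TPS = 300
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = TPS * SECONDS_PER_DAY
print(f"{requests_per_day:,} requests/day")  # ~26 million requests/day
```

Very few services see tens of millions of requests a day, which is the point: most of those "enterprise" deployments are provisioned far beyond their actual load.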

This is a nice point that I haven't seen before. It's interesting to regress AI to the simplest form and see how we treat it as a test for the more complex cases.

I trend libertarian because I have a strong anti-authoritarian streak. I used to think of myself as closer to the Republicans, but these days I mostly only agree with the Democrats. Weird times.

Been running Lemonade for some time on my Strix Halo box. It dispatches out to the other backends it bundles, like diffusion and llama.cpp. I actually don't like their combined server; what I use instead is their llama.cpp build for ROCm.

https://github.com/lemonade-sdk/llamacpp-rocm

But I'm not doing anything with images or audio. I get about 50 tokens a second with GPT OSS 120B. As others have pointed out, the NPU is used for low-powered, small models that are "always on", so it's not a huge win for the standard chatbot use case.
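For reference, the ROCm build ships the standard llama.cpp binaries, so launching it looks like any other llama.cpp setup. A sketch (the model path is hypothetical; flags are stock llama.cpp options):

```shell
# Serve a local GGUF model with the ROCm llama.cpp build.
# -ngl 99 offloads all layers to the GPU; -c sets the context size.
./llama-server \
  -m ~/models/gpt-oss-120b.gguf \
  -ngl 99 \
  -c 8192 \
  --host 127.0.0.1 --port 8080
```

This exposes llama.cpp's usual OpenAI-compatible HTTP endpoint, so any standard client can point at it.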


Even small NPUs can offload some compute from prefill, which can be quite expensive with longer contexts. It's less clear whether they can help during decode; that depends on whether they can access memory with good throughput and do dequant+compute internally, like GPUs can. The Apple Neural Engine only does INT8 or FP16 MADD ops, so that mostly doesn't help.
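The decode-side bandwidth argument can be made concrete with back-of-envelope arithmetic: decode is memory-bandwidth bound, so the throughput ceiling is roughly bandwidth divided by bytes read per token. The numbers below are assumptions for illustration (~256 GB/s for Strix Halo's LPDDR5X, ~5B active parameters for an MoE model like gpt-oss, ~4.25 bits/weight for a 4-bit quant):

```python
# Rough decode throughput ceiling for a memory-bandwidth-bound MoE model.
# All figures are assumed round numbers, not measurements.
bandwidth_gb_s = 256      # assumed unified-memory bandwidth
active_params_b = 5.1     # assumed active params per token (billions)
bits_per_weight = 4.25    # assumed effective quantized weight size

gb_read_per_token = active_params_b * bits_per_weight / 8
ceiling_tps = bandwidth_gb_s / gb_read_per_token
print(f"~{ceiling_tps:.0f} tokens/s ceiling")
```

Under these assumptions the ceiling lands somewhere north of 90 tokens/s, so an observed ~50 tokens/s is plausible once KV-cache reads, dequant overhead, and imperfect bandwidth utilization are accounted for.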
