
Even Gemini with no memory does hilarious things. Like, if you ask it how heavy the average man is, you usually get the right answer but occasionally you get a table that says:

- 20-29: 190 pounds

- 30-39: 375 pounds

- 40-49: 750 pounds

- 50-59: 4900 pounds

Yet somehow people believe LLMs are on the cusp of replacing mathematicians, traders, lawyers and whatnot. At least for code you can write tests, but even then, how are you gonna trust something that can casually make such obvious mistakes?




> how are you gonna trust something that can casually make such obvious mistakes?

In many cases, a human can review the content generated, and still save a huge amount of time. LLMs are incredibly good at generating contracts, random business emails, and doing pointless homework for students.


And humans are incredibly bad at skimming through long text to check for errors, so this is not a happy pairing.

As for the homework, there is obviously a huge category that is pointless. But it shouldn't be that way: the fundamental idea behind homework is sound, and the only way to properly learn something is by doing the exercises and thinking them through yourself.


Yeah, ChatGPT's paid version is wildly inaccurate on very important and very basic things. I never got on board with AI to begin with, but nowadays I don't even load it unless I'm really stuck on something programming-related.

So what? That might happen one out of 100 times. Even if it's 1 in 10, who cares? Math is verifiable. You've just saved yourself weeks or months of work.

You don't think these errors compound? Generated code involves hundreds of little decisions. Yes, it "usually" works.

LLMs: sometimes wrong but never in doubt.

Not in my experience. With a proper TDD framework it does better than most programmers at a company, who anecdotally have a bug every 2-3 tasks.
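
Roughly the setup I mean, as a minimal sketch (assuming pytest; parse_duration is an invented example function, not from any real task):

    # The tests are written first, by a human; the generated implementation
    # is only accepted once they pass.
    def parse_duration(text: str) -> int:
        """Convert strings like '1h30m' into seconds (the LLM-written part)."""
        total, digits = 0, ""
        for ch in text:
            if ch.isdigit():
                digits += ch
            elif ch == "h":
                total, digits = total + int(digits) * 3600, ""
            elif ch == "m":
                total, digits = total + int(digits) * 60, ""
        return total

    def test_hours_and_minutes():
        assert parse_duration("1h30m") == 5400

    def test_minutes_only():
        assert parse_duration("45m") == 2700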

The kinds of mistakes it makes are usually strange and inhuman, though. Like getting the hard parts correct while also getting something fundamental about the same problem wrong. And not in the "easy to miss or type wrong" way.

I wish I had an example saved for you, but it happens to me pretty frequently. Not only that, but it also usually does testing incorrectly at a fundamental level, or builds tests around incorrect assumptions.


I've seen LLMs implement "creative" workarounds. Example: Sonnet 4.5 couldn't figure out how to authenticate a web socket request using whatever framework I was experimenting with, so it decided to just not bother. Instead, it passed the username as part of the web socket request and blindly trusted that the user was actually authenticated.

The application looked like it worked. The tests did pass. But even a cursory examination of the code showed it was all smoke and mirrors.
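
A rough sketch of the pattern it produced, from memory and with invented names (not the actual generated code; assumes a websockets-style connection object with recv()/close()):

    import json

    async def handle_socket_insecure(websocket):
        # What it did: trust whatever username the client sends.
        hello = json.loads(await websocket.recv())    # e.g. {"user": "alice"}
        return hello["user"]                          # "authenticated" as alice

    async def handle_socket(websocket, verify_token):
        # What it skipped: check a server-verifiable credential first.
        hello = json.loads(await websocket.recv())    # e.g. {"token": "..."}
        user = verify_token(hello["token"])           # None if invalid/expired
        if user is None:
            await websocket.close(code=4401)          # reject the connection
            return None
        return user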


Yeah, recently it had an issue getting OIDC working and decided to implement its own, throwing in a few thousand extra lines. I'm sure there were no security holes created in there at all. /s

Well, the tests passed, right?

Yes, I wish I had saved some of my best examples too. One I had was super weird in ChatGPT Pro: it told me that after 30 years my interest would become negative and I would start losing money. It didn't want to accept the error.
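
For what it's worth, that answer is impossible on its face; a quick sanity check with made-up numbers:

    # At any positive rate, compound interest only grows; it cannot go negative.
    principal, rate = 10_000.0, 0.05
    balance = principal
    for year in range(40):
        balance *= 1 + rate    # A = P * (1 + r)^n, positive for any r > -1
    print(round(balance, 2))   # ~70399.89 after 40 years, never negative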

Errors compounding is a meme. In iterated as well as verifiable domains, errors dilute instead of compounding because the LLM has repeated chances to notice its failure.
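
As a toy model of what I mean (numbers purely illustrative): if each verification pass independently catches a mistake with probability p, the chance the mistake survives k passes shrinks geometrically.

    # Toy model of the "errors dilute" claim; assumes independent passes,
    # which may not hold when the same model reviews its own output.
    p_catch = 0.6
    for k in range(1, 6):
        remaining = (1 - p_catch) ** k
        print(f"after {k} passes: {remaining:.1%} chance the error remains")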

Yes, just use random results. You've just saved yourself weeks or months of work gathering actual results.


