flumes_whims_ · 2026-05-21T13:07:27 1779368847

The overhead shrinks with larger models. It doesn't seem that bad.

flumes_whims_ · 2026-05-18T15:17:06 1779117426

How about a chatbot in front of feature requests that gets all the details the first time?

flumes_whims_ · 2026-05-18T13:46:50 1779112010

> “AI detected bugs are pretty much by definition not secret, and treating them on some private list is a waste of time for everybody involved – and only makes that duplication worse because the reporters can't even see each other's reports.”

cduzz · 2026-05-18T13:55:31 1779112531

Ah; so it _is_ a tool problem. It is _also_ a moderation problem.

One could ban orgs that flood the zone with AI-generated trash, but is there some potential middle ground where a set of filters identifies duplicated bugs, and possibly just dumps "AI spam" into a lower-priority internal queue?

This seems like the sort of problem I addressed in the 90s with killfiles and SpamAssassin. In other words, can't the ingestion just go through some filters to shield the humans at the end of the pipe?
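
A rough sketch of what such an ingestion filter could look like, assuming reports arrive as plain text and that near-duplicates share most of their words. Everything here is hypothetical: the `triage` function, the Jaccard token overlap, and the 0.6 threshold are illustrative choices, not how any real bug tracker or SpamAssassin works.

```python
import re


def tokens(text: str) -> frozenset[str]:
    """Lowercase word tokens of a report, as a set."""
    return frozenset(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def triage(reports: list[str], threshold: float = 0.6) -> tuple[list[str], list[str]]:
    """Split reports into (main_queue, low_priority_queue).

    A report is demoted when it is a near-duplicate of a report that was
    already accepted into the main queue. The threshold is an assumption
    and would need tuning on real data.
    """
    main: list[str] = []
    low: list[str] = []
    seen: list[frozenset[str]] = []  # token sets of accepted reports
    for report in reports:
        toks = tokens(report)
        if any(jaccard(toks, earlier) >= threshold for earlier in seen):
            low.append(report)
        else:
            main.append(report)
            seen.append(toks)
    return main, low
```

Nothing gets dropped: duplicates only land in the lower queue, so a human can still look at them when the main queue is empty.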

flumes_whims_ · 2026-05-12T12:09:25 1778587765

And we can guess that recordings have been stored for years, going back to before it became feasible to actually process them.

flumes_whims_ · 2026-05-12T12:07:13 1778587633

Benchmarks that reward answering "I don't know" over giving a wrong answer seem to be the right way to steer the industry towards making models that are good at this. AA-Omniscience is one such benchmark.

AA-Omniscience is a knowledge and hallucination benchmark that rewards accuracy, punishes bad guesses and provides a comprehensive view of which models produce factually reliable outputs across different domains. The benchmark contains 6,000 questions across 6 major domains, derived from authoritative academic and industry sources and generated automatically using an LLM-based question generation agent to ensure unambiguity, scalability and factual precision.

https://artificialanalysis.ai/evaluations/omniscience
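
To illustrate why this kind of scoring changes the incentive, here is a minimal sketch that assumes a simple rule: +1 for a correct answer, -1 for a wrong one, 0 for abstaining, scaled to the range -100 to 100. The function name and the exact rule are assumptions for illustration; the benchmark's actual formula is documented at the link above.

```python
def abstention_aware_score(outcomes: list[str]) -> float:
    """Score in [-100, 100] under an assumed rule:
    +1 per correct answer, -1 per incorrect answer, 0 per abstention."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    points = {"correct": 1, "incorrect": -1, "abstain": 0}
    return 100.0 * sum(points[o] for o in outcomes) / len(outcomes)


# Two hypothetical models that both know half of the answers.
guesser = ["correct"] * 5 + ["incorrect"] * 5   # guesses when unsure
cautious = ["correct"] * 5 + ["abstain"] * 5    # says "I don't know"

print(abstention_aware_score(guesser))   # 0.0
print(abstention_aware_score(cautious))  # 50.0
```

Under plain accuracy both models would score 50%, so guessing costs nothing. Under a rule like this one, the model that admits ignorance comes out ahead.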

flumes_whims_ · 2026-05-11T15:03:41 1778511821

Whether this requirement gets triggered probably depends on how "trustworthy" you seem to Google. Things like using Linux, using Firefox, using a VPN, etc. likely count against you.

greentea23 · 2026-05-11T16:00:42 1778515242

The irony is that no real scammer would use this setup because they know it would stand out.

traderj0e · 2026-05-11T17:46:26 1778521586

Denying scammers the ability to use VPNs and virtual phone farms without standing out does make their job harder.

Gigachad · 2026-05-12T01:52:27 1778550747

Or it's just being A/B tested right now.

flumes_whims_ · 2026-05-07T18:37:48 1778179068

Canon law used to require all documents to be published in Latin. That has changed rather recently.

flumes_whims_ · 2026-04-30T18:25:08 1777573508

The Hays Code wasn't policed by the government.

flumes_whims_ · 2026-04-30T17:01:04 1777568464

Related work: https://steve-patterson.com/logic-and-infinity/

flumes_whims_ · 2026-04-27T18:09:22 1777313362

He didn't give it root access, it found root access.