Hacker News | flumes_whims_'s comments

The overhead shrinks with larger models. It doesn't seem that bad.

https://arxiv.org/pdf/2409.03992v2


How about a chatbot in front of feature requests that gets all the details the first time?

> “AI detected bugs are pretty much by definition not secret, and treating them on some private list is a waste of time for everybody involved – and only makes that duplication worse because the reporters can't even see each other's reports.”

Ah; so it _is_ a tool problem. It is _also_ a moderation problem.

One could ban orgs that flood the zone with AI-generated trash, but is there a middle ground: a set of filters that identifies duplicate bugs and perhaps internally routes "AI spam" to a lower-priority queue?

This seems like the sort of problem I addressed in the 90s with killfiles and SpamAssassin. In other words, can't the ingestion just go through some filters to shield the humans at the end of the pipe?
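To make the filter idea concrete, here is a rough sketch of an ingestion step that sorts incoming reports into queues before a human sees them. Everything in it is an assumption for illustration: the report schema (`id`, `body`), the `triage` function, the spam marker phrases, and the 0.85 similarity threshold are all made up, and a real tracker would want something sturdier than string similarity.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse punctuation so trivial rewording still matches."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def triage(reports, dup_threshold=0.85,
           spam_markers=("as an ai", "language model")):
    """Route reports into 'main', 'duplicate', and 'low_priority' queues.

    reports: list of dicts with 'id' and 'body' keys (hypothetical schema).
    spam_markers: hypothetical phrases that push a report to the low queue.
    """
    queues = {"main": [], "duplicate": [], "low_priority": []}
    seen = []  # normalized bodies of reports already accepted into 'main'
    for report in reports:
        body = normalize(report["body"])
        if any(marker in body for marker in spam_markers):
            queues["low_priority"].append(report["id"])
        elif any(SequenceMatcher(None, body, prev).ratio() >= dup_threshold
                 for prev in seen):
            queues["duplicate"].append(report["id"])
        else:
            seen.append(body)
            queues["main"].append(report["id"])
    return queues


reports = [
    {"id": 1, "body": "Use-after-free in foo_release() when the device is unplugged"},
    {"id": 2, "body": "use after free in foo_release when the device is unplugged!"},
    {"id": 3, "body": "As an AI language model, I found a potential issue in the scheduler"},
    {"id": 4, "body": "Integer overflow in bar_alloc size computation"},
]
queues = triage(reports)
```

Nothing gets dropped here, only reordered, so a false positive costs a delay rather than a lost report.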


And we can guess that recordings have been stored for years, going back to before it became feasible to actually process them.


Benchmarking that rewards answering "I don't know" over giving a wrong answer seems to be the right way to steer the industry towards making models that are good at this. AA-Omniscience is one such benchmark.

AA-Omniscience is a knowledge and hallucination benchmark that rewards accuracy, punishes bad guesses and provides a comprehensive view of which models produce factually reliable outputs across different domains. The benchmark contains 6,000 questions across 6 major domains, derived from authoritative academic and industry sources and generated automatically using an LLM-based question generation agent to ensure unambiguity, scalability and factual precision.

https://artificialanalysis.ai/evaluations/omniscience
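For a feel of how "rewards accuracy, punishes bad guesses" changes the incentives, here is a toy scoring sketch. It assumes a symmetric scheme (+1 for a correct answer, -1 for a wrong one, 0 for abstaining) scaled to the range -100 to 100; the function name and the exact weights are my assumption, and the benchmark's actual scoring may differ.

```python
def abstention_aware_score(results):
    """Score a list of outcomes: 'correct', 'incorrect', or 'abstain'.

    Assumed weights (not taken from the benchmark): +1, -1, 0.
    Returns a value between -100 and 100.
    """
    points = {"correct": 1, "incorrect": -1, "abstain": 0}
    results = list(results)
    if not results:
        raise ValueError("need at least one result")
    return 100 * sum(points[r] for r in results) / len(results)


# A model that always guesses: 6 right, 4 wrong.
guesser = ["correct"] * 6 + ["incorrect"] * 4
# A model that knows less but says "I don't know" instead of guessing.
cautious = ["correct"] * 5 + ["abstain"] * 5

guesser_score = abstention_aware_score(guesser)    # 20.0
cautious_score = abstention_aware_score(cautious)  # 50.0
```

Under plain accuracy the guesser wins 60% to 50%; once wrong answers cost something, the cautious model comes out ahead.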


It probably depends on how "trustworthy" you seem to Google, which decides whether they trigger this requirement: things like using Linux, using Firefox, using a VPN, etc.


The irony is that no real scammer would use this setup because they know it would stand out.


Denying scammers the ability to use VPNs and virtual phone farms without standing out does make their job harder.


Or it's just being A/B tested right now.


Canon law used to require all documents to be published in Latin. That has changed rather recently.


The Hays Code wasn't policed by the government.



He didn't give it root access; it found root access.

