
> asked LLMs to compile list of 10-20 writers considered canon in each decade since 1800, then identify all their notable works and years of publication. After some iterations with coding agents I got over 2,000 works by 200 authors.

Wait, so the source data is just LLM hallucinations? It makes sense to use an LLM to build the data-collection tooling, but not to generate your source data.



In my opinion, this is a better use of tech that has an error rate (hallucination): you treat it as a fuzzy search, then sample the results to see how you did. I'd like to see a few of the results, for sure!
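
To make "sample the results" concrete, here's a minimal sketch (the rows and the is_correct check are placeholders, not from the post):

    import math
    import random

    # Hypothetical LLM-generated rows: (author, title, publication year).
    llm_rows = [
        ("Jane Austen", "Pride and Prejudice", 1813),
        ("Charles Dickens", "Bleak House", 1853),
        # ... ~2,000 entries in the real dataset
    ]

    def is_correct(row):
        # Placeholder: in practice, check the row by hand or against a
        # reference source (library catalog, publisher records, etc.).
        return True

    sample = random.sample(llm_rows, min(50, len(llm_rows)))
    p = sum(not is_correct(r) for r in sample) / len(sample)
    # Rough 95% confidence interval for the error rate (normal approximation).
    half_width = 1.96 * math.sqrt(p * (1 - p) / len(sample))
    print(f"Estimated error rate: {p:.1%} +/- {half_width:.1%}")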


LLMs cite their sources, so hopefully they did their due diligence.


It feels a lot like storing your data as an essay in a Word doc instead of a spreadsheet. It can work and all of the math is probably correct, but it's very much the wrong tool when the structured data was right there to be used instead.


The structured data is scattered all over the place. This does the important work of aggregating it in one place. If you had to do that manually, it could take weeks.


What’s the point of getting the wrong answer quickly?

https://news.ycombinator.com/item?id=47587662


Well, we’re just going in circles now. I just said LLMs cite what they find, so it’s not going to be the wrong answer if you do your due diligence.


Missing entries don’t get corrected by looking at the LLM output. That only helps when the LLM makes something up out of thin air or mangles the output.

Of course it’s not the kind of question you can get an objectively correct answer for, but you could come up with the correct answer for a given methodology.
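
Concretely, once the methodology is fixed (say, "all novels by this author list in a given catalog"), the check is a set comparison; a minimal sketch with made-up lists:

    # Hypothetical: titles the LLM produced vs. titles pulled from a
    # reference catalog using the same methodology.
    llm_titles = {"Pride and Prejudice", "Emma", "Sanditon"}
    reference_titles = {"Pride and Prejudice", "Emma", "Persuasion"}

    missing = reference_titles - llm_titles   # omissions you can't see in the output
    invented = llm_titles - reference_titles  # candidates for hallucination

    print("Missing from LLM list:", sorted(missing))
    print("Not in the reference:", sorted(invented))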


Isn't verifying sources a much harder problem than just searching for the list of works in the first place?

Especially in cases such as this one: for well-known works of literature and music, structured data already exists.
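
For example, Open Library's public search API returns exactly this kind of structured record; a minimal sketch (assuming the requests library, with fields as documented for search.json):

    import requests

    # Fetch an author's works from Open Library's public search API.
    resp = requests.get(
        "https://openlibrary.org/search.json",
        params={"author": "charles dickens", "limit": 20},
        timeout=30,
    )
    resp.raise_for_status()
    for doc in resp.json()["docs"]:
        # Each doc carries a title and the year of first publication.
        print(doc.get("first_publish_year"), doc.get("title"))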


Doing extra work in step 2 because you got lazy in step 1 is not my idea of efficient or complete.


It’s a long way from "got lazy" to "didn’t write their own internet scraper to scan for books, authors’ ages, and opinions."


That depends on how much more quickly and efficiently you can do the extra work in step 2 than in step 1.


In this case it’s strictly less efficient.

You can only correct for missing entries by doing the same work you’d need if you started from scratch. And after that, you now have a second list to reconcile.


What do you mean by due diligence here? Manually checking 2,000 citations sounds a lot harder to me than just pulling the data from a reliable source to start with.



