Hacker News

This paragraph explains the problem:

So when a user wants to remove a CAS-addressed document, before really deleting it you need to detect whether it's the last reference. This is not easy to do; it is in fact much harder to do correctly than eating the cost of storing duplicate files.

And this paragraph is the purported solution:

And usually when CAS is considered as a solution, it's to solve the need of deduplicating files to save on storage. But even there, the good solution is to give files their own internal UUIDs as storage keys, store their hashes alongside, and generate external UUIDs for each file upload, then use refcounts to handle the final delete.
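The scheme quoted above can be sketched as follows. This is a minimal single-process illustration, not the article's implementation; all names (`DedupStore`, `upload`, `delete`) are made up for the example, and a real system would run this inside database transactions.

```python
import hashlib
import uuid

class DedupStore:
    """Sketch of dedup via internal UUIDs + content hashes + refcounts."""

    def __init__(self):
        self.blobs = {}      # internal_id -> (content_hash, data)
        self.by_hash = {}    # content_hash -> internal_id
        self.refcounts = {}  # internal_id -> number of live external references
        self.externals = {}  # external_id -> internal_id

    def upload(self, data: bytes) -> str:
        """Store data, deduplicating by content hash; return a fresh external UUID."""
        h = hashlib.sha256(data).hexdigest()
        internal = self.by_hash.get(h)
        if internal is None:
            internal = str(uuid.uuid4())   # the blob gets its own internal UUID
            self.blobs[internal] = (h, data)
            self.by_hash[h] = internal
            self.refcounts[internal] = 0
        external = str(uuid.uuid4())       # every upload gets its own external UUID
        self.externals[external] = internal
        self.refcounts[internal] += 1
        return external

    def delete(self, external: str) -> None:
        """Drop one external reference; free the blob when the refcount hits zero."""
        internal = self.externals.pop(external)
        self.refcounts[internal] -= 1
        if self.refcounts[internal] == 0:  # last reference: actually delete
            h, _ = self.blobs.pop(internal)
            del self.by_hash[h]
            del self.refcounts[internal]
```

Note that the decrement-and-check in `delete` is exactly the step that needs to be atomic, which is where the centralization concern below comes in.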

The trouble is that this solution reframes the problem but doesn't solve it. It still requires:

- Accurate reference counting

- Careful handling of deletes

- Synchronization across systems

Which is all part of the original problem.

At the end of the day, you can't safely and scalably do distributed deletes with refcounts unless you centralize the operation, which kills scalability. There are workarounds, such as marking the file as unreferenced and then running a garbage collector to delete unreferenced files, but the author doesn't discuss them.
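The mark-then-GC workaround mentioned above might look something like this sketch. Everything here is hypothetical (the `db` dict, the grace period, the function names); the point is only that the last-reference delete becomes a deferred mark, and a separate sweep does the destructive work.

```python
import time

def delete_reference(db, external_id):
    """Drop one reference; if it was the last, only *mark* the blob as unreferenced."""
    internal = db["externals"].pop(external_id)
    db["refcounts"][internal] -= 1
    if db["refcounts"][internal] == 0:
        # Defer the real delete to the garbage collector.
        db["unreferenced_since"][internal] = time.time()

def gc_sweep(db, grace_seconds=3600):
    """Delete blobs that have stayed unreferenced for longer than the grace period."""
    now = time.time()
    for internal, since in list(db["unreferenced_since"].items()):
        if db["refcounts"].get(internal, 0) > 0:
            # A new upload resurrected this blob; unmark it.
            del db["unreferenced_since"][internal]
        elif now - since >= grace_seconds:
            db["blobs"].pop(internal, None)
            db["refcounts"].pop(internal, None)
            del db["unreferenced_since"][internal]
```

The grace period gives in-flight uploads a chance to re-reference the blob before it is destroyed, which is what makes this safer than deleting inline at refcount zero.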



Indeed, a centralized database with transactions was implied in the solution. You're right to point out that this is not always available. I did not talk about it simply because the software I worked on never reached a scale beyond what a centralized database can handle. I will edit the article to make this clearer.





