Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You just do raw byte copies from sample DB, no SQL or "inserts" or anything similar. Imagine test database consists of 3 parts (all raw bytes)

### PROLOGUE ### Sample row ### EPILOGUE

You copy & write prologue, write 1B sample raws (can optimize this at will, large writes, etc)

Copy & write epilogue and fsync the data. You probably need to modify some metadata, but that should be a few writes at most.

That should be as good as it gets, providing your IO is optimal.



Do you mean crafting all the various database page btree structures and entries yourself? I'd be concerned about subtle bugs.


An SQLite database is just a file. You can build the empty database with schema and base values ahead of time, save it to a file (or an in-memory byte buffer) and then every time you want to create a new database, you just copy that file. No need to do any expensive initialization queries that way. If raw high-speed throughput is needed, skipping that step can make a significant difference.


Yes, that approach makes sense. I thought what I was replying to was suggesting writing b-tree pages themselves, outside of sqlite, for the new data.


I guess if you wanted the fastest creation you could make a custom backend format for sqlite and use that. Especially if query speed was not important.


Assuming SQLite has any internal counters or indexes, let alone B-trees or anything fancy, then this approach won't work. It will work for a raw record format (CSV, JSON, Protobuf, an mmapped array, etc.), but the author wants to actually interact with a real SQLite database. Generating a billion rows in some non-SQLite format still leaves problem to converting that format by loading it into SQLite, which isn't really a reduction of the original problem.


This idea is mentioned as one of the future work at the bottom.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: