
All modern relational databases implement exactly what you've described... transactions are written to a transaction log, which is flushed to disk every few seconds (or whenever you want to guarantee that a txn is durable). Changes to the actual data need not be persisted in a timely manner, because in the event of a crash the data is recovered from the transaction log.
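The idea above (durability comes from the log, not from the data files) can be sketched in a few lines. This is a toy illustration of write-ahead logging, not how any real database implements it; the names `LOG`, `commit`, and `recover` are made up for the example.

```python
import os

LOG = "txn.log"  # hypothetical transaction log file for this sketch

# start from a clean log so the example is repeatable
if os.path.exists(LOG):
    os.remove(LOG)

def commit(txn_id, change, sync=True):
    """Append a change record to the transaction log.

    Once fsync returns, the record survives a crash even though the
    'real' data pages haven't been written anywhere yet.
    """
    with open(LOG, "a") as f:
        f.write(f"{txn_id}\t{change}\n")
        f.flush()
        if sync:
            os.fsync(f.fileno())

def recover():
    """After a crash, rebuild state by replaying the log in order."""
    state = {}
    if os.path.exists(LOG):
        with open(LOG) as f:
            for line in f:
                _txn_id, change = line.rstrip("\n").split("\t", 1)
                key, value = change.split("=", 1)
                state[key] = value
    return state

commit(1, "balance=100")
commit(2, "balance=90")
print(recover())  # replay yields the latest committed state: {'balance': '90'}
```

The point is that `recover()` never looks at the data files at all; replaying the log from the last checkpoint is enough to reconstruct every committed change.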


"in the event of a crash the data is recovered from the transaction log"

Doesn't this statement imply that a disk hit occurred before a client is told that a transaction committed (vs. being told that a unique key constraint was violated, etc.)? I'm talking about a more extreme form where I don't have to wait multiple milliseconds for a disk platter to spin around before continuing with my processing.


For full durability, you configure/ask the DB to fsync the transaction log before reporting the transaction committed to the client.

Most people can tolerate a few seconds of data loss, so a sensible config will only fsync every few seconds and will report a transaction committed before it hits the disk. If the DB crashes, you lose those recent transactions in this mode.

All (?) relational databases let you choose which fsync style you want. Most (?) ship with this setting set to the conservative 'fsync on every commit' mode. Once you configure a SQL database with a more relaxed setting, you get a database that performs much more like a typical NoSQL store. But some people need full durability - or want it for particular transactions. In that mode, you're basically bound by the number of IOPS your disk can do, but are guaranteed full durability.
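You can see the IOPS bound directly by timing the two styles yourself. A rough sketch (no real database involved, just a log file on whatever disk backs your temp directory; timings will vary wildly by storage):

```python
import os
import tempfile
import time

def run_commits(n, fsync_each):
    """Append n fake 'transactions' to a log file.

    fsync_each=True models 'fsync on every commit' (durable, one disk
    sync per txn); False models the relaxed mode where syncs are batched.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for i in range(n):
            os.write(fd, f"txn {i}\n".encode())
            if fsync_each:
                os.fsync(fd)  # commit not acked until this returns
        os.fsync(fd)          # final batched flush for the relaxed mode
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)

durable = run_commits(200, fsync_each=True)
relaxed = run_commits(200, fsync_each=False)
print(f"fsync-per-commit: {durable:.3f}s  batched: {relaxed:.3f}s")
```

On a spinning disk the per-commit mode is capped at roughly one commit per platter rotation, which is exactly the multi-millisecond wait the parent comment is trying to avoid.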


Also note that you can get the best of both worlds with a battery-backed RAM cache in a SAN storage backend, such that the storage subsystem can be extremely low latency and yet "guarantee" that what it has accepted will get persisted to a disk for durability. (Predictably, this isn't cheap, but it's very effective.)

Your DB host tells the SAN to write this block, the SAN ingests the write to local RAM and reports "got it" to the DB server in sub-millisecond time. The SAN will then dump that data to the actual underlying disks over the next (hand-wavy) short timeframe, but from the DB's perspective, it got a durable fsync in under a millisecond.


On MySQL / InnoDB, this is innodb_flush_log_at_trx_commit, and how the log buffer is flushed can have a tremendous impact on the latency of writes.


So, no physical disk write need occur before a client can continue with processing? If so, cool.


http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.htm...

"If the value of innodb_flush_log_at_trx_commit is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit."





