
All modern relational databases implement exactly what you've described... transactions are written to a transaction log, which is flushed to disk every few seconds (or whenever you want to guarantee that a txn is durable). Changes to the actual data need not be persisted in a timely manner, because in the event of a crash the data is recovered from the transaction log.
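The idea above (durability comes from the log, not from the data files) can be sketched in a few lines. This is a toy illustration of write-ahead logging, not how any real database implements it; the names `LOG`, `commit`, and `recover` are made up for the example.

```python
import os

LOG = "txn.log"  # hypothetical transaction log file for this sketch

# start from a clean log so the example is repeatable
if os.path.exists(LOG):
    os.remove(LOG)

def commit(txn_id, change, sync=True):
    """Append a change record to the transaction log.

    Once fsync returns, the record survives a crash even though the
    'real' data pages haven't been written anywhere yet.
    """
    with open(LOG, "a") as f:
        f.write(f"{txn_id}\t{change}\n")
        f.flush()
        if sync:
            os.fsync(f.fileno())

def recover():
    """After a crash, rebuild state by replaying the log in order."""
    state = {}
    if os.path.exists(LOG):
        with open(LOG) as f:
            for line in f:
                _txn_id, change = line.rstrip("\n").split("\t", 1)
                key, value = change.split("=", 1)
                state[key] = value
    return state

commit(1, "balance=100")
commit(2, "balance=90")
print(recover())  # replay yields the latest committed state: {'balance': '90'}
```

The point is that `recover()` never looks at the data files at all; replaying the log from the last checkpoint is enough to reconstruct every committed change.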


"in the event of a crash the data is recovered from the transaction log"

Doesn't this statement imply that a disk hit occurred before a client is told that a transaction committed (vs. being told that a unique key constraint was violated, etc.)? I'm talking about a more extreme form where I don't have to wait multiple milliseconds for a disk platter to spin around before continuing with my processing.


For full durability, you configure/ask the DB to fsync the transaction log before reporting the transaction committed to the client.

Most people can tolerate a few seconds of data loss, so a sensible config will only fsync every few seconds and will report a transaction committed before it hits the disk. If the DB crashes, you lose those recent transactions in this mode.

All (?) relational databases let you choose which fsync style you want. Most (?) ship with this setting set to the conservative 'fsync on every commit' mode. Once you configure a SQL database with a more relaxed setting, you get a database that performs much more like a typical NoSQL store. But some people need full durability - or want it for particular transactions. In that mode, you're basically bound by the number of IOPS your disk can do, but are guaranteed full durability.
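You can see the IOPS bound directly by timing the two styles yourself. A rough sketch (no real database involved, just a log file on whatever disk backs your temp directory; timings will vary wildly by storage):

```python
import os
import tempfile
import time

def run_commits(n, fsync_each):
    """Append n fake 'transactions' to a log file.

    fsync_each=True models 'fsync on every commit' (durable, one disk
    sync per txn); False models the relaxed mode where syncs are batched.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for i in range(n):
            os.write(fd, f"txn {i}\n".encode())
            if fsync_each:
                os.fsync(fd)  # commit not acked until this returns
        os.fsync(fd)          # final batched flush for the relaxed mode
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)

durable = run_commits(200, fsync_each=True)
relaxed = run_commits(200, fsync_each=False)
print(f"fsync-per-commit: {durable:.3f}s  batched: {relaxed:.3f}s")
```

On a spinning disk the per-commit mode is capped at roughly one commit per platter rotation, which is exactly the multi-millisecond wait the parent comment is trying to avoid.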


Also note that you can get the best of both worlds with a battery-backed RAM cache in a SAN storage backend, such that the storage subsystem can be extremely low latency and yet "guarantee" that what it has accepted will get persisted to a disk for durability. (Predictably, this isn't cheap, but it's very effective.)

Your DB host tells the SAN to write this block, the SAN ingests the write to local RAM and reports "got it" to the DB server in sub-millisecond time. The SAN will then dump that data to the actual underlying disks over the next (hand-wavy) short timeframe, but from the DB's perspective, it got a durable fsync in under a millisecond.


On MySQL / InnoDB, this is innodb_flush_log_at_trx_commit, and how the log buffer is flushed can have a tremendous impact on the latency of writes.


So, no physical disk write need occur before a client can continue with processing? If so, cool.


http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.htm...

"If the value of innodb_flush_log_at_trx_commit is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit."





