Database Availability

The Schrödinger's Cat Analogy

Massively oversimplified... the cat is in a box with something that will, eventually, kill it. Until you open the box you don't, and can't, know whether the cat is alive or dead.

Compare this to cold DR (e.g. disk-replication-based technologies like VMware SRM) and tape backup: you can't actually know if your replica is usable until you try to recover. (Has the disk replication corrupted your database files? Has the tape got a fault? Has the replication silently failed without anybody noticing?)

Note that SRM is widely used and should result in, at least, "crash-consistent" recovery, i.e. there should be no more risk than if the database needed to recover after a power outage.
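The only way to "open the box" on a cold copy is to actually try to use it. Here is a minimal sketch of that idea, using Python's built-in sqlite3 module purely as a stand-in for a real database (a real restore test for Oracle, SQL Server or MySQL would look quite different). The function name verify_backup is made up for illustration.

```python
import os
import sqlite3


def verify_backup(path: str) -> bool:
    """Return True if the database copy at `path` opens and passes an integrity check.

    SQLite is only a stand-in here; the point is that the copy is
    actually opened and checked, not assumed to be good.
    """
    # A missing file would otherwise be silently created as an empty database.
    if not os.path.isfile(path):
        return False
    try:
        conn = sqlite3.connect(path)
        try:
            # PRAGMA integrity_check returns a single row "ok" for a healthy file.
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a readable database at all.
        return False
    return rows == [("ok",)]
```

Run on a schedule against the restored copy, a check like this turns "we think the replica is fine" into "we opened it last night and it was fine".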

With online replicas (like Oracle Data Guard, SQL Server Always On Availability Groups, and MySQL InnoDB Cluster), you can be more certain that you can recover, because you can see a running standby database ready to take over when needed.

And online replicas have other useful features, like offloading backups and detecting block corruption.

Opinion: the key point for me isn't the reliability of SRM. It's the human factor. If disk replication fails, will that failure be spotted and fixed before you need to invoke DR? Remember that SRM is almost certainly managed by a different team to the databases. If Data Guard fails, it's the DBA's fault: the DBA should have configured it better, the DBA should have monitored it better. If SRM fails, the SRM team should have configured it better, the SRM team should have monitored it better... but ultimately that failed database is still the DBA's problem.
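Whichever team owns the replication, "monitored it better" comes down to a check that somebody actually looks at. A rough sketch of such a check follows. Everything in it is hypothetical: the StandbyStatus record, the standby_problems function and the five-minute threshold are assumptions for illustration, and in practice the values would come from whatever your replication technology reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StandbyStatus:
    """Hypothetical snapshot of a standby's state, filled in by your own tooling."""
    name: str
    transport_ok: bool          # is the primary still shipping changes?
    last_applied_at: datetime   # when the standby last applied a change


def standby_problems(
    status: StandbyStatus,
    now: datetime,
    max_lag: timedelta = timedelta(minutes=5),  # assumed threshold, tune to your RPO
) -> List[str]:
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    if not status.transport_ok:
        problems.append(f"{status.name}: replication transport is not running")
    lag = now - status.last_applied_at
    if lag > max_lag:
        problems.append(f"{status.name}: apply lag {lag} exceeds {max_lag}")
    return problems
```

A non-empty list should page someone. The aim is that a silent replication failure gets noticed on the day it happens, not on the day you invoke DR.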

Bibliography