Last month we looked at various types of database recovery, how they work, and how DBAs need to prepare for recovery scenarios. This month, let’s delve a little deeper into the issues and decisions that DBAs need to be prepared to address as they work on database recovery.
The first thing that DBAs need to be aware of is the recovery time objective, or RTO, for each database object in question. In an ideal world, an RTO would have been established for each object, and backup procedures would be in place to make recovery within that objective achievable.
It is the responsibility of the DBA to ensure that the data is backed up according to the recovery time objective outlined for each table space. However, time constraints, lack of resources, or other complicating issues can get in the way of doing things properly, so RTOs may not exist or may not be documented anywhere. In that case, the DBA should gauge the importance of the object, and thereby frame the recovery requirements, by talking to SMEs and end users who are knowledgeable about the data.
But you can usually assume the response will be “I need it immediately!” This means the DBA needs to examine the available recovery options and select the one that results in the least downtime.
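As a rough illustration of that selection step, the sketch below picks the option with the lowest estimated downtime and reports whether it fits a documented RTO, if there is one. The names `RecoveryOption` and `pick_recovery_option`, the option labels, and the minute estimates are all hypothetical and chosen only for illustration; in practice the estimates would come from timed recovery tests in your own environment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RecoveryOption:
    """A candidate recovery approach with an estimated outage (hypothetical type)."""
    name: str
    est_downtime_min: float  # estimated minutes the data is unavailable


def pick_recovery_option(
    options: Sequence[RecoveryOption],
    rto_min: Optional[float] = None,
) -> Tuple[RecoveryOption, Optional[bool]]:
    """Return the option with the least estimated downtime.

    The second element says whether that option meets the RTO.
    It is None when no RTO has been documented for the object.
    """
    if not options:
        raise ValueError("at least one recovery option is required")
    best = min(options, key=lambda o: o.est_downtime_min)
    meets_rto = None if rto_min is None else best.est_downtime_min <= rto_min
    return best, meets_rto


# Illustrative options with made-up estimates.
options = [
    RecoveryOption("full restore plus log apply", 120.0),
    RecoveryOption("point-in-time recovery from backup", 45.0),
    RecoveryOption("log-based backout of bad transactions", 15.0),
]

best, meets_rto = pick_recovery_option(options, rto_min=30.0)
print(best.name, meets_rto)
```

If even the fastest option misses the RTO, the function still returns it but flags the shortfall, which is a signal that the backup strategy for that object needs to be revisited.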
So, what is the best recovery strategy? It depends.
Historically, recovery was performed mostly in response to disasters and hardware failures, but this is simply not the case any longer. Most recoveries these days result from application problems. Recent industry analyst studies show that system downtime is often caused by software problems rather than hardware problems.
In reality, few DBAs ever need to perform a recovery due to hardware failure except during tests. Though media still fail, they do so relatively infrequently these days. And with RAID now common, redundancy is built into storage systems, so a storage device failure rarely results in lost data. User errors and application failures are the most common reasons for database recovery and therefore the primary cause of system unavailability.