Things break and the DBA must be prepared for situations where a failure impacts the availability, integrity, or usability of a database. Reacting to failures and service disruptions is a key component of the DBA’s job. This means ensuring that appropriate backups are taken for all your critical database objects.
Assuming backups exist, when an inevitable database recovery is required, the next decision is how to get the data you need back as quickly as possible. But there are several different types and methods of performing recovery.
Of the different types of database recovery available, the first one that usually springs to mind is a recover to current, to handle some sort of disaster. This disaster could be anything from a media failure to a natural disaster destroying a data center. This generally means that your applications will be completely unavailable until the recovery is complete.
To successfully recover to current, the recovery process must be able to reset the contents of the database to the way it looked just at (or right before) the point of failure. To recover to current, the recovery process must find a valid, full image copy backup and restore that image copy. Then the recovery will roll forward through the database log, applying all the database changes.
If the last full image copy is lost or destroyed, it may still be possible to recover if a previous image copy exists. The recovery process could start with the older backup copy, apply any incremental copies, and then roll forward through the archived and active logs. Of course, more database logs will be required in such a case, so the recovery process will take longer.
If no image copy is available as a starting point, it may be possible to recover the database object using just the database log. If the data was loaded and the load process was logged, recovery may be able to proceed simply by applying log records.
A second traditional type of recovery is point-in-time (PIT) recovery, which is usually done to deal with an application-level problem. PIT recovery is sometimes referred to as partial recovery because only part of the existing data will remain after recovery. Recovery to a point in time removes the effects of all transactions that have occurred since that specified point in time.
To perform a PIT recovery, an image copy backup is restored and then changes are applied by rolling forward through the database log (or log backups). However, only the log records up to the specified time are processed. Sometimes the recovery point is specified as an actual date and time; sometimes it is specified using a relative byte address on the database log.
To successfully recover to a point in time, the recovery process must be able to reset the contents of the database to the way it looked at a prior consistent point. The key is to retain all the good changes to the data, while removing all the “bad.” The PIT recovery can be accomplished by restoring the image copy and rolling forward through the logs to apply any changes; or you just roll backward through the logs removing database changes that occurred after the PIT to which you are recovering.
Regardless of the method used to perform PIT recovery, the DBA must recover to a point where the data is consistent. A consistent recovery point will assure data integrity, referential integrity, and transaction integrity. Some DBMSs provide a QUIESCE utility to achieve a point of consistency, whereas others require the database objects to be stopped (or in read-only mode) during the image copy.
An additional issue is determining what exactly has run since the bad transaction or failure from which you are recovering. The DBA can examine the job schedule, format a report from the database logs, and review the computer console for messages to help determine what processes ran after the failure occurred.
Transaction recovery is a third type of recovery. Whereas traditional types of recovery recover at the database object level, transaction recovery allows a user to recover a specific portion of the database based on user-defined criteria. This can be at a transaction or application program level. A transaction is defined by the user’s view of the process, perhaps a set of panels that comprise a new-hire operation, or the set of jobs that post to the general ledger. The important point is that there may or may not be a correlation between the transactions you are trying to fix and transactions (or units of recovery) in the DBMS.
The types of problems transaction recovery are ideal for include scenarios such as “remove all database updates performed by a specific user since last Wednesday at 11:50 AM” or “reverse all database deletes performed by a specific application program since 8:00 PM yesterday.” Indeed, there are many similar types of problems like this that can occur at the application or transaction level.
Once you have identified the transaction to recover, you can request UNDO or REDO recovery. UNDO recovery removes only the effects of the bad transactions, whereas REDO recovery removes all the transactions after a given point in time, and then reapplies the good transactions only. Typically, transaction recovery is accomplished using reverse SQL; that is, examine the log and change INSERTs to DELETEs, DELETEs to INSERTs, and reverse the impact of any UPDATEs.
There are other options, for example if your approach to backup and recovery is to use storage management software, individual image copy backups for database objects will not exist. In this case, you will need to deploy storage management software to perform the recovery. The actual recovery process will depend on the type of storage management software in use, as well as how (or if) it interacts with the DBMS recovery mechanisms.
Additionally, we cannot ignore off-site disaster recovery, even though it should be the rarest type of database recovery. An off-site disaster recovery is required when a natural disaster (fire, flood, etc.) or other severe accident makes it impossible to use your primary data processing center. In this case, you need to reestablish your entire system environment, as well as recover your DBMS, database objects, and data.
We’ll dig deeper into this topic in next month’s edition of the DBA CORNER, so be sure to come back for more details!