Software problems and bugs might cause only certain transactions to be in error and in need of repair. Unfortunately, as databases grow in size and complexity, so do the chances that bad transactions will corrupt the data on which your business depends.
If a significant number of changes need to be removed, then restoring and rolling forward usually results in the least downtime. If the number of changes that must be removed are minimal, then rolling backward through the logs should result in less downtime. Transaction recovery may seem like the answer to availability problems, but there are a number of cases where transaction recovery is neither possible nor advisable.
In determining the type of recovery to perform, the DBA must consider several questions:
- Transaction identification: Can all the problem transactions be identified? You must be able to actually identify the transactions that will be removed from the database for transaction recovery to work. Can all the work that was originally done be located and redone?
- Data integrity: Has anyone else updated the rows since the problem occurred? If they have, can you still proceed? Is all the data that is required still available? Intervening reorganizations, loads, or mass deletes can require the use of an image copy backup, thereby eliminating UNDO recovery. Will the recovery cause any other data to be lost? If so, can the lost data be identified in some fashion and reapplied?
- Speed: If multiple techniques are viable, which one is likely to perform the fastest? How many database logs are required to perform the recovery? Can anything be done to reduce the number of logs, such as merging incremental copies?
- Availability: How soon can the application become available again? Can you afford to go offline?
- Invasiveness: How invasive was the failure to your database? Were decisions made based on bad data? Can any subsequent work be trusted?
All of these questions actually boil down to a question of cost. What is the cost of rework, and is it actually possible to determine what would need to be redone? This cost needs to be balanced against the cost of long scans of log data sets to isolate data to redo or undo and the cost of applying that data to the database. Of course, an additional question looms large: Which of these recovery techniques are actually available at your site and will they work for the DBMS in question?
Many factors influence the duration of the recovery process. The DBA can implement measures to reduce downtime by developing a smart backup and recovery plan. The following factors can shorten the duration of a recovery:
- The smaller the size of the components that need to be recovered, the shorter the recovery process will be. In general, the less you have to do, the less time it will take.
- Consider partitioning database objects and backing up and recovering at the partition level. Sometimes a failure that would otherwise impact an entire database object can be limited to impacting only a single partition.
- Consider keeping image copy backups and log archive files on disk. Because disk file access is quicker and processes do not need to wait for tape mounts, using disk instead of tape can speed up the recovery process.
- Test your image copy backups to make sure they are valid. Encountering an invalid image copy during the recovery process will lengthen the duration of recovery. When invalid image copy backups are found, steps can be taken to create a new, valid image copy backup before causing a negative impact on recovery.
- Automate your backup and recovery procedures to the greatest extent possible. Automated procedures remove manual error from the equation, thereby minimizing downtime.
- Whenever possible, design databases with as few dependencies as possible. Autonomous database objects can minimize the duration of a recovery because fewer related database objects may need to be recovered at the same time.
- Finally, be sure that every DBA understands the recovery procedures for each database object under his or her control.
Indeed, there are many factors to consider when developing a backup and recovery plan for your production databases.