Data is the most valuable asset your business has. Without it, your company could cease to exist.
COVID-19 is accelerating the drive toward SaaS apps and cloud adoption, and many companies are now adopting a hybrid cloud approach. Most are going about their transformation wisely, with experienced personnel who are measured in their approach. But we’ve all heard of companies going dark when a cloud provider did the unthinkable and went down: Everything was great until it wasn’t.
RTO, RPO, and Your Strategy
Cloud technology offers scalability and convenience. But as companies adopt cloud solutions, it’s worth remembering your IT-sphere reaches into areas beyond your direct control.
Security is a shared responsibility, and it cannot be an afterthought as it was in the past. We’ve all seen those project plans promising a great deal of innovation—with security a footnote to it all. It can’t be that way anymore. The moment your data is in the cloud, it must be secure.
Before signing with a cloud vendor, have a discussion with the provider about its data maintenance procedures and security arrangements. Consider every possibility and discount nothing. Make sure your “earthed” team and your cloud team collaborate on every move.
Intrinsic to any hybrid IT database strategy is the ability to recover. Or, to put it another way, you need a good backup or a good resume, so choose wisely. When building your data strategy, start with your recovery plan. Inside this recovery plan, you’ll define your recovery time objective (RTO) and recovery point objective (RPO).
RPO is the point in time to which you will recover your data, for example 15 minutes before the disaster event. Your RPO therefore determines how often you must back up your data. RTO is the maximum amount of time you allow yourself to recover the data and be back online.
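To make the two objectives concrete, here is a minimal Python sketch. The function names and the example durations are hypothetical, chosen only for illustration. It assumes the worst case for RPO: the failure happens just before the next backup would have run, so the data lost is roughly one full backup interval.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the backup schedule can satisfy the RPO.

    Worst case, a failure strikes just before the next backup runs,
    so up to one full backup interval of data is lost.
    """
    return backup_interval <= rpo


def meets_rto(restore_started: datetime, back_online: datetime,
              rto: timedelta) -> bool:
    """Return True if a measured restore finished within the RTO."""
    return (back_online - restore_started) <= rto


if __name__ == "__main__":
    # Hypothetical targets: 15-minute RPO, 1-hour RTO.
    rpo = timedelta(minutes=15)
    rto = timedelta(hours=1)

    # Hourly backups cannot meet a 15-minute RPO.
    print(meets_rpo(timedelta(hours=1), rpo))

    # A restore drill that took 40 minutes fits inside the RTO.
    started = datetime(2024, 1, 1, 12, 0)
    print(meets_rto(started, started + timedelta(minutes=40), rto))
```

Feeding the second function with timings from real restore drills, rather than estimates, is what turns the RTO from a hope into a measurement.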
In a high-frequency transaction environment, just a few seconds offline can represent a great deal of money. Meanwhile, other systems can be down for hours without adversely impacting the business. Organizations want their RPOs and RTOs short, but it’s a balancing act. Some systems and types of data are more equal than others.
Which databases matter most, and whether each one lands in tier 1, 2, or 3, is decided by those in the upper pay grades. It’s about triage, and it’s never easy to break down your data into tiers.
A third metric to factor into your RTO/RPO strategy is the maximum tolerable period of disruption (MTPD). This data point represents how long you can crisis-manage a system outage. The number varies for every application and service under your supervision. Factors that come into play include tangible costs, such as employee wages, lost sales, weakened stock prices, and recovery expenses. There are also intangibles, such as reputational risk.
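One way to keep the three numbers honest is to record them side by side for each system and flag any system whose RTO is longer than its MTPD, since a recovery that takes longer than the business can tolerate is not a workable target. The sketch below is illustrative only: the `RecoveryTargets` class, the system names, and every duration are made-up assumptions.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    """Recovery objectives for one system (all values are examples)."""
    system: str
    tier: int
    rpo: timedelta
    rto: timedelta
    mtpd: timedelta


def systems_exceeding_mtpd(targets):
    """Return the names of systems whose RTO is longer than their MTPD."""
    return [t.system for t in targets if t.rto > t.mtpd]


if __name__ == "__main__":
    # Hypothetical inventory with one misconfigured system.
    inventory = [
        RecoveryTargets("orders-db", 1, timedelta(minutes=5),
                        timedelta(minutes=30), timedelta(hours=2)),
        RecoveryTargets("reporting-db", 3, timedelta(hours=24),
                        timedelta(hours=48), timedelta(hours=36)),
    ]
    for name in systems_exceeding_mtpd(inventory):
        print(f"{name}: RTO is longer than the MTPD, revisit the plan")
```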
Once you have your data points and arrive at your database strategy, you must test, test, and test to ensure you can meet your SLAs. As your data becomes more complex, you must revisit your data points through constant verification.
Sample Your Data
By itself, a backup is worthless. Restores are priceless. Therefore, you must test to verify your backups can be restored. However, the size of your environment may be too large for you to restore every database as a test. The answer is to use statistical sampling. I wrote an ebook, “Recovery Sampling and the King Crab,” detailing this process.
Statistical sampling allowed me to use a random sample of 40 databases in my environment. I restored the backups and tested the recovery process. I conducted these tests weekly, each time choosing my databases randomly.
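The weekly random draw can be sketched in a few lines of Python. This is only an illustration of the selection step, not the method from the ebook: the function name and the database names are hypothetical, the sample size of 40 is simply the figure quoted above, and the restore and verification work itself is left out.

```python
import random


def pick_restore_sample(databases, sample_size=40, seed=None):
    """Pick a random sample of databases to restore-test.

    Sampling is without replacement, so no database appears twice.
    If the inventory is smaller than sample_size, every database
    is returned.
    """
    names = list(databases)
    rng = random.Random(seed)  # pass a seed only to reproduce a past draw
    k = min(sample_size, len(names))
    return rng.sample(names, k)


if __name__ == "__main__":
    # Hypothetical inventory of 500 databases.
    inventory = [f"db_{i:04d}" for i in range(500)]
    this_week = pick_restore_sample(inventory)
    print(len(this_week), "databases selected for restore testing")
```

Leaving the seed unset gives a fresh draw each week, so over time every database has a chance of being tested.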
Enterprise organizations have many more databases, perhaps tens of thousands, but the same process applies, albeit with larger sample sizes and more DBAs. Remember, you can achieve a great deal with a few PowerShell scripts. Here, scripting and automation are truly the DBA’s best friends.
Monitor for Punches
You may have the best strategy, but if you can’t execute it, your plan is of little practical value. If you’re leaving your AWS S3 buckets exposed to the internet and open for public access, then your database strategy consists of nothing more than hopes and dreams.
Your execution needs locking down. To keep yourself on track, you need monitoring. Monitoring can inform you about whether you’re over-allocating resources and identify shifts in performance. Monitoring can additionally feature automated alerting, notifying you when a system is down or when thresholds are triggered.
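At its simplest, threshold-based alerting compares the latest readings against agreed limits and reports anything out of bounds, including metrics that have stopped reporting altogether. The sketch below assumes higher values are worse; the function name, metric names, and limits are hypothetical, and a real monitoring tool would collect the readings and deliver the notifications.

```python
def check_thresholds(metrics, thresholds):
    """Return alert messages for metrics that are missing or over limit.

    metrics:    mapping of metric name -> latest reading
    thresholds: mapping of metric name -> maximum acceptable value
    """
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A silent metric can mean the system itself is down.
            alerts.append(f"{name}: no data reported")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds threshold {limit}")
    return alerts


if __name__ == "__main__":
    # Hypothetical readings and limits.
    readings = {"cpu_percent": 92.5, "replication_lag_seconds": 12}
    limits = {
        "cpu_percent": 85,
        "replication_lag_seconds": 60,
        "minutes_since_last_backup": 15,
    }
    for alert in check_thresholds(readings, limits):
        print(alert)
```

Tracking a metric such as minutes since the last successful backup ties the monitoring directly back to the RPO.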
Hybrid IT is going to be with us for years to come. Let’s take advantage of its opportunities, but also realize we must now extend our approach to cloud monitoring. Track your RTO, RPO, and MTPD. Make sure you sample and test. Keep on top of it because, as Mike Tyson famously said, “Everybody has a plan until they get punched in the mouth.” It’s best to know now how you’ll react then.