High availability and disaster recovery are both essential to overall business continuity planning. They complement each other in different scenarios and are the go-to approaches for IT teams trying to reduce the risk of data loss when interruptions strike. But where is the line between high availability and disaster recovery, and how can IT pros make the two reinforce each other?
The difference between high availability and disaster recovery
High availability mitigates the risk of relatively small disruptions that tend to occur frequently. Disaster recovery provides a safety net against rare, large-scale outages, such as natural disasters, that occur seldom or, ideally, never. Disaster recovery takes over where high availability falls short. Used in combination, they make your organization resilient in the face of most problems that can befall your IT landscape.
Both play a role in business continuity planning, but each has a different focus and applicability.
High availability
As noted above, high availability focuses on maximizing productive uptime by keeping services, applications and entire systems running. Its goal is to maintain user access to IT services despite occasional problems and low-severity outages.
Typical elements of a high-availability architecture include failover, so that a backup system can take over when a component fails, and redundancy, which eliminates single points of failure. Another common element is load balancing, which distributes workloads across multiple systems (usually servers) to reduce congestion.
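The following minimal sketch shows how failover, redundancy and load balancing fit together. It is a toy Python model and not a production load balancer. `Server`, `LoadBalancer` and the server names are hypothetical. Health status is set by hand here, whereas a real system would update it from health checks.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Server:
    name: str
    healthy: bool = True  # hypothetical; in practice set by a health check


class LoadBalancer:
    """Round-robin load balancing across redundant servers, with failover."""

    def __init__(self, servers: list[Server]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = servers
        self._ring = cycle(servers)

    def route(self) -> str:
        """Return the name of the server that should handle the next request."""
        # Look at each server at most once per request, so a failed server
        # is skipped (failover) instead of taking the whole service down.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server.healthy:
                return server.name
        raise RuntimeError("no healthy servers: high availability has run out")


pool = [Server("app-1"), Server("app-2"), Server("app-3")]
lb = LoadBalancer(pool)
print([lb.route() for _ in range(3)])  # ['app-1', 'app-2', 'app-3']
pool[1].healthy = False                # simulate a server crash
print([lb.route() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

The redundant pool removes the single point of failure, and the health check inside `route` provides the failover. When every server is down, the sketch raises an error, because that situation is beyond what high availability alone can handle.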
Which kinds of service disruptions does high availability address? Its main value lies in scenarios like short-term power outages, device failures, server crashes and problems with network throughput.
Disaster recovery
With the goal of keeping your business afloat, disaster recovery is designed to return your users to productivity promptly after a large disruption.
Your disaster recovery strategy and practices extend to the software you use to back up your data and applications regularly and to restore them when needed. Because a disaster can render your primary IT infrastructure unusable, your strategy should also include a fallback location elsewhere and readiness to rely on that location for the long term.
Organizations turn to disaster recovery in circumstances such as fire, flooding, devastating cyberattacks, earthquakes and extended periods without electricity.
How high availability and disaster recovery support and reinforce each other
As part of considerations in your business continuity strategy, disaster recovery and high availability are interwoven in several ways:
- Relationship – It is advisable to think of high availability as part of a disaster recovery strategy, with a smooth transition from the former to the latter.
- Crossover – Get the best of both worlds. High-availability systems are not designed for full recovery, but they usually incorporate failover techniques and redundancies that can shorten recovery time in the wake of a disaster.
- Complementary effects – High availability is a way to lower the operational costs of overcoming frequent, limited disruptions. That dovetails with the way disaster recovery reduces downtime and keeps the business from succumbing to large-scale outages.
- Data protection – Both approaches protect your data from loss. High availability keeps data and applications in sync across redundant systems, so an up-to-date copy is ready for quick recovery in a disaster scenario.
- Raised consciousness – Planning for availability and disaster recovery heightens the organization’s awareness about risk and business continuity, setting the tone for resilience with all users.
- Constant vigilance – When correctly implemented, both high availability and disaster recovery call for periodic testing, which keeps related infrastructure accessible and ready when needed.
- Preparedness in layers – Because each approach is suited to different scenarios, they afford layered management of the risk that data will be lost. That strengthens your overall defense and reduces your exposure.
Building for high availability and disaster recovery
The line between high availability and disaster recovery is not always cut and dried.
When you think about high availability, the things that come to mind are your servers, your storage and your data. Of those, your data is the most important, and disaster recovery has to make it available again as quickly as possible. So you might build a method of handling disaster recovery into your high-availability system.
The goal is to make your system so resilient that no single point of failure at a production site should give you any major problems. And if you do have major problems at a production site, your disaster recovery should go hand in hand with your high availability. It might take a little bit longer to get going and it may cost you more. But if you’re not willing to invest in high availability, you had better not expect prompt recovery from a disaster.
Example high availability and disaster recovery use cases
Suppose your company has two fully replicated data centers. One of them fails due to flooding and the other takes over. Even though you’ve characterized the flooding as a disaster, it’s your high availability that’s keeping your users productive. It’s not really disaster recovery, because you’ve not had to stop production and take action.
Should your disaster recovery be built into your high availability plans so that you never have to stop production and shift to disaster recovery mode? No backup is of any use to you until you can restore it, so as a matter of business continuity, you need to make sure that you can restore the backup.
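One way to act on "no backup is of any use until you can restore it" is to test restores routinely instead of assuming that successful backup jobs mean recoverable data. The Python sketch below makes deliberately simple assumptions. A backup is a plain file copy, the "restore" is a copy into a scratch directory, and a SHA-256 checksum recorded at backup time proves the restored data is intact. `sha256_of` and `verify_restore` are hypothetical helpers. A real backup product would replace the copy step with its own restore operation.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(backup: Path, expected_sha256: str) -> bool:
    """Restore `backup` into a scratch directory and compare it with the
    checksum recorded when the backup was taken."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup.name
        shutil.copy2(backup, restored)  # placeholder for your real restore step
        return sha256_of(restored) == expected_sha256


# Usage: record the checksum at backup time, then test the restore later.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "orders.db"
    source.write_bytes(b"order-1001,paid\n")
    checksum = sha256_of(source)
    backup = Path(workdir) / "orders.db.bak"
    shutil.copy2(source, backup)
    print(verify_restore(backup, checksum))  # True
```

The sketch compares against the checksum taken at backup time rather than against the live data, because production data will normally have changed since the backup ran.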
The problem with keeping data highly available is that real-time replication constantly overwrites earlier versions of the data, so a deletion or corruption on the primary is copied to every replica. Then what do you do? You have to go to your backup and restore an earlier copy of the data. But at what point do you want to shift to disaster recovery? It's probably a disaster if a whole data center goes offline, but is it a disaster if one server goes down? As soon as you've lost data in your highly resilient system and need to restore it, you've shifted to disaster recovery mode to get that data back.
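A toy model makes the point about replicated data concrete. In this hypothetical Python sketch, `ReplicatedStore` keeps a primary and a replica in lockstep, which is what makes it highly available. It also takes separate snapshots that stand in for backups. An accidental deletion reaches the replica immediately, and only the snapshot can bring the data back.

```python
import copy
from typing import Optional


class ReplicatedStore:
    """A primary and a synchronous replica, plus point-in-time snapshots."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}
        self.replica: dict[str, str] = {}
        self.snapshots: list[dict[str, str]] = []

    def write(self, key: str, value: Optional[str]) -> None:
        """Apply a change to both copies at once; None means delete."""
        # Real-time replication copies every change, including an accidental
        # deletion or a corrupted value, to the replica immediately.
        for target in (self.primary, self.replica):
            if value is None:
                target.pop(key, None)
            else:
                target[key] = value

    def snapshot(self) -> None:
        """Take a backup: a frozen copy that later writes cannot touch."""
        self.snapshots.append(copy.deepcopy(self.primary))

    def restore(self, index: int = -1) -> None:
        """Disaster recovery: roll both copies back to a snapshot."""
        self.primary = copy.deepcopy(self.snapshots[index])
        self.replica = copy.deepcopy(self.snapshots[index])


store = ReplicatedStore()
store.write("order-1001", "paid")
store.snapshot()                        # e.g. the nightly backup
store.write("order-1001", None)         # the accidental delete replicates too
print(store.replica.get("order-1001"))  # None: the replica can't help
store.restore()
print(store.primary["order-1001"])      # paid
```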
So is there a crossover? Certainly. Do they go hand in glove? Yes, your disaster recovery planning should include highly available systems because you’re building an entire insurance policy across everything: your service, storage, platform, applications and data. You don’t want to lose a minute of uptime. But if you do, you have to invoke a different part of your plan, which is to restore from backups.
That’s why an important element of your data protection strategy is to define “disaster” for your business.
Considerations to keep in mind when walking the line between high availability and disaster recovery
Define “disaster” for your business
Imagine that one of your databases stops responding to your enterprise resource planning (ERP) application. Is it that a table has become corrupted, or is your entire suite of SQL Servers down? If the former, your high-availability solution will probably suffice because it protects against smaller outages. If the latter, you may need disaster recovery because it protects against larger-scale outages.
But what does the business itself consider a disaster? Along the spectrum between a broken coffee maker and a fire in the data center, what rates as a disaster in your organization?
It’s a matter of different levels of severity. If, for instance, a network router goes down, in most organizations that doesn’t call for disaster recovery. Sure, the effects can be so widespread that you couldn’t call it anything except a disaster, but it’s more likely a lack of availability at a single point of failure. In some cases, the lack of high availability would entail as much risk and generate as many headaches as most disaster scenarios.
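Writing your definition of "disaster" down as explicit rules forces the organization to agree on it. The hypothetical Python sketch below encodes one possible policy using signals already discussed in this article:

- whether any redundant capacity remains
- whether a whole site is down
- whether data has to be restored from backup

The field names and rules are assumptions for illustration only. Your real thresholds come from the business.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    description: str
    servers_down: int       # servers in the affected service that are offline
    servers_total: int      # servers in that service, including redundant ones
    needs_restore: bool     # must data come back from a backup?
    site_down: bool = False


def response_for(incident: Incident) -> str:
    """Map an incident to the part of the continuity plan it calls for.

    These rules are illustrative placeholders; the real ones are whatever
    your business has agreed counts as a disaster.
    """
    if incident.needs_restore or incident.site_down:
        return "disaster recovery"
    if incident.servers_down < incident.servers_total:
        # At least one redundant server is still up: failover should absorb it.
        return "high availability"
    return "disaster recovery"


print(response_for(Incident("one SQL Server node fails", 1, 3, False)))
# high availability
print(response_for(Incident("entire SQL Server suite down", 3, 3, False)))
# disaster recovery
```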
Take a look at your business insurance requirements
It’s not uncommon for insurance carriers to require that you meet certain levels of data protection, without which they will either demand higher premiums or decline to insure you. Why? Because data is now another asset, like a building or a furnace.
To qualify for insurance, carriers expect you not only to have a backup of your data but also to keep multiple copies, encrypt them and use immutable backups. They ask pointed questions about business continuity and your disaster recovery plan.
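A simple self-check before renewal can catch gaps in these controls. The Python sketch below is hypothetical. `BackupPolicy` and `insurance_gaps` are illustrative names, and the default of two copies is only a reading of "multiple copies," not any carrier's actual requirement. Substitute the thresholds from your own policy.

```python
from dataclasses import dataclass


@dataclass
class BackupPolicy:
    workload: str
    copies: int        # independent backup copies kept
    encrypted: bool
    immutable: bool    # at least one copy cannot be altered or deleted


def insurance_gaps(policy: BackupPolicy, min_copies: int = 2) -> list[str]:
    """List where a policy falls short of the controls carriers ask about.

    min_copies is a placeholder; use whatever your carrier actually requires.
    """
    gaps = []
    if policy.copies < min_copies:
        gaps.append(f"{policy.workload}: {policy.copies} copy(ies), need {min_copies}")
    if not policy.encrypted:
        gaps.append(f"{policy.workload}: backups are not encrypted")
    if not policy.immutable:
        gaps.append(f"{policy.workload}: no immutable copy")
    return gaps


erp = BackupPolicy("ERP database", copies=1, encrypted=True, immutable=False)
for gap in insurance_gaps(erp):
    print(gap)
```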
Document your disaster recovery plan
To fulfill those insurance requirements, it’s prudent to build out and document a plan for the steps you’ll follow for each type and severity of disruption you face. What will you do if multiple users accidentally delete important email, or if 50 virtual machines are suddenly corrupted or lost? When you set out your procedures in advance, you leave yourself valuable guidance for when things go sideways. Plus, it’s easier to contemplate recovery from one relatively small disruption at a time than to wrap your head around recreating your entire IT landscape.
Create your plan with an eye to the relative importance of the data involved. For example, email may seem like the highest priority, but what about the database behind your ecommerce website? It holds all the transactions that are of real value to the business, so isn't it more important to your bottom line? Examine each data set and ask, "What if that went wrong?" Then apply a disaster recovery process to it, implement high availability across multiple servers and back it up regularly. That gives you a plan for that system, so you move on to the next system and create a plan for it.
Once you have the small-scale plans in place, you can establish an order for executing them in case of a large-scale disruption.
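Turning the per-system plans into an order of execution is essentially a prioritized dependency sort. The Python sketch below shows one way to do it under two assumptions that the text above does not state:

- Each system has a numeric business-value score.
- Some systems can only come back after others, such as a storefront after its database.

The system names and scores are hypothetical. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9 and later).

```python
from graphlib import TopologicalSorter


def recovery_order(value: dict[str, int], depends_on: dict[str, set[str]]) -> list[str]:
    """Order per-system recovery plans for a large-scale disruption.

    A system is restored only after everything it depends on; among the
    systems that are ready, the most valuable goes first. Every system
    named in depends_on must also have a score in value.
    """
    sorter = TopologicalSorter({s: depends_on.get(s, set()) for s in value})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order: list[str] = []
    ready: list[str] = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        ready.sort(key=lambda s: value[s])  # most valuable ends up last
        system = ready.pop()
        order.append(system)
        sorter.done(system)
    return order


scores = {"ecommerce-db": 10, "web-storefront": 9, "email": 6, "print-server": 1}
deps = {"web-storefront": {"ecommerce-db"}}
print(recovery_order(scores, deps))
# ['ecommerce-db', 'web-storefront', 'email', 'print-server']
```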
Most of all, creating and documenting your plan in this way ensures that the entire library of institutional knowledge is not locked inside one administrator’s head.
Enforce change control
As you put high availability and disaster recovery plans in place, make sure that your change control processes are solid and can scale as organizational systems grow. System changes, including updates, patches and upgrades, frequently introduce problems, and robust change control lets you anticipate the effects of those changes.
In our experience, only a small percentage of companies are appropriately rigorous about change control. Those that are establish rules that prohibit rolling out any new production application or data set until backup and data protection are in place.
That's prudent, because it's normally the other way around. Much more often, we see IT teams with jam-packed task lists rolling a new workload into production before they've figured out how to back it up. Too late, they realize that their current backup software isn't well suited to it, so they have to buy a new backup product for that workload alone. Worse yet, they may procrastinate ("We'll sort that out later"), and the application or data set never gets backed up. That's an example of a change that needs control wrapped around it.
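A rule like "no production rollout until data protection is in place" can be enforced as an automated gate in the release process. The Python sketch below is hypothetical. `ChangeRequest` and `approve_for_production` are illustrative names, not part of any change-management tool. In keeping with the point that a backup is only useful if it can be restored, the gate also assumes a successful test restore is required.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ChangeRequest:
    workload: str
    backup_job: Optional[str] = None          # job that backs the workload up
    last_restore_test: Optional[date] = None  # last successful test restore


def approve_for_production(change: ChangeRequest) -> tuple[bool, str]:
    """Refuse to promote a workload until its data protection is in place."""
    if change.backup_job is None:
        return False, f"{change.workload}: no backup job defined"
    if change.last_restore_test is None:
        return False, f"{change.workload}: backup has never been test-restored"
    return True, f"{change.workload}: approved for production"


print(approve_for_production(ChangeRequest("analytics-app")))
# (False, 'analytics-app: no backup job defined')
```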
Conclusion
Your first task is to evaluate each of your workloads for data protection and answer three main questions:
- Do we need to make it highly available?
- Does it need disaster recovery?
- Do we need to back it up?
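The three questions can be recorded per workload so the answers are explicit and reviewable. The Python sketch below is one hypothetical way to do that. The `Workload` fields and the 15-minute downtime threshold for high availability are assumptions chosen for illustration. The print server mirrors the example that follows.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    max_downtime_minutes: int     # how long the business can tolerate an outage
    irreplaceable_data: bool      # would losing its data hurt the business?
    needed_during_disaster: bool  # must it run while you recover?


def protection_needs(w: Workload, ha_threshold_minutes: int = 15) -> dict[str, bool]:
    """Answer the three questions for one workload (illustrative rules)."""
    return {
        "high availability": w.max_downtime_minutes <= ha_threshold_minutes,
        "disaster recovery": w.needed_during_disaster,
        "backup": w.irreplaceable_data,
    }


for w in (Workload("ecommerce database", 5, True, True),
          Workload("print server", 480, False, False)):
    print(w.name, protection_needs(w))
```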
As you establish priorities, you’ll realize that some workloads and endpoints are more important than others. Print servers, for example, may be important during business as usual, but business isn’t usual when you’re recovering from a disaster, so emphasize data protection where it most counts.
When you combine the small-disruption focus of high availability with the large-outage focus of disaster recovery, you equip IT for the resilience and the continuity planning businesses need to thrive.