How to create a data recovery plan

A solid data recovery plan is vital for every organization. After all, threats to your IT ecosystem are inevitable, from cyberattacks like ransomware to weather emergencies like wildfires and hurricanes to devastating mistakes by overworked IT pros.

You cannot eliminate these threats, but you can maximize your resilience when they strike. The key to speeding recovery and getting your business back on its feet is having a solid, tested data backup and disaster recovery plan. As Gartner states, “Trying to improvise a recovery process in the aftermath of an attack will inevitably lead to mistakes and prolong the outage.”[1]

But what does a data recovery plan include? How can you ensure yours is effective for both ransomware recovery and recovery from other threats? This post will walk you through what you need to know.

Building a solid data recovery plan: the four key areas

The core strategies for establishing a sound data recovery plan can be grouped into four areas:

  • Inventory and assessment
  • Planning and prioritization
  • Backup and recovery strategy
  • Review and testing

Let’s dive into each of them.

1. Inventory and assessment

Group your data and applications.

The first step in developing a data backup and disaster recovery plan is to group your data. Which data and applications are most essential to the organization? What can you survive a little longer without? Which data is mostly static? Exactly where is the data located?

The table below shows a data grouping with three classifications: static, business vital and mission critical. This is only an example; you may decide that your recovery plan needs a different set of classifications.

Figure 1. Categorizing data

To decide which data falls into each category, think about how dynamic it is and how vital it is to core business functions. For instance, tools like Microsoft 365 and SharePoint are essential to user collaboration and communication, so the sensitive data stored there might be assigned to a higher classification than file systems, which might change less often.

The highest classification, mission critical, includes things like databases used for financial transactions, which are vital to keeping the business alive and meeting contractual and legal requirements. Note that Active Directory also falls squarely into this category. Active Directory provides the authentication and authorization services that enable your IT ecosystem to function. Without Active Directory, users cannot log on to their endpoints or access IT assets, and applications and services cannot run. In short, your business is dead in the water. Take it from Gartner: “The restore process from many well-documented ransomware attacks has been hindered by not having an intact Active Directory restore process.”[2]
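As a rough illustration of the grouping exercise, the Python sketch below sorts assets into the three example classifications using two yes/no judgments. The `Asset` fields, the sample inventory and the decision rule are all assumptions made for illustration; your own criteria may well be more nuanced.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    changes_often: bool              # how dynamic is the data?
    business_stops_without_it: bool  # is it vital to core business functions?


def classify(asset: Asset) -> str:
    """Hypothetical rule: check criticality first, then how dynamic the data is."""
    if asset.business_stops_without_it:
        return "mission critical"
    if asset.changes_often:
        return "business vital"
    return "static"


# Sample inventory (illustrative entries only)
inventory = [
    Asset("Active Directory", changes_often=True, business_stops_without_it=True),
    Asset("SharePoint sites", changes_often=True, business_stops_without_it=False),
    Asset("Archived file shares", changes_often=False, business_stops_without_it=False),
]

for asset in inventory:
    print(f"{asset.name}: {classify(asset)}")
```

Even a simple rule like this forces the conversation about which questions decide a classification, which is the real point of the exercise.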

Establish recovery goals.

Next, it’s time to establish the recovery time objective (RTO) and recovery point objective (RPO) for each type of data. Work with stakeholders to determine how quickly the data needs to be restored and how much data loss is acceptable. Remember that their initial response is likely to be that they require zero downtime and zero data loss. You will need to help them understand just how costly that goal would be, and agree upon reasonable, balanced expectations for your data recovery plan.

In many cases, you will want to go deeper and also set a version recovery objective (VRO) and a geographical recovery objective (GRO). To establish the VRO, ask how recent a version of the data you need to be able to restore. GRO addresses where backups will be stored and how that affects the other recovery requirements; for example, RTO might be longer when recovering from a disaster recovery site than from a local backup.

Here’s an example of the resulting extended table for a data recovery plan:

Figure 2. Defining recovery metrics for each classification
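If it helps to see the four objectives side by side, here is a minimal Python sketch that records them per classification and checks a backup's age against the RPO. Every value shown is a placeholder chosen for illustration, not a recommendation, and the names `RecoveryObjectives`, `OBJECTIVES` and `meets_rpo` are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # how quickly the data must be restored
    rpo: timedelta  # how much data loss (measured in time) is acceptable
    vro: str        # which version of the data must be restorable
    gro: str        # where the backups are stored


# Placeholder values only; agree on real numbers with your stakeholders.
OBJECTIVES = {
    "mission critical": RecoveryObjectives(
        timedelta(hours=1), timedelta(minutes=15), "latest", "local and DR site"),
    "business vital": RecoveryObjectives(
        timedelta(hours=8), timedelta(hours=4), "last 7 days", "local and cloud"),
    "static": RecoveryObjectives(
        timedelta(days=3), timedelta(days=7), "last full backup", "cloud"),
}


def meets_rpo(classification: str, backup_age: timedelta) -> bool:
    """True if the newest backup is recent enough for this classification."""
    return backup_age <= OBJECTIVES[classification].rpo


print(meets_rpo("mission critical", timedelta(minutes=10)))
print(meets_rpo("mission critical", timedelta(hours=1)))
```

Writing the objectives down in a structured form like this makes it easier to check them automatically later, rather than leaving them in a document nobody reads.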

2. Planning and prioritization

Uncover and assess vulnerabilities.

Another core step in developing a data recovery plan is to uncover weaknesses that adversaries can take advantage of, and then prioritize them by assessing both the likelihood of that happening and the potential damage that could result.

Here is a standard risk assessment matrix for categorizing risks as low, medium, high or extreme, based on the combination of their probability and impact:


Figure 3. A standard risk assessment matrix
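One way to apply a matrix like this consistently is to score each risk. The Python sketch below assumes a 1 to 5 scale for both probability and impact and multiplies them; the scale and the cut-off points between low, medium, high and extreme are assumptions for illustration, so adjust them to match the matrix your organization actually uses.

```python
def risk_level(probability: int, impact: int) -> str:
    """Map probability and impact (each 1-5) to a risk category.

    The score thresholds below are illustrative assumptions.
    """
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("probability and impact must be between 1 and 5")
    score = probability * impact
    if score <= 4:
        return "low"
    if score <= 9:
        return "medium"
    if score <= 16:
        return "high"
    return "extreme"


# Hypothetical risks: (description, probability, impact)
risks = [
    ("Ransomware via phishing", 4, 5),
    ("Regional wildfire", 2, 4),
    ("Accidental deletion of a file share", 3, 2),
]
for name, probability, impact in risks:
    print(f"{name}: {risk_level(probability, impact)}")
```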

During this process, be sure to consider insider threats as well as external attacks. Hackers are now actively attempting to bribe employees to plant malware, for instance. It’s vital to work toward establishing a Zero Trust model.

Understand dependencies of the data.

As you build out your data recovery plan, think through the dependencies involved in restoring not just data, but access to that data. After all, restoring your mission-critical Oracle database quickly is of essentially no value unless you have also restored the components required to bring it up and enable users to access it.
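To make the dependency idea concrete, the sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to derive a restore order in which every component comes back only after the things it relies on. The component names and the dependency map are hypothetical; your own map will come out of the inventory work described earlier.

```python
from graphlib import TopologicalSorter

# Each key depends on the components in its set (hypothetical example).
dependencies = {
    "Oracle database": {"Storage", "Network", "Active Directory"},
    "Active Directory": {"Network"},
    "Application server": {"Oracle database", "Active Directory"},
    "Storage": set(),
    "Network": set(),
}

# static_order() yields each component after all of its dependencies.
restore_order = list(TopologicalSorter(dependencies).static_order())

for step, component in enumerate(restore_order, start=1):
    print(f"{step}. {component}")
```

If two components depend on each other, `TopologicalSorter` raises a `CycleError`, which is itself a useful signal that the recovery plan has a circular dependency to resolve.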

In addition, concentrate on recovery time rather than backup speed. Backups often proceed quickly because, after the initial full backup, subsequent backups need to capture only the data that has changed. A restore, by contrast, may have to move the full data set, so it’s important to determine whether you have the bandwidth to restore data in the time required by your recovery plan. As noted above, this process usually involves understanding the geographical locations of your backups.
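A quick back-of-the-envelope calculation shows why this matters. The Python sketch below estimates how long a full restore would take over a given link; the `efficiency` factor (an assumed allowance for protocol overhead and contention) and the sample figures are illustrative, not measurements.

```python
def restore_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to transfer data_gb over a link of bandwidth_mbps.

    bandwidth_mbps is in megabits per second; efficiency is the assumed
    fraction of that bandwidth actually available to the restore.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


# Hypothetical case: 2 TB restored over a 1 Gbps link
estimate = restore_hours(data_gb=2000, bandwidth_mbps=1000)
print(f"Estimated restore time: {estimate:.1f} hours")
```

Compare the estimate with the RTO you agreed upon. If the number is larger, you need more bandwidth, a closer backup copy or a longer RTO.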

Assess your tools for speeding recovery.

Consider what other tools you have that can speed the recovery process. Data replication tools can be immensely helpful. So can snapshots, though it’s important to note that snapshots are not a substitute for enterprise backups and are often vulnerable to being corrupted during cyberattacks.

Determine authority for executing your plan and ensure effective communications.

Make sure your data backup and disaster recovery plan is clear about who has the authority to start the recovery operation. Otherwise, no one may take the initiative, prolonging the downtime and damage to the organization.

In addition, be sure that the recovery plan includes strategies for communication that do not rely on systems that might be down or inaccessible. You don’t want your only copy of the plan to be on a server that has been encrypted or even in a platform that hasn’t been targeted but that no one can log on to because Active Directory is down.

3. Backup and recovery strategy

Think beyond “data” — your strategy must cover Active Directory, too.

While your backup and recovery strategy clearly has to include data like the records stored in your databases and file systems, that data will be useless if nobody can access it because your Active Directory is down. Therefore, as noted earlier, Active Directory itself counts as mission-critical data. For example, when shipping giant Maersk was hit by malware, it had backups of much of its data — but not of Active Directory. It was saved from rebuilding its AD from scratch only because it discovered a lone domain controller that had been offline during the attack and was able to painstakingly shuttle it thousands of miles to serve as a backup.

Keep in mind that backing up and restoring Active Directory are complex processes. For example, you need to minimize the places that malware can hide lest you restore the infection when you bring your domain controllers back to life, as well as streamline the lengthy native domain recovery procedure as much as possible to minimize business downtime. Accordingly, it’s vital to have an advanced Active Directory backup and recovery solution that provides control, flexibility and automation. Gartner advises: “If possible, invest in dedicated tools for Active Directory recovery as the Microsoft tools and procedures along with the limited capabilities of enterprise backup tools are often not fit for purpose.”[3]

Protect your backups.

Since recovery is impossible without viable backups, it’s vital to protect your backup systems and data against encryption, corruption, deletion and other interference. One way to limit the network interfaces to backup storage repositories is to create a subnet and put the repository where it can’t be seen on the public network. In addition, your backup procedures should include encrypting the backup data rather than storing it in a recognizable format, to make the data as useless as possible to bad actors.

Have more than one copy of the data.

Your data recovery plan should cover backup locations and formats. The classic strategy is 3-2-1: Have three copies on two different storage media with one copy off-site. The diagram below illustrates this model. The main backups are stored in the data center at the corporate headquarters, and they are replicated to the regional office. In addition, the backup data is being stored in the cloud.


Figure 4. Sample backup strategy with multiple data locations and formats
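The 3-2-1 rule is simple enough to check mechanically. The Python sketch below tests a list of copies against it; the `BackupCopy` fields and the media types assigned to each location are assumptions for illustration, loosely modeled on the headquarters, regional office and cloud layout described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str
    media: str     # e.g. "disk", "tape", "object storage"
    offsite: bool  # stored away from the primary site?


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({copy.media for copy in copies}) >= 2
        and any(copy.offsite for copy in copies)
    )


# Hypothetical layout; the media types are assumed.
copies = [
    BackupCopy("HQ data center", "disk", offsite=False),
    BackupCopy("Regional office", "disk", offsite=True),
    BackupCopy("Cloud", "object storage", offsite=True),
]
print(satisfies_3_2_1(copies))
```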

While cloud services can be a valuable part of a data recovery plan, there are caveats to keep in mind. First, remember that there are threats that can encrypt files even in the cloud. In addition, many cloud services provide great bandwidth for uploading into the cloud but do not offer the same bandwidth for downloading your data out of the cloud. Therefore, it’s critical to look closely into the cloud provider and the details of your agreement with them to ensure you can meet your recovery objectives.

Ensure your backups are immutable.

In addition to strictly limiting access to backup files, organizations also need to make sure that the data cannot be modified or deleted even if someone were to gain access. Some backup and recovery solutions offer immutable backups that protect backup data for its established lifetime (retention period).
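The rule behind an immutable backup can be stated in a few lines. The Python sketch below only models that rule, namely that nothing may be deleted until the retention period has elapsed; in practice the enforcement has to come from the backup solution or the storage layer itself, not from a script. The function name and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def deletion_allowed(created: datetime, retention: timedelta, now: datetime) -> bool:
    """Return True only once the retention period has fully elapsed."""
    return now >= created + retention


created = datetime(2024, 1, 1)
retention = timedelta(days=30)

print(deletion_allowed(created, retention, now=datetime(2024, 1, 15)))
print(deletion_allowed(created, retention, now=datetime(2024, 1, 31)))
```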

Automate.

Manual processes are inherently slow, unreliable and prone to human error. That’s true for both backup and recovery. You need backups to be taken correctly and happen on schedule, no matter who’s on holiday or what other priorities might come up for the IT team.

Automation is perhaps even more vital to recovery operations — especially disaster recovery, when every second of downtime is costing the business. As Gartner puts it, “Prompt recovery of affected systems will be impossible if organizations have to rely on manual processes and procedures. This is true, regardless of the approach adopted for recovery, and failure to automate major parts of the recovery process will lead to unacceptable delays in the restoration of normal service.”[4]

4. Review and testing

Test and revise your data backup and disaster recovery plan on a regular basis.

All too often, organizations put a great deal of thought, effort and time into developing a comprehensive data recovery plan, but walk through it only once or twice and then put it away until a disaster strikes. However, threats continue to evolve, and the IT ecosystem continues to change. For example, you might adopt a technology that is critical to your business operations and that adversaries are actively looking to exploit — making it a “high” or even “extreme” vulnerability. But your data recovery plan does not even mention it.

Similarly, business processes and needs can change dramatically even in a short period of time. Data that is business critical today might have far less value tomorrow. You don’t want to discover that your plan fails at the moment when you actually need it.

Therefore, it’s essential to establish a regular schedule for reviewing and updating your recovery plan, and testing the revised plan to ensure it meets your needs. The schedule will vary based on the organization’s needs, but the table below details different types of testing and offers guidance about recommended frequencies.

Test your data recovery plan using people who didn’t develop it.

Finally, make sure that your plan does not depend on any particular individuals. After all, the IT pro who headed up development of the plan might be on vacation when disaster strikes. Moreover, with the ongoing IT skills shortage and the broader Great Resignation, organizations are seeing significant turnover in their IT teams. You need to ensure that the data recovery plan can be implemented properly regardless of these circumstances.

Accordingly, make sure your recovery plan is well documented and can be executed by your IT team even if your most valued IT pros have left the company or are simply unavailable.

Conclusion

A quick glance at the news proves that no organization is immune from modern cyber threats; they affect everyone from SMBs to large enterprises, and from private industry to schools, government agencies and critical infrastructure. While cyberattacks leap to mind, organizations are also vulnerable to natural disasters, insider threats, administrator errors and other risks.

While you can (and should) work to strengthen your security posture to minimize your exposure, you cannot eliminate the risk of disaster striking. The question is, how quickly and how well can you recover when a disaster occurs?

To achieve the cyber resilience required to protect your organization, you need to build a solid data recovery plan and test it regularly. Following the strategies laid out here will help you establish and meet your data recovery goals.

[1] Gartner, Inc., “Restore vs. Rebuild — Strategies for Recovering Applications After a Ransomware Attack,” Nik Simpson and Ron Blair, 2 March 2022 (ID G00761039).

[2] Gartner, Inc., “How to Recover from a Ransomware Attack Using Modern Backup Infrastructure,” Fintan Quinn, 4 June 2021.

[3] Gartner, Inc., “Restore vs. Rebuild — Strategies for Recovering Applications After a Ransomware Attack,” Nik Simpson and Ron Blair, 2 March 2022 (ID G00761039).

[4] Gartner, Inc., “Restore vs. Rebuild — Strategies for Recovering Applications After a Ransomware Attack,” Nik Simpson and Ron Blair, 2 March 2022 (ID G00761039).

About the Author

Brian Hymer

Brian is an avid computer expert with over 30 years in the IT industry. He has a varied background and has worked in IT for power, retail, healthcare, insurance and financial organizations. Over his nearly 21 years at Quest, he has travelled to customers around the globe sharing his experience and helping them implement and use Quest products in a wide variety of environments. He has also presented on numerous worldwide webinars, spoken at conferences and is a subject matter expert on the Windows Security log and Active Directory Forest recovery.
