Why Active Directory phased recovery is the difference between hours and weeks in your disaster recovery plan

In 2017, all of Maersk’s 150 or so online domain controllers were destroyed as collateral damage in the NotPetya attack, stopping all critical business functions that depended on Active Directory for authentication [1]. To get the shipping giant working again, Maersk IAM admins went to herculean lengths to obtain a copy of the only remaining offline domain controller (DC), then restored one DC on a Surface Pro 4 [2]. All they needed was one DC online, even if not on ideal hardware, to get the rest of the disaster recovery process going.

This post will focus on why the Maersk approach, and really any IT disaster recovery plan that prioritizes the restoration of key domain controllers, gives Active Directory (AD) admins better control over the recovery process, lets critical services begin their own recovery sooner, and restores sign-in to critical business functions faster. A phased recovery process is also a Microsoft best practice for restoring a completely inaccessible Active Directory forest.

What is a Phased Disaster Recovery Plan for Active Directory?

Microsoft’s Windows Server online documentation addresses how to restore an entire Active Directory forest, including how to determine which backups to use. You can read about the various Active Directory backup methodologies available for disaster recovery planning in my previous post.

Looking at the actual restore process (once you’ve chosen your backup), Microsoft recommends designating one dedicated DC in each domain as the preferred DC in your disaster recovery strategy. Doing so eases the restore process and lets you reliably plan and execute the forest recovery [3]. I’ll explain why in the next section.

To support phased recovery from a complete Active Directory disaster, Quest Recovery Manager for Active Directory Disaster Recovery Edition lets you perform an initial recovery in the first phase to make the forest functional as soon as possible, buying you more time for the full forest restore in the second phase.

  • Phase 1: Perform initial recovery
    Restore one or several domain controllers in each domain.
  • Phase 2: Redeploy remaining DCs
    Restore the remaining domain controllers through promotion.

Benefits of Active Directory phased recovery

Dependencies! The fundamental reason a phased recovery of AD is important is the dependency chain. If AD is down, users cannot authenticate to work. If AD and critical applications are both compromised, those applications cannot be restored until AD is. Restoring in phases eases the process because everything that depends on AD can start recovering as soon as the first DCs are online, and that can mean the difference between hours and weeks.
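To make the dependency chain concrete, here is a minimal sketch in Python. The service names and their dependencies are hypothetical, chosen only for illustration; your own map will look different. It uses the standard library's topological sort to derive a restore order in which nothing is started before the things it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration):
# each service lists what must be running before it can be restored.
depends_on = {
    "active-directory": set(),
    "database": {"active-directory"},
    "email": {"active-directory"},
    "erp": {"active-directory", "database"},
}

# static_order() yields each service only after all of its dependencies.
restore_order = list(TopologicalSorter(depends_on).static_order())
print(restore_order)
```

Because every other service in this made-up map depends on Active Directory, it always comes out first, which is the whole argument for getting a DC online before anything else.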

Are you going to restore your entire AD forest perfectly before letting the other recovery processes start? No. You are going to restore to a Surface Pro 4 or, if you had a disaster recovery plan, to more suitable hardware on standby. You’re going for good, not perfect. Phase 2 is your opportunity to plan for perfect.

In that disaster recovery plan, you will have defined the recovery time objectives (RTO) and recovery point objectives (RPO) for AD. While RPO defines how often you back up and replicate your AD, RTO defines how quickly AD must be recovered to stay within the organization’s tolerance limits. RTO should determine which DCs you restore first to keep the business in business.
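As a rough illustration of how those two objectives translate into checks, here is a small Python sketch. The targets and estimates below are made-up numbers, not recommendations, and the function names are my own.

```python
from datetime import timedelta

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is the gap between two backups."""
    return backup_interval <= rpo

def meets_rto(estimated_restore: timedelta, rto: timedelta) -> bool:
    """The estimated time to get AD answering again must fit inside the RTO."""
    return estimated_restore <= rto

# Hypothetical targets and estimates (assumptions for illustration only).
rpo = timedelta(hours=24)
rto = timedelta(hours=4)

print(meets_rpo(backup_interval=timedelta(hours=12), rpo=rpo))
print(meets_rto(estimated_restore=timedelta(hours=1), rto=rto))
print(meets_rto(estimated_restore=timedelta(days=3), rto=rto))
```

The last check is the interesting one: a restore estimate measured in days fails a four-hour RTO, which is the kind of gap a phased plan is meant to close.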

Let me drive home the point with a scenario. Suppose you have two domains called Domain A and Domain B.

  • Domain A has 3 DCs dispersed across North America
  • Domain B has 10 DCs dispersed across North America and Europe

Ransomware attacks and compromises both domains, encrypting their disks. All 3 DCs in Domain A and all 10 DCs in Domain B are physically inaccessible.

This means that Active Directory is down and all of the critical applications and users who depend upon it are down too.

So in our scenario, Phase 1 of your Active Directory disaster recovery plan would restore 1 DC from Domain A, such as the one located at Headquarters, and maybe 2 DCs from Domain B, such as the ones in geographically important datacenters.
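The split for this scenario can be sketched in a few lines of Python. This is only a planning aid under stated assumptions: the DC names, the priority numbers, and the `plan_phases` helper are all hypothetical, and nothing here talks to Active Directory or to any recovery product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainController:
    name: str
    domain: str
    priority: int  # lower number = more critical (hypothetical ranking)

def plan_phases(dcs, phase1_quota):
    """Pick the most critical DCs per domain for Phase 1; the rest go to Phase 2."""
    by_domain = {}
    for dc in dcs:
        by_domain.setdefault(dc.domain, []).append(dc)

    phase1, phase2 = [], []
    for domain, members in by_domain.items():
        ranked = sorted(members, key=lambda dc: dc.priority)
        count = max(1, phase1_quota.get(domain, 1))  # always at least one DC per domain
        phase1.extend(ranked[:count])
        phase2.extend(ranked[count:])
    return phase1, phase2

# The scenario above: 3 DCs in Domain A, 10 DCs in Domain B.
inventory = (
    [DomainController(f"A-DC{i}", "A", i) for i in range(1, 4)]
    + [DomainController(f"B-DC{i}", "B", i) for i in range(1, 11)]
)

phase1, phase2 = plan_phases(inventory, {"A": 1, "B": 2})
print([dc.name for dc in phase1])  # ['A-DC1', 'B-DC1', 'B-DC2']
print(len(phase2))                 # 10
```

Three DCs go into Phase 1 and the other ten wait for Phase 2. In a real plan the priority would come from your RTO analysis, such as the headquarters DC and the DCs in geographically important datacenters.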

This is done because with a solution like Recovery Manager for Active Directory Disaster Recovery Edition, you can automate the restore of those critical DCs from clean backups (scanned for malware) in roughly an hour, clearing the way for the rest of the IT disaster recovery process. ONE HOUR, give or take, depending on how much you’ve built out and practiced your business continuity plan.

Even if some connections are slow for remote offices, the point is that people there can still do their work while the remaining DCs in both domains are restored in Phase 2, which also brings back faster connection speeds.

With Recovery Manager for Active Directory Disaster Recovery Edition Repromotion mode, AD admins may also run Phase 2 as many times as they like if certain regions are ready to promote their DCs while other regions are not.
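The idea of running Phase 2 more than once can be pictured with a short Python sketch. It is purely illustrative: the `run_phase2_pass` helper, the DC names, and the regions are hypothetical, and it does not call or represent the Quest product.

```python
def run_phase2_pass(pending, ready_regions):
    """Split pending DCs into those promoted in this pass and those left for later."""
    promoted = [dc for dc in pending if dc["region"] in ready_regions]
    remaining = [dc for dc in pending if dc["region"] not in ready_regions]
    return promoted, remaining

# Hypothetical DCs still waiting after Phase 1.
pending = [
    {"name": "B-DC3", "region": "North America"},
    {"name": "B-DC4", "region": "Europe"},
    {"name": "B-DC5", "region": "Europe"},
]

# First pass: only North America is ready.
first_pass, pending = run_phase2_pass(pending, {"North America"})
# Second pass, later: Europe is ready too.
second_pass, pending = run_phase2_pass(pending, {"Europe"})

print([dc["name"] for dc in first_pass])
print([dc["name"] for dc in second_pass])
print(pending)
```

Each pass promotes whatever is ready and leaves the rest pending, so no region has to wait for the slowest one.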

TL;DR

Recovering from an Active Directory forest disaster in phases eases the restore process, quickly restores sign-in, and reduces the time other critical applications must wait to begin their own restores. Identify at least one main domain controller in each domain to prioritize in a recovery scenario. Get these online quickly, get them taking transactions, and only then turn your attention to the less critical DCs. Go for good first, and perfect later.

Quest Recovery Manager for Active Directory Disaster Recovery Edition automates the phased recovery process and even lets you run Phase 2 as many times as you need. Stay tuned to this blog to learn how to automate and accelerate DC promotion with IFM in Phase 2.

Sources:

1: https://www.wired.com/story/notpetya-cyberattack-ukraine-russia-code-crashed-the-world/
2: https://gvnshtn.com/maersk-me-notpetya/
3: https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/ad-forest-recovery-determine-how-to-recover

About the Author

Jennifer LuPiba

Jennifer LuPiba is the Chair of the Quest Software Customer Advisory Board, engaging with and capturing the voice of the customer in such areas as cybersecurity, disaster recovery, management and the impact of mergers and acquisitions on Microsoft 365, Azure Active Directory and on-premises Active Directory. She also writes thought leadership articles aimed at the c-suite to evangelize the importance of these areas to their overall business. She chairs The Experts Conference, a yearly event focused on pure Active Directory and Office 365 training at the 300 and 400 level for the boots-on-the-ground Microsoft admins and managers.
