Data resiliency
Data is the lifeblood and foundation of many organizations. Because of this, modern businesses depend heavily on that data being ready and available for use, and problems arise when it is not. Gartner estimates that downtime from man-made and natural disasters costs roughly $300,000 per hour. As businesses increasingly rely on digital data to operate, they become even more vulnerable to threats. Disruptions such as cyber attacks, hardware failures, human error and natural disasters all get in the way of data availability, prompting organizations to examine how they can fortify themselves against disruption and strengthen their data resilience strategies.

What is data resiliency?

Data resiliency is the ability to have organizational data available and usable regardless of disruptions like hardware failures, cyber attacks, human errors, or natural disasters.

Why is data resiliency important?

Data resiliency is important because it ensures data and crucial information are available and readily usable when needed, despite unexpected disruptions.

While it’s not possible to be completely resilient against all threats, it is possible to architect systems to keep crucial operations going even in the event of an interruption.

Think about the various functions and systems that could wreak havoc if even small portions of data become unavailable. Disruptions in financial system data can prevent consumers from making purchases or keep businesses from issuing payroll to their employees. If a data subset becomes unavailable to a critical government agency, such as the Federal Aviation Administration, it can ground logistics across an entire country. Even a small data disruption can severely interrupt a business's daily operations.

Architecting for data security, high availability and data resiliency is crucial to safeguard against system failures.

Architecting for data resilience

In any business, applications, people, technology and processes change frequently. The underlying data, however, is typically carried forward into whatever new process or environment replaces them. Data is therefore the most valuable asset a business has, regardless of the technology used to access it, which further underscores the importance of data resilience.

No matter where data is stored, on premises or in the cloud, organizations must be able to secure and preserve it and architect it for resilience. For that to happen, it’s essential to understand a few things about architecting that resilience:

  • The importance of funding and staffing for necessary capabilities and services to maintain resilience
  • Potential data disruption events
  • The impact to an organization if a disruption happens

While these are the main items to be aware of, the process of developing data resilience for an organization can be broken down into further stages.

One way to simplify this conversation is to break it down into people, processes and governance.

Each piece of the puzzle, whether people, processes or governance, can be a serious risk if not managed properly. Without governance, there is no handle on what information lives where. Without process, there is no guidance on what to do when something goes wrong. Without people, there is no ownership of those processes. Infrastructure itself has become highly resilient through the use of cloud services, but without data it cannot deliver the services it is meant to support.

Any organization’s data resiliency is at risk without each of these pieces in place.

Data resiliency is not the same as data backups or data availability

While data backups and availability are key components of data resiliency, they are not the same as data resiliency.

Data backups

Data redundancy is a critical aspect of data resiliency. Data backups are stored copies of organizational data, typically kept offsite from the original data. Regular data backups are essential for system recovery and data restoration following an incident. However, it’s important to note that having backups alone does not ensure data resiliency. Data resiliency involves designing systems that can endure disruptions and recover effectively.
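
As a rough illustration of that redundancy, the sketch below copies a file to a hypothetical offsite location and verifies the copy with a checksum. The paths are placeholders; a production backup tool would add scheduling, encryption, retention handling and regular restore testing on top of this.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical paths; in practice the "offsite" target would be another
# site, region or cloud bucket rather than a local directory.
SOURCE = Path("/data/orders.db")
OFFSITE = Path("/mnt/offsite-backups")


def sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, offsite_dir: Path) -> Path:
    """Copy a file offsite and verify the copy against the original."""
    offsite_dir.mkdir(parents=True, exist_ok=True)
    target = offsite_dir / source.name
    shutil.copy2(source, target)
    if sha256(source) != sha256(target):
        raise RuntimeError(f"Backup verification failed for {target}")
    return target


if __name__ == "__main__":
    print(f"Backed up to {back_up(SOURCE, OFFSITE)}")
```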

Data availability

Data availability is all about making sure data is accessible and usable. While it follows some of the same themes as data resilience, the key difference is that data resilience aims to minimize the impact of disruptions on data, while data availability is focused solely on accessibility and usability.

Cascading effects

Outages caused by cyber threats or system failures can impact data resiliency. For instance, power outages can lead to loss of critical system access, hindering real-time backups and replication. Network failures disrupt data flow and synchronization, challenging data redundancy. Mitigating these risks is crucial for data resiliency.

The costs of getting data resiliency wrong

The costs and frequency of downtime don’t seem to be improving. According to Uptime Institute surveys, one in five organizations report “serious” or “severe” outages. Over 60 percent of failures result in at least $100,000 in total losses, and the share costing upwards of $1 million has increased over the same reporting period. Getting data resiliency wrong costs organizations on multiple fronts.

  • Financial costs
    • One of the most prominent losses organizations encounter when they get data resiliency wrong is the financial hit. Lost revenue is the most visible cost, but recovery costs must be factored in as well.
  • Data loss
    • The loss of crucial data itself shouldn’t be overlooked either. If the data lost in a disruption event is essential to basic business functions, the financial losses are compounded by the loss of the data needed to maintain company operations.
  • Productivity
    • Disruptions in any organization’s environment can lead to productivity losses. If the sales team loses access to its CRM, it can’t perform outreach. If an organization can’t access its Active Directory instance, users can’t authenticate to reach email and company files. Productivity disruptions further compound the costs of getting data resiliency wrong.
  • Legal consequences
    • Some organizations are bound by data retention requirements for compliance or by service level agreements that guarantee uptime; failing to meet either can bring legal or contractual penalties.
  • Reputational loss
    • Businesses take years to build their brands and reputations. A breach or loss of data can make existing customers hesitate before expanding a purchase and give prospects pause before reaching out at all.

How resilient is your organization today, and how would it fare in a real disruption? The only way to start answering those questions is to test disaster recovery scenarios. Start with controlled disaster recovery testing to gauge current confidence and data resiliency capabilities; even a paper-based walk-through is extremely beneficial. If those tests go well, progress to uncontrolled disaster recovery testing (chaos engineering) to proactively identify vulnerabilities and further fortify resilience.
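
To make that concrete, here is a minimal sketch of what a scripted recovery test could look like. The `inject_failure` and `is_healthy` hooks are hypothetical stand-ins for whatever chaos tooling or runbook your environment actually uses; the point is simply to measure recovery time against a recovery time objective (RTO).

```python
import time


def inject_failure(service: str) -> None:
    """Placeholder: simulate an outage, e.g. stop a container or block a port."""
    ...


def is_healthy(service: str) -> bool:
    """Placeholder: return True once the service is serving traffic again."""
    ...


def run_dr_test(service: str, rto_seconds: int, poll_seconds: int = 5) -> bool:
    """Inject a failure and check whether recovery beats the RTO target."""
    inject_failure(service)
    started = time.monotonic()
    while time.monotonic() - started < rto_seconds:
        if is_healthy(service):
            recovered_in = time.monotonic() - started
            print(f"{service} recovered in {recovered_in:.0f}s (RTO {rto_seconds}s)")
            return True
        time.sleep(poll_seconds)
    print(f"{service} missed its RTO of {rto_seconds}s")
    return False
```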

9 practical steps to develop a data resilience strategy

Note that developing a data resilience strategy is about more than simply implementing a tool or set of technologies. It requires cooperation, commitment and understanding across an entire business and its data capabilities, plus unified coordination in response to potential threats. With that in mind, here are some recommended steps for developing a data resilience strategy.

1. Identify risky and important assets

The first step in developing a data resilience strategy is to identify crucial assets and the risks that bubble up based on their potential downtime. What assets or systems must maintain their uptime? What dependencies exist between those assets and systems to maintain that uptime? What data must absolutely be protected? If an asset or system goes down, what are the downstream effects that a company can plan for?

Start by evaluating the assets and data your company has and prioritizing based on risk. Automating the process of harvesting, centralizing and organizing data can make the daunting task much more manageable. Metadata can be used to curate new and existing physical data assets with sensitivity classification indicators and descriptions. This can help provide a map of sensitive data in the enterprise that is associated with the business glossary and aligns to an overall data governance framework.
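
As a simplified example of that kind of curation, the sketch below tags harvested column names with sensitivity classifications. The patterns and labels are purely illustrative; a real data governance tool would draw classifications from the business glossary rather than from column-name matching alone.

```python
import re

# Illustrative patterns only, not an exhaustive sensitivity taxonomy.
SENSITIVE_PATTERNS = {
    "PII": re.compile(r"ssn|social_security|date_of_birth|dob|passport", re.I),
    "Financial": re.compile(r"iban|account_number|credit_card|salary", re.I),
    "Contact": re.compile(r"email|phone|address", re.I),
}


def classify_columns(columns: list[str]) -> dict[str, list[str]]:
    """Tag each column with any sensitivity classifications it matches."""
    tags: dict[str, list[str]] = {}
    for column in columns:
        matches = [label for label, pattern in SENSITIVE_PATTERNS.items()
                   if pattern.search(column)]
        if matches:
            tags[column] = matches
    return tags


# Example: columns harvested from a hypothetical customer table.
print(classify_columns(["customer_id", "email", "ssn", "order_total"]))
# {'email': ['Contact'], 'ssn': ['PII']}
```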

2. Identify current resilience tolerance

If any identified assets go down, how much downtime can the organization support? What is the maximum acceptable level of harm? Detail the losses generated by unavailability versus the investment required for a highly available system for those assets.
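
A back-of-the-envelope comparison can help frame that decision. The figures below are purely illustrative; substitute your own revenue impact, recovery costs and expected outage hours.

```python
# Illustrative numbers only.
revenue_loss_per_hour = 300_000        # e.g. the per-hour downtime figure cited above
recovery_labor_per_hour = 5_000
expected_outage_hours_per_year = 8

annual_downtime_cost = (
    (revenue_loss_per_hour + recovery_labor_per_hour) * expected_outage_hours_per_year
)
high_availability_investment = 1_200_000   # hypothetical annual cost of an HA design

print(f"Expected annual downtime cost: ${annual_downtime_cost:,.0f}")
print(f"High availability investment:  ${high_availability_investment:,.0f}")
if annual_downtime_cost > high_availability_investment:
    print("The HA investment pays for itself")
else:
    print("Accepting some downtime may be the cheaper option")
```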

3. Triage assets and surfaces to fortify and protect

If every organization had unlimited resources and staffing, all systems and surfaces would be ludicrously fortified. However, businesses often have limits on the resources they can dedicate to architecting their data and systems for resilience. At some point, an organization must be judicious and decide which assets are worth the extra investment in resiliency. Having a handle on current resilience tolerances and the costs of downtime should help teams triage which resiliency projects to start sooner rather than later.

4. Identify realistic disruptions

There are plenty of potential attack vectors or internal threats that present risk across an organization’s data. However, it’s important to be realistic about the threats and conditions that may spark significant downtime on critical systems.

5. Establish resilience requirements

For each of the high-priority assets identified, organizations must determine the minimum acceptable resilience standards for that asset. If a particular system requires five nines (99.999 percent) of uptime, those requirements should be spelled out explicitly.
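
It helps to translate availability targets into the downtime budget they actually allow; five nines works out to only about five minutes per year. The quick calculation below makes that explicit.

```python
# Convert an availability target into its annual downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {downtime_budget_minutes(target):.1f} "
          "minutes of downtime per year")
# 99.999% ("five nines") allows roughly 5.3 minutes per year.
```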

6. Prioritize realistic disruptions

For each high priority asset, consider some of the most common disruptions that those systems may encounter. Which ones are most likely to happen? Which ones can the organization most effectively guard against?
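
One simple way to rank them is a likelihood-times-impact score. The scenarios and scores below are hypothetical; the value is in the exercise, not the specific numbers.

```python
# Hypothetical disruption scenarios scored 1-5 for likelihood and impact.
scenarios = [
    {"name": "Ransomware on file shares", "likelihood": 4, "impact": 5},
    {"name": "Regional cloud outage", "likelihood": 2, "impact": 4},
    {"name": "Accidental table deletion", "likelihood": 3, "impact": 3},
    {"name": "Data center flood", "likelihood": 1, "impact": 5},
]

# Highest-scoring scenarios are the ones to guard against first.
for s in sorted(scenarios, key=lambda s: s["likelihood"] * s["impact"], reverse=True):
    print(f'{s["likelihood"] * s["impact"]:>2}  {s["name"]}')
```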

7. Take steps to protect at-risk assets

After identifying high-priority assets and systems and their potential threats, look across your infrastructure and take steps to fortify against those threats. What can be implemented to prevent potential downtime? Basic steps include securing and patching endpoints, running vulnerability scans and audits, reviewing your data governance framework, evaluating how well protected your data warehouse is, implementing data redundancy, failover and replication, and taking a close look at your current data backup strategy.
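
As a small illustration of failover logic, the sketch below prefers a primary database endpoint and falls back to a replica when the primary is unreachable. The hostnames are placeholders, and in practice you would lean on the failover mechanisms built into your database or load balancer rather than a hand-rolled check like this.

```python
import socket

# Placeholder endpoints for a primary database and its replica.
PRIMARY = ("db-primary.example.internal", 5432)
REPLICA = ("db-replica.example.internal", 5432)


def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Basic TCP reachability check, standing in for a real health probe."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_endpoint() -> tuple[str, int]:
    """Prefer the primary, fall back to the replica if it is unreachable."""
    if reachable(*PRIMARY):
        return PRIMARY
    print("Primary unreachable, failing over to replica")
    return REPLICA


print("Connecting to", choose_endpoint())
```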

8. Automate where possible

From identifying risky assets to configuring redundancies and failovers in case of an adverse event, automation is crucial to developing data resilience. Automation brings faster response times, scalability in managing large, complex environments, predictable responses to defined events and cost savings.
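
For example, a scheduled job can verify that backups are actually landing and raise an alert when they go stale. The path and threshold below are placeholders; a real implementation would run from a scheduler (cron, a workflow engine and so on) and feed an alerting or paging system rather than printing to the console.

```python
import time
from pathlib import Path

BACKUP_DIR = Path("/mnt/offsite-backups")   # placeholder backup location
MAX_BACKUP_AGE_HOURS = 26                   # daily backups plus some slack


def newest_backup_age_hours(backup_dir: Path) -> float:
    """Hours since the most recent file in the backup directory was written."""
    newest = max((p.stat().st_mtime for p in backup_dir.iterdir() if p.is_file()),
                 default=0.0)
    return (time.time() - newest) / 3600


def check_backups() -> None:
    age = newest_backup_age_hours(BACKUP_DIR)
    if age > MAX_BACKUP_AGE_HOURS:
        # Replace with a real alerting hook (email, chat, pager).
        print(f"ALERT: newest backup is {age:.1f} hours old")
    else:
        print(f"OK: newest backup is {age:.1f} hours old")


check_backups()
```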

9. Don’t forget about backups

Data backups are your last resort solution in case everything fails, not your first move. Ensure a cohesive data backup strategy is in place. Air-gapped, immutable backups can be crucial assets if an entire organization’s data is found to be compromised.
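
Immutability is often achieved with object-storage retention locks. The sketch below, written with boto3 and placeholder bucket and key names, assumes an S3 bucket created with Object Lock enabled; a true air gap would additionally keep a copy offline or under entirely separate credentials and infrastructure.

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

# Write a backup object that cannot be altered or deleted until the
# retention date passes (the bucket must have Object Lock enabled).
with open("orders-backup.dump", "rb") as backup_file:
    s3.put_object(
        Bucket="example-airgapped-backups",          # placeholder bucket name
        Key="2024/orders-backup.dump",               # placeholder key
        Body=backup_file,
        ObjectLockMode="COMPLIANCE",                 # retention cannot be shortened
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=30),
    )
```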

Data resiliency clearly plays an important role in the availability, accessibility and usability of data in the face of adverse events. While total resiliency may not be achievable, organizations can take steps to architect systems that minimize downtime and maintain crucial operations. At a time when disruptions are more frequent and costly than ever, data resiliency is no longer an optional investment but a critical piece of any successful business strategy.

Drive broader data availability with data replication that prioritizes resilience, high availability and scalability


About the Author

Michael O'Donnell

Michael O'Donnell, PhD, is a Senior Analyst in the Information and Systems Management division of Quest Software and a highly accomplished and respected author and researcher in the field of enterprise IT. With a deep understanding of the industry, Michael continues to push the boundaries of data democracy and advance our understanding of this rapidly evolving field. With a PhD in computational arterial bypass mechanics, he has made a significant impact in IT, delivering cutting-edge line-of-business applications and showcasing the power of data-driven decision making. Michael’s expertise in data analysis and innovation has made him a sought-after speaker, presenting at conferences and workshops around the world. His passion for data-driven decision making and commitment to excellence have earned him a reputation as a leader in his field.
