Backups are a crucial piece of IT infrastructure. However, they often become a background function that is deprioritized until data is lost or another disaster strikes. When that happens, organizational oversights creep in that can disrupt the business. Consider the following common oversights organizations make with their backups, along with suggestions to help avoid them.
Oversight #1: Having IT make decisions on backup that ignore business needs
In many organizations, the IT department or technology partner is more or less responsible for implementing backups by themselves. On top of that, the budget for backups often comes into contention because data protection is viewed as a cost center, with no immediate benefits to the organization.
The oversight, then, lies in having IT solely make decisions around what gets backed up, how long that data is retained, and what takes priority in backups.
At the root of this oversight is the discrepancy between how management regards data protection and how the responsible IT department regards it. That discrepancy comes from management’s focus on the business itself (agriculture, steelmaking, movie production, furniture manufacture and so on) and IT’s focus on the technology that helps run the business. When IT is just a mechanism or a method to move a business forward, a misunderstanding about the importance of protecting data becomes more likely.
It’s more appropriate to base backup decisions on the company’s current data protection needs, which means IT makes those decisions alongside the business.
For example, as companies moved from on-premises email management to the cloud paradigm of Microsoft 365, many thought that protecting that data further was unnecessary. They assumed that, because the function was outsourced, Microsoft would protect it. But after a while it became clear that if you use something in the cloud, it’s still your data, so you yourself are primarily responsible for protecting it.
True, you’re using the cloud for functionality and storage. But that does not mean you’re absolved of your duty to protect your data. Microsoft offers a platform and does its best to keep that platform available; if you make a mistake on the platform and lose data, don’t knock on their door. As a result, IT departments have had to ask for money to start building data protection environments for data and applications that are no longer located on premises.
Thus, depending on the type of company, you may see a bigger gap between IT and management about backing up data. That gap can also leave higher management believing everything is protected when it isn’t.
In general, it’s a bad idea to expect IT to know what to keep and what not to keep. More to the point, IT will also not likely know for how long to retain it. What IT does know is that it’s prohibitively expensive to keep everything forever.
On top of that, digital privacy rights also play a role in what to keep and not keep, which further complicates matters for IT. Regulations like GDPR restrict companies from keeping personal data longer than is necessary for the purpose for which it was collected. That is hard to enforce in the context of data backup, where data that is no longer deemed relevant still sits in your long-term storage.
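One way to make retention a business decision rather than an IT guess is to write the rules down as data that both sides can review. The following is a minimal sketch, not the configuration of any real backup product: the categories, the retention periods and the `is_expired` helper are all hypothetical, and real values would have to be agreed between the business, legal and IT.

```python
from datetime import date, timedelta

# Hypothetical retention rules agreed between the business and IT.
# The categories and periods are illustrative assumptions only.
RETENTION_DAYS = {
    "financial_records": 7 * 365,
    "customer_personal_data": 2 * 365,
    "project_files": 365,
}


def is_expired(category: str, backup_date: date, today: date) -> bool:
    """Return True when a backup is older than its agreed retention period."""
    if category not in RETENTION_DAYS:
        # No agreed rule: surface it for a business decision instead of guessing.
        raise KeyError(f"No retention rule agreed for category: {category}")
    return today - backup_date > timedelta(days=RETENTION_DAYS[category])
```

Raising an error for an unknown category is a deliberate choice in this sketch: data without an agreed rule is exactly the case IT should not decide alone.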
Oversight #2: Not having SLAs to set expectations when problems arise
The role of SLAs is to set everyone’s expectations of what IT can deliver in particular scenarios. In this context, a typical SLA would cover something like the length of time IT needs to recover from a specific problem. Suppose a user accidentally deletes a file and needs IT to recover it; the company might set an SLA of three work-hours to recover the most recently backed-up version.
In the worst-case scenario, the company has no SLAs. That means no common understanding of how the business is protected against data loss, ransomware, flooding, fire and other problems. The default is the “best-effort SLA,” which is more like a joke: “If the building burns down, we promise we will do our best to recover the data.”
Not all SLAs need to be formally codified and written down in great detail. SLAs can be hard to maintain, but not having SLAs at all is a big miss. In every organization, there should be a common understanding of what to expect from the data protection in place. Production, administration, middle management, executive management and company ownership naturally have different expectations of how data is protected. It’s a bad sign when they all think the data will always be safe and available, and never subject to loss.
Oversight #3: Having SLAs written exclusively by IT
In some cases, SLAs are written solely by the IT department. Why is that an oversight? Because if you were the IT manager responsible for data protection, how honest could you be in telling management about the job you’re doing? That is especially hard if you were not given adequate tools or budget and need to report that company data is poorly protected against different types of disasters.
A proper SLA describes certain adverse scenarios or events and sets expectations for overcoming them. For example, when users accidentally delete a file, what do they expect will happen? Naturally, they expect that IT can restore the file for them. But there are nuances. Do they expect to get the version back from one hour ago, or the version from one week ago? The CFO, who worked on a document all day until her laptop froze, might expect the former; a report administrator updating a monthly spreadsheet might expect the latter.
The goal of the SLA is to translate the real business need into an expectation. For important documents that highly paid employees work on, it may make sense to back up more than once per day. But SLAs apply to much more than just single documents. If your sales database goes down and you have to repair it, the calculus changes. How far back in time do you need to go to find a version of the database that you can use again? The flip-side question is, how long will it take before the database is up and running again?
Thus, in an SLA, you normally talk about two important things: how much data loss you are prepared to accept in case of a disaster (often called the recovery point objective, or RPO), and how long it will take before production resumes (the recovery time objective, or RTO). You may want zero data loss and zero downtime, but some actions cannot be automated. Zero data loss and zero downtime would require someone on site 24 hours a day.
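Those two figures can be made concrete with simple arithmetic. The sketch below assumes a hypothetical `Sla` record holding the maximum acceptable data loss and the maximum acceptable downtime, and checks a backup interval and a measured restore time against it. The names and thresholds are illustrative and not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Sla:
    """Hypothetical SLA targets for one system."""
    max_data_loss: timedelta  # how much recent work the business accepts losing
    max_downtime: timedelta   # how long the business accepts being offline


def sla_gaps(sla: Sla, backup_interval: timedelta, restore_time: timedelta) -> list[str]:
    """List the ways the current setup falls short of the SLA.

    In the worst case a failure happens just before the next backup runs,
    so the data lost is roughly one full backup interval.
    """
    gaps = []
    if backup_interval > sla.max_data_loss:
        gaps.append("backups too infrequent for the accepted data loss")
    if restore_time > sla.max_downtime:
        gaps.append("restore takes longer than the accepted downtime")
    return gaps
```

For example, a sales database with a one-hour data-loss target that is backed up only once a night would report the first gap, which gives IT and management something specific to discuss.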
At some point in working on SLAs, you may come to the practical and financial limits of data protection. You can always protect more, but how much are you willing to pay for it?
Obtaining guidance from an independent company based on interviews with stakeholders at different levels can help organizations get a handle on expectations. The organization may not be able to implement everything, but at least everyone is on the same page.
Oversight #4: Conflating backups with disaster recovery
An organization’s backups and disaster recovery plan are not the same.
When you back up data, you make a copy of your production environment and store it somewhere secure so it cannot be changed. Disaster recovery, on the other hand, implies the ability to quickly recover from an outage affecting anything from a couple of machines to your entire data center. In other words, backup is data-oriented and disaster recovery is environment-oriented. With backups, you can restore data in a specific form like a virtual machine, but a backup cannot replace failed hardware.
The main oversight is in hoping you can use your backups for disaster recovery. Backups can be part of disaster recovery, but they do not replace it.
Similarly, restore and recovery are sometimes conflated. When you restore, you bring a previous version of data back to its original location; in the case of a Word document, you can restore it from backup. But if you don’t have the application, you can’t open the document. Restoring has brought back the document but not the access to the document. To restore both the document and access to it, you need recovery, which is the next level.
Oversight #5: Not testing recovery
In effect, your backups are passively tested every day; in case of failure, your backup software will notify you. Testing recovery, however, is less passive and requires active practice.
It isn’t enough to test only the restore of data, be it a single document or an entire database. As described above, you can restore the files, but can you access the data in them? You can answer that only by testing recovery, which extends to running the application or front end and successfully reading the data in the file.
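The difference between the two shows up clearly in a test script. As a minimal sketch, assume the restored file is a SQLite database (an assumption chosen only because Python can open one without extra software). Checking that the file exists tests the restore; opening it and reading from it is closer to testing recovery. The function name is hypothetical.

```python
import os
import sqlite3


def check_restored_database(path: str) -> bool:
    """Return True only if the restored file can be opened and read."""
    # Restore-level check: the file came back and is not empty.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    # Recovery-level check: the data inside is actually usable.
    try:
        conn = sqlite3.connect(path)
        try:
            result = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        return False
    return result is not None and result[0] == "ok"
```

A real recovery test would go further and start the actual application against the restored data, but even this small step catches files that were restored yet cannot be read.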
The problem is that testing recovery, while important, is time-consuming. For example, if your IT estate contains 100 virtual machines, how many should you test-restore? One? Two? Five?
It’s hard to justify the time involved in test-restoring all of them, so you have to select a sample. Thus, even in your effort to reduce the risk around disaster recovery, you introduce risk: the machines you did not test may be the ones that fail to recover.
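One way to limit the risk of sampling is to rotate the sample so that every machine is eventually test-recovered. The sketch below is one possible approach, assuming a hypothetical list of VM names and a fixed number of tests per cycle; it is not a feature of any specific backup tool.

```python
def rotation_schedule(vms: list[str], per_cycle: int) -> list[list[str]]:
    """Split VMs into consecutive batches so that each one is tested
    exactly once per full rotation."""
    if per_cycle < 1:
        raise ValueError("per_cycle must be at least 1")
    ordered = sorted(vms)  # fixed order so the schedule is reproducible
    return [ordered[i:i + per_cycle] for i in range(0, len(ordered), per_cycle)]


# Hypothetical estate of 100 virtual machines, five test recoveries per week.
vms = [f"vm-{n:03d}" for n in range(1, 101)]
schedule = rotation_schedule(vms, per_cycle=5)
print(len(schedule))  # number of weeks needed to cover every VM
```

With 100 machines and five tests per week, a full rotation takes 20 weeks. Whether that is acceptable is itself a question for the SLA discussion.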
Oversight #6: Neglecting to train IT in data protection
Data protection isn’t very interesting, but it is very important. How much do your system administrators and IT personnel understand about how the company protects its data?
Admins often know which button to press when the light goes red. But it’s also helpful when they see that their job extends to SLAs, disaster recovery and knowing which departments will be affected if a particular server is down.
Developing the skillsets of IT teams in data protection can lead to teams that anticipate and notice when the needs of the business change. Consider the individuals who bring up the fact that it’s taking longer and longer to back up data. For smart IT managers, that is a cue to ask whether the organization has been growing its backup software and hardware along with its production systems.
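The observation that backups are taking longer can be checked against job history rather than left to impressions. The following sketch assumes a hypothetical list of nightly backup durations in minutes, ordered oldest to newest, and compares the average of the most recent runs with the average of the earliest ones. The window size and the 25 percent threshold are arbitrary illustrative choices.

```python
from statistics import mean


def backup_window_growing(durations_min: list[float], window: int = 7,
                          threshold: float = 1.25) -> bool:
    """Return True when recent backups average noticeably longer than early ones.

    The default window and threshold are illustrative assumptions,
    not recommended values.
    """
    if len(durations_min) < 2 * window:
        raise ValueError("need at least two full windows of history")
    early = mean(durations_min[:window])
    recent = mean(durations_min[-window:])
    return recent > early * threshold
```

A positive result does not say what to fix; it is only the cue to ask whether backup capacity has kept pace with production.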
If broadly trained, backup administrators can evaluate SLAs in light of the needs of the business. They can call out unrealistic SLAs and improve the company’s chances for business continuity in the event of an outage.
Conclusion
Common data backup oversights often boil down to failing to revisit data backup practices and technology. In an era of innovation, what can IT teams bring to backups to make them more effective, more robust and automated? Techniques like deduplication, encryption and immutable backup are part of modern data protection, and it’s a mistake not to take advantage of them.
The need for data backup and data protection is frequently underestimated and receives insufficient attention amid the pace of business and organizational change. The use of IT grows as the company grows. If nobody poses uncomfortable questions like “Do we have what we need for data protection today?” and “Do we need something better or something different?”, organizations will find themselves ill-prepared when recovery is needed or disaster strikes.
As business practices change, they also change the way you need to protect your data.