Your data is important. It is the lifeblood of your business, and the impact of it being unavailable has never been greater. However, not all data is equal. When you look at the lifecycle of your data, you need to examine what you are going to protect, how you are going to protect it, and where you are going to keep it.
This blog post looks at storing data in different tiers in the cloud, or cloud tiering. It examines the benefits and costs associated with tiered storage of backup data, so that backup administrators responsible for their organizations’ data protection can evaluate the potential for applying storage tiering in their own IT landscapes.
What is storage tiering? How does it work?
Storage tiering is the practice of taking specific backup data that is unlikely to change or be accessed and moving it to low-cost storage, usually in the cloud. When you use storage tiering, you enable long-term retention of backup data, you can make data recovery transparent to the backup system, and you take advantage of lower pricing. And, through automation, you can enjoy all those benefits with almost no manual intervention.
In most organizations, storage tiering is regarded as a way of using a cheaper storage construct for older data. The older the data gets, the less likely anybody will change or even access it. Nevertheless, you must retain the data for multiple years, for example, to comply with regulations or to be prepared in case of an audit. With storage tiering, as data ages, you move it to progressively less costly, lower-performance media. That frees up your local, high-speed backup storage for more recent data that may need to be recovered quickly.
It involves a balancing act because of the costs of having data in the wrong tier. Leaving backup data that nobody has accessed in three years on high-speed storage costs you in capacity. Conversely, having to restore from an older data set that has already been tiered costs you in egress fees, plus the time penalty of slower retrieval.
That’s why storage tiering is a continual effort to ensure that your time threshold for moving backup data to the next tier is neither too early nor too late. Your timing will never be perfect, but the objective is to estimate incorrectly as rarely as possible. You want to avoid ending up with unbalanced storage, in which too much data is moved too quickly to low-cost media and recovery time objectives are impacted by slow retrieval.
Starting your storage tiering initiative will involve human interaction as you configure policies to govern which sets of backup data to move, and when, where and why to move them. In time, though, automation should guide the movement of your data from one tier to another.
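For example, if your cold tier is an object store such as Amazon S3, a tiering policy can be expressed as a lifecycle rule that ages data into colder storage classes on a schedule. The sketch below is a minimal illustration using boto3; the bucket name, prefix and day thresholds are assumptions you would replace with values from your own classification and retention decisions.

```python
# Minimal sketch: an S3 lifecycle rule that ages backup data into colder
# storage classes automatically. Bucket name, prefix and thresholds are
# illustrative assumptions, not recommendations.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",              # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-aging-backups",
                "Filter": {"Prefix": "backups/"},  # only backup objects
                "Status": "Enabled",
                "Transitions": [
                    # after 30 days, move to an infrequent-access class
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # after 90 days, move to an archive class
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # delete once the assumed 7-year retention period expires
                "Expiration": {"Days": 2555},
            }
        ]
    },
)
```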
How should organizations determine which data goes where? What is a typical tiering scheme?
How will you classify your datasets for tiering? Will you decide according to the backup data type, like databases, virtual machines, files and folders? Or by compliance requirement, like a seven-year data retention mandate? Or a mix of both type and compliance?
In most cases, if you need to restore data at all, you’ll probably need to do it within the first month after its creation. But you still have to keep it for, say, another six years and 11 months, and you’ll want that to be cheaper to store as it gets older. That is how you arrive at a policy to store the most recent 30 days of data locally. Then, as it ages, you’ll want to set your tiering policy to move the data elsewhere, whether to less-expensive, on-premises storage or to the cloud.
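To make that kind of age-based decision concrete, here is a minimal Python sketch of the logic such a policy encodes; the thresholds and tier names are assumptions for illustration, not values from any particular product.

```python
from datetime import datetime, timezone

# Illustrative thresholds only; real values come from your own
# classification and compliance requirements.
LOCAL_DAYS = 30   # keep the most recent 30 days on local storage
WARM_DAYS = 365   # keep up to a year on cheaper on-premises disk

def tier_for(backup_created: datetime) -> str:
    """Return the storage tier a backup set belongs in, based on age."""
    age_days = (datetime.now(timezone.utc) - backup_created).days
    if age_days <= LOCAL_DAYS:
        return "local"   # fast recovery expected
    if age_days <= WARM_DAYS:
        return "warm"    # less-expensive, on-premises storage
    return "cold"        # cloud or tape for long-term retention
```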
In wide brushstrokes, you can break storage down into a few tiers:
- Hot storage is mission-critical, high-performance media such as solid state drives (SSD), storage area networks (SAN) or non-volatile memory express (NVMe) cards. You keep these close to the CPU, preferably on the same bus. Hot storage is ideal for databases.
- Warm storage includes media like directly attached, spinning disks.
- Cold storage usually refers to cloud or tape – low-cost media for the data you’re least likely to need to recover and for which long retrieval times are acceptable.
What are the main considerations in a tiering strategy for backup data?
Capacity
As noted above, you could estimate incorrectly and find you must pull recent data back from an older, slower tier. Consider how much data you may have to pull back at any one time. Will there be enough room for it?
Breakage
Suppose part of your storage tiering system breaks. What if your tapes fail or your local backup storage falls prey to ransomware? What happens to the data left on the other side of the breakage, that old data now effectively stranded? How do you get that back? If you’re audited at some point in the required retention period, your auditors won’t likely want to hear, “Our data is there, but we can’t actually get to it.”
Media
Even low-performance media should share some basic behaviors of high-performance media. In all tiers, are the data constructs self-describing? Are you able to do normal things like reattach the media, retrieve the data and create indexes?
Investment
Obviously, keeping seven years’ worth of data in a cold tier will cost you much less money than keeping it in a warm tier. What will you do with the money you save? Once the data is in the cold tier, you’ve effectively freed up money you can spend improving your hot tier; for example, with more, faster storage media. One of the benefits of storage tiering is that you can invest in improving performance for users and customers in your production environment. That way, you balance out the cost of your tiering structure, based on where your data is.
Budgeting
Along with your data classification and policy decisions about what you’re going to move is the cost-benefit element of storage tiering. What’s the cost of keeping the wrong type of data in the wrong place? What’s the benefit of keeping the right data in the right place? The answers to those questions are quantifiable, with ramifications for your budget.
Service-level agreement (SLA)
It’s normal to expect an SLA from a provider of services like cloud storage, but there are important nuances to observe.
Most providers offer an SLA on resilience, measured as a percentage of uptime – for example, 99.999% (“five nines”) – but that refers to infrastructure and accessibility. That does not extend to redundancy of the data stored in their cloud, nor to protection of your backup data itself. For those, you would have to configure replication of your cloud storage to another geographical zone.
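If your cloud tier is an object store such as Amazon S3, that redundancy is something you set up yourself, for example with cross-region replication. The sketch below is a minimal illustration with boto3; the bucket names, IAM role and prefix are hypothetical, and versioning must already be enabled on both buckets.

```python
# Minimal sketch: replicate backup objects to a bucket in another region
# so the data itself is redundant, not just the provider's infrastructure.
# Bucket names and IAM role ARN are hypothetical; versioning must already
# be enabled on both the source and destination buckets.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-backup-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/example-replication-role",
        "Rules": [
            {
                "ID": "replicate-backups",
                "Priority": 1,
                "Filter": {"Prefix": "backups/"},
                "Status": "Enabled",
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-backup-bucket-replica"
                },
            }
        ],
    },
)
```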
Time to retrieve
In the days before cloud storage, many enterprises maintained huge libraries with hundreds of tapes, connected to the backup infrastructure servers. If you wanted to recover data that was on tape, you ran a restoration job. Somewhere in a data center, a robotic arm in the library was activated to find the right tape, insert it into the tape drive, then find and restore your desired data. That could take between five and 10 minutes.
Today, if you had to wait that long to start recovering a data set from your cloud storage, you’d be in trouble. Users’ expectations (and lack of patience) can prompt you to say, “We still want the benefits of storage tiering, but we don’t want slow retrieval anymore.” Resolving that conundrum depends on the tools you choose for enterprise backup software and optimizing secondary storage.
Cost to retrieve
When shopping for cloud tiering, most IT professionals focus on a monthly storage cost, then multiply it by the amount of data they envision storing. That’s a commendable start, but other, inevitable charges include:
- data retrieval cost per GB
- cost per 1,000 write requests
- network costs per GB (inbound)
- network costs per GB (outbound), or egress costs
Egress costs are especially pernicious. While the large print may read, “You pay for only what you use,” the fine print requires more attention. With some cloud storage providers, if you retrieve a greater quantity of data than you store in a given month, they may charge you for the excess. Others just charge you to egress data regardless.
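A rough back-of-the-envelope model helps make those charges visible before they show up on an invoice. The sketch below is a simple monthly estimate in Python; all unit prices are placeholder assumptions, not any provider’s published rates.

```python
# Rough monthly cost model for a cloud tier. All prices are placeholder
# assumptions (USD); substitute the rates from your provider's price list.
STORAGE_PER_GB     = 0.004   # cold-tier storage, per GB-month
RETRIEVAL_PER_GB   = 0.01    # data retrieval, per GB
WRITE_PER_1000_REQ = 0.005   # write requests, per 1,000
EGRESS_PER_GB      = 0.09    # outbound network transfer, per GB

def monthly_cost(stored_gb, retrieved_gb, write_requests, egress_gb):
    """Estimate one month's bill for a tiered backup bucket."""
    return (
        stored_gb * STORAGE_PER_GB
        + retrieved_gb * RETRIEVAL_PER_GB
        + (write_requests / 1000) * WRITE_PER_1000_REQ
        + egress_gb * EGRESS_PER_GB
    )

# Example: 50 TB stored, 2 TB restored (and egressed), 500,000 writes
print(round(monthly_cost(50_000, 2_000, 500_000, 2_000), 2))
```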
Deduplication
Even after you’ve classified your data and made decisions about where to put it in your storage tiering scheme, you’ll want to optimize what you store. You can compound the savings from warm and cold storage with technology that also optimizes the data set to reduce cost further.
The less data there is to move from one tier to the next, the lower the overall cost for storage and transfer. Deduplication is a proven technology for reducing the amount of data you have to back up and send over the network. Using algorithms, deduplication software scans the data and removes blocks that have already been stored. It replaces them with a pointer to the identical, already backed-up data, and the pointer is used to rehydrate the deduplicated data later.
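To illustrate the idea (not any particular vendor’s implementation), here is a minimal fixed-block deduplication sketch in Python: blocks that have already been seen are stored once, and later references are simply pointers to the stored copy.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks; real products often use variable sizes

def deduplicate(data: bytes, store: dict) -> list:
    """Split data into blocks, store each unique block once, return pointers."""
    pointers = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:      # only new blocks consume storage
            store[digest] = block
        pointers.append(digest)      # pointer to the stored block
    return pointers

def rehydrate(pointers: list, store: dict) -> bytes:
    """Rebuild the original data from the pointers and the block store."""
    return b"".join(store[p] for p in pointers)
```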
By reducing the net amount of data to be backed up before it goes to storage, you can significantly increase throughput and accelerate the movement of data. Combining deduplication with storage tiering – whether on-premises or in the cloud – paves the way to lower pricing with shorter retrieval time.
What are the advantages of tiered storage?
The main advantages include the following:
- Reduced storage costs, as described above
- Outsourced infrastructure, especially if you go to the cloud
- Storage efficiency, if you implement a secondary storage product with features like compression and deduplication
- Improved disaster recovery, in that you’re protecting your data outside of your normal storage construct. With cloud storage, for example, it’s a storage construct that’s offsite and somewhere else entirely. However, the tiering system you implement must be intelligent enough to recover data from that remote site without requiring the original source.
- A paradigm for hybrid storage, with fast, local data retrieval linked to cloud storage. Smart cloud-tiering products effectively extend local storage into the cloud, offering more-intelligent use of your datasets.
There is also the option of repurposing older, slower equipment for the lower tiers, which do not require high performance. But is the equipment still supported by its vendor? How much of its useful life remains? Entrusting data protection and integrity to any hardware in that category adds an element of risk.
Conclusion
The essence of storage tiering is that not all of your backup data needs to stay on expensive, high-speed media. As data ages, the likelihood diminishes that anyone will want to recover it. In an era of numerous options for storage, smart IT professionals turn to storage tiering and tools that automatically move data into progressively less-expensive tiers. Generally, the colder the storage, the longer the retrieval time when the data does need to be accessed, but secondary storage tools can ease that pain. Successful storage tiering balances the amount of money you’re willing to spend on retaining the data against the time you’re willing to wait while recovering it.
The ideal approach to tiered storage involves hybrid tools that move data programmatically between warmer and colder tiers, whether on premises or in the cloud. Moreover, after initial configuration, they allow for complete automation, moving the data without any manual intervention or ongoing human effort.