Your backups are getting pretty fat, so you decide to try cloud backup.
“Straight out the door and into cloud storage,” you tell your fellow admins. “That way we don’t need to worry about storage space.”
Indeed, your capacity in cloud storage is almost unlimited. Plus, with cloud tiering you can store your long-term backup data for zero network-in charges and fractions of a penny per gigabyte per month. Sure beats dealing with tape drives and physical media, doesn’t it?
So what’s the catch?
Want to retrieve that data from 2016? Ker-ching.
The catch is that sending the backup data out the door and into cloud storage is one cost item. Bringing it back in the door again later is another cost item altogether.
How much backup data do you plan to store in the cloud? It’s probably not the 247 petabytes that NASA’s Earth Science Data and Information System (ESDIS) program will be handling in five years, is it? NASA plans to send that data out the door to Amazon Web Services, which is a good, cost-effective move that means NASA doesn’t have to manage all that infrastructure. Cloud tiering with AWS includes S3 Standard, S3 Standard-Infrequent Access, S3 Glacier and S3 Glacier Deep Archive service levels.
The problem is that NASA also plans to allow researchers and commercial users to access the data, which means cloud egress: bringing it back in the door from AWS. Per gigabyte, that could cost as much as $.035 in retrieval and up to $.09 in network-out costs (see below).
An audit suggests that, on top of NASA’s $65-million-per-year deal with AWS, egress charges could add $30 million a year by 2025.
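To see how those per-gigabyte figures add up, here is a back-of-the-envelope sketch in Python. The rates are the worst-case numbers quoted above, used purely for illustration; plug in your provider's current price sheet before you trust the output.

```python
# Rough egress estimate, using the illustrative per-GB rates quoted above
# (not an official price sheet -- substitute your provider's current figures).

RETRIEVAL_PER_GB = 0.035   # archive retrieval, $/GB
NETWORK_OUT_PER_GB = 0.09  # network-out (egress), $/GB

def egress_cost(gigabytes_restored: float) -> float:
    """Rough cost of pulling backup data back out of cloud storage."""
    return gigabytes_restored * (RETRIEVAL_PER_GB + NETWORK_OUT_PER_GB)

# Restoring a 50 TB backup set once:
print(f"${egress_cost(50 * 1024):,.2f}")   # prints $6,400.00
```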
The data egress costs strike back
The lesson is that, as you model your cloud backup cost, you should ask yourself, “How many times will users access this data? And when?”
- If the likelihood of access is high — say, your backup from last Tuesday — then either leave it on-premises in your data center or put it in a standard cloud storage tier.
- If the likelihood is low — say, patient data you need to retain for seven years for compliance — then put it in an inexpensive archive tier of cloud storage (see the cost sketch after this list).
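If you want to turn that rule of thumb into numbers, here is a minimal sketch. The storage rates, retrieval fees and restore frequencies are assumptions for illustration only; substitute the figures from the table below and your own restore history.

```python
# A rough yearly-cost model for choosing a tier. All rates here are assumed
# placeholders, not official AWS or Azure pricing.

def yearly_cost(gb: float, storage_per_gb_month: float,
                restores_per_year: float, retrieval_per_gb: float,
                egress_per_gb: float) -> float:
    """Twelve months of storage plus expected retrieval and network-out charges."""
    storage = gb * storage_per_gb_month * 12
    egress = gb * restores_per_year * (retrieval_per_gb + egress_per_gb)
    return storage + egress

data_gb = 10_000  # 10 TB of backups

# Frequently restored data: standard ("hot") tier, no retrieval fee.
hot = yearly_cost(data_gb, 0.024, restores_per_year=4,
                  retrieval_per_gb=0.00, egress_per_gb=0.09)

# Rarely touched compliance data: archive tier, cheap storage, costly retrieval.
archive = yearly_cost(data_gb, 0.001, restores_per_year=0.1,
                      retrieval_per_gb=0.02, egress_per_gb=0.09)

print(f"hot: ${hot:,.0f}/yr, archive: ${archive:,.0f}/yr")
```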
The table below, while not comprehensive, summarizes the cloud backup cost of different tiers offered by AWS and Microsoft Azure (as of November 2019).
| Service | Tier | Time to First Byte (TtFB) | Storage Cost/GB/Month | Data Retrieval Cost/GB | Cost/1,000 Write Requests (PUT) | Network Costs/GB In | Network Costs/GB Out |
|---|---|---|---|---|---|---|---|
| Amazon S3 Standard | Hot | From minutes to hours | $.024-.026 | $.00 | $.0055 | $.00 | $.00-.09 |
| Azure block blob “Hot” | Hot | From minutes to hours | $.017-.0184 | $.00 | $.005 | $.00 | $.00-.087 |
| Amazon S3 Standard-Infrequent Access | Cold | From minutes to hours | $.0152-.019 | $.01 | $.01 | $.00 | $.00-.09 |
| Azure block blob “Cool” | Cold | From minutes to hours | $.01 | $.01 | $.01 | $.00 | $.00-.087 |
| Amazon S3 Glacier | Archive | From fractions of a day to days | $.005 | $.011-.033 | $.055 | $.00 | $.00-.09 |
| Azure block blob “Archive” | Archive | From fractions of a day to days | $.00099 | $.02-.05 | $.01 | $.00 | $.00-.087 |
| Amazon S3 Glacier Deep Archive | Deep archive | From fractions of a day to days | $.002 | $.0025-.02 | $.06 | $.00 | $.00-.09 |
Note that the Azure Archive and Amazon Deep Archive cloud tiers cost less per gigabyte of storage but more per transaction, with longer Time to First Byte (TtFB). You knew there would be a trade-off, didn’t you? There’s always a trade-off.
Deduplication reduces the amount of data that heads to the cloud
So, like NASA, you’re worried about the egress cost involved in restoring your backup from the cloud. How can you put less data up there in the first place?
The answer is deduplication — in particular, variable-length, source-side deduplication — and compression of your data before you hand it off for backup. The less data there is to move into the cloud, the lower the overall cost to store and transfer it.
Deduplication uses algorithms to scan the data and remove any elements that have already been stored, replacing them with a pointer to the identical data that is already backed up. The result is that you move less data, both to and from the cloud, and the data that you do move, moves faster.
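To make the pointer idea concrete, here is a toy sketch. It uses fixed-size chunks and an in-memory dictionary standing in for the backup store, whereas real backup products use variable-length, content-defined chunking and far more robust indexing; treat it as an illustration of the mechanism, not a product implementation.

```python
import hashlib

# Toy source-side deduplication: split data into chunks, send only chunks the
# backup store hasn't seen before, and keep pointers (hashes) for the rest.

CHUNK_SIZE = 4096
store: dict[str, bytes] = {}   # stands in for chunks already held in cloud storage

def backup(data: bytes) -> list[str]:
    """Return the list of chunk hashes (pointers) that describe this backup."""
    pointers = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:        # only new chunks leave the building
            store[digest] = chunk
        pointers.append(digest)
    return pointers

def restore(pointers: list[str]) -> bytes:
    """Reassemble the original data from stored chunks."""
    return b"".join(store[p] for p in pointers)

original = b"same old backup data " * 1000
refs = backup(original)                # first full backup
unique_after_first = len(store)
backup(original)                       # next night's backup: nothing new to send
assert len(store) == unique_after_first
assert restore(refs) == original
print(f"{len(refs)} chunks referenced, {len(store)} unique chunks stored")
```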
When you combine compression and deduplication with a strategic mix of hot and cold tiers from cloud providers, you can get the best of both worlds: lower overall pricing and shorter TtFB.
Next steps
Sure, the cloud is a well-protected place to put your backups. But remember to build the egress cost into the model when you’re making your decision.
We’ve released a white paper called Cloud tiering and object storage for backup — balancing cost and speed. The title tells all. It covers the cost advantages of cloud tiers, then examines the architectural considerations, such as security and deduplication, that ensure cost-effective data protection with reasonable retrieval times.
You’ll find a checklist of questions like “So, why not store all backup data in the cloud, if the cost model favors it so heavily?” and “Should I choose hot and cold storage? Or archive and deep archive?” Get good answers to your questions before you send all of your backups and archives straight out the door and into cloud storage.
And NASA, if you’re reading this, we’d be glad to have a chat . . .