Cloud object storage and the strategic use of cloud tiering are hot trends for data protection, but you need to balance cost and speed for the best solution. So how can you get the best of both worlds? To start, be sure you have the right data protection strategy and architecture in place.
Where should you store your old data?
While access to and availability of today’s data are critical to keeping the business going, easy access to yesteryear’s data is just as important. The need for long-term storage is driven by business imperatives that never stop growing: backup/restore, disaster recovery, audit readiness, regulatory compliance and more.
Should you store your aging data on site? That keeps it close at hand, but at the high cost of infrastructure, maintenance, electricity, hardware, software and, most of all, your time. Should you put it in the cloud and forget about it? That might seem simpler and less costly than on-premises storage, but it’s hardly carefree and it’s not always less expensive.
Then there are the infrastructure questions. Cloud-based file, block or object storage? Private or public cloud? Amazon Simple Storage Service (Amazon S3), Azure Blob Storage or some other cloud storage provider? Cool or cold tier?
Yet for all its importance, nobody wants to think much about long-term storage; they just want cost-effective, automated backup/restore, disaster recovery and archiving. A little strategy, though, goes a long way when it comes to cloud tiering for object storage.
A quick storage refresher
Before we talk about the advantages of using cloud tiers for long-term storage, let’s first have a quick refresher on the types of storage options offered by cloud providers.
File storage makes it easy for applications to find data on a network and retrieve it. A file system provides the namespace for identifying files and manages metadata like owner, modification date and size. But all that convenience brings the overhead of file access privileges, file locking, copying and manipulation. File storage scales up reasonably well for accessing hundreds of thousands of files in a file system, but not for accessing billions of files in a backup repository.
Block storage works at the level of blocks, which use sector addresses instead of filenames and metadata. In a storage area network (SAN), the SAN software can find, read and write data without the overhead of descriptions and user access privileges. The low latency of block storage is suited to databases and transactions where performance is a priority. However, block storage depends on access to a running server with a file system, and its cost structure is tied to the entirety of space allocated, whether it’s all being used or not.
Object storage is used to store and retrieve both structured and unstructured blobs of data as a whole, rather than as individual blocks. Objects may consist of image files, HTML pages, binaries, video, executables and user-generated content — mostly unstructured data that’s unlikely to change. But object storage is also the perfect place for backups – whether for disaster recovery or long-term retention for compliance.
In an object storage system, objects can reside on any number of servers, whether on premises or in the cloud. Instead of using a namespace and a directory structure, applications address objects by an ID and a few simple HTTP API calls like PUT, GET and DELETE.
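To make that addressing model concrete, here is a minimal sketch using boto3, the AWS SDK for Python, against Amazon S3. The bucket, key and file names are hypothetical placeholders, and other object stores expose equivalent calls.

```python
# A minimal sketch of object storage access with boto3 (the AWS SDK for Python).
# The bucket name, key and local file below are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

# PUT: store a backup archive as a single object, addressed only by bucket + key.
with open("backup-2024-01.tar.gz", "rb") as f:
    s3.put_object(Bucket="example-backup-bucket", Key="backups/2024-01.tar.gz", Body=f)

# GET: retrieve the whole object; there is no partial, in-place edit.
obj = s3.get_object(Bucket="example-backup-bucket", Key="backups/2024-01.tar.gz")
data = obj["Body"].read()

# DELETE: remove the object by its key.
s3.delete_object(Bucket="example-backup-bucket", Key="backups/2024-01.tar.gz")
```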
What object storage lacks in versatility it makes up for in simplicity. Without the extensive overhead of file and block storage, applications like backup software can use simple requests to store and retrieve objects across large and distributed storage systems. This makes cloud object storage ideal for backup and long-term storage.
Why use cloud tiers?
Today’s cloud providers have recognized that not all cloud object storage serves the same purpose. That’s exactly why they offer different storage tiers at varying costs, based in part on how frequently data is accessed and how much is moved back and forth. These tiers are generally referred to as hot, warm, cool and cold, with hot typically offering the fastest access and cold the slowest, although other factors are involved.
Hot tiers are meant to store items that are needed for active work and must be accessed quickly and frequently. Amazon S3, Azure Blob, Wasabi, Google and IBM all offer hot tiers that are optimized for speed and performance.
Cold storage tiers are optimized for low-cost archival storage: the kind of data that must be retained but is rarely accessed. Amazon S3 Glacier, Glacier Deep Archive and the Azure Archive tier are designed for massive scale, housing unstructured data for long periods of time with minimal movement.
The goal is to choose your tiers carefully. The less often you move or retrieve your data, the less it costs you to store it.
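As one illustration, here is a hedged sketch of how tier selection might look with Amazon S3 and boto3: you can write archival objects straight to a cold storage class, or let a lifecycle rule move aging backups down the tiers automatically. The bucket name, keys and rule ID are placeholders, and other providers offer comparable access tiers and lifecycle policies.

```python
import boto3

s3 = boto3.client("s3")

# Upload archival data straight to a cold tier by setting the storage class.
# Bucket, key and body are placeholders for illustration only.
s3.put_object(
    Bucket="example-backup-bucket",
    Key="archive/2017-ledger.tar.gz",
    Body=b"...",
    StorageClass="DEEP_ARCHIVE",  # Amazon S3 Glacier Deep Archive
)

# Or let a lifecycle rule move aging backups to colder tiers automatically.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},    # cool after 30 days
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # cold after a year
                ],
            }
        ]
    },
)
```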
What are the pros and cons of cloud object storage?
As mentioned, the key advantages of cloud object storage are its low cost and simple, flat address structure. A flat address space eliminates the complexity of hierarchies, so object storage scales extremely well. It also requires no specialty hardware, enabling scalability on low-cost, commodity storage devices. Finally, object storage includes a richer set of metadata than traditional file systems, which lends itself better to analytics.
On the downside, many applications are not as well suited to object storage as they are to file system-based approaches. Unlike block storage, object storage doesn’t let you edit one part of a file: objects are complete units that can only be read, updated and rewritten in their entirety. Depending on object size, this can hurt performance. Cloud object storage also contends with latency, which makes it unsuitable as a back end for transactional systems like databases.
Why use deduplication for cloud object storage?
Deduplication technology is key to balancing the cost equation of long-term storage. Deduplication uses algorithms to scan the data and remove any elements that have already been stored, replacing them with a pointer to the identical data already in the backup.
Specifically, source-side deduplication combined with compression is the most effective way to reduce the size of data to be backed up before it goes to storage. That can significantly speed up the movement of data and increase throughput.
In fact, combining compression and deduplication with a hot or cold cloud tier can deliver the best of both worlds: lower pricing and better performance.
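To illustrate the idea (not any particular vendor’s implementation), here is a simplified Python sketch of source-side deduplication with compression: the file is split into chunks, chunks already seen are replaced with a pointer (their hash), and only new chunks are compressed for upload. Production backup software typically uses variable-length chunking and a persistent chunk index; the file name below is a placeholder.

```python
# A simplified sketch of source-side deduplication with compression.
# Real backup software uses variable-length chunking and persistent indexes;
# this fixed-size example only illustrates the idea.
import hashlib
import zlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB chunks

def dedupe_and_compress(path, store, manifest):
    """Split a file into chunks, keep only chunks not already in `store`,
    and record the chunk order in `manifest` so the file can be rebuilt."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:
                # New data: compress the chunk and keep it for upload.
                store[digest] = zlib.compress(chunk)
            # New or duplicate, the manifest records only a pointer (the hash).
            manifest.append(digest)

store, manifest = {}, []
dedupe_and_compress("backup-2024-01.tar", store, manifest)
# `store` holds the unique, compressed chunks to send to object storage;
# `manifest` lists the pointers needed to restore the original file.
```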
Three tips for moving forward with object storage
Before diving into a long-term cloud object storage project, consider these three points:
Protect all your systems, applications and data. Use cloud-connected data protection software. The right application for backup/restore, disaster recovery and long-term data retention tasks should be smart enough to put data in the most advantageous cloud tiers. Then, when you need that data, the software can find and retrieve it automatically and transparently, no matter where it is.
Don’t forget data security. Securing your data in the cloud is your responsibility, not the cloud provider’s. Think about how you’ll encrypt the data in transit and at rest (see the brief sketch after these three points), and how your long-term cloud object storage will comply with global data privacy regulations.
Incorporate source-side data compression and deduplication. These technologies will significantly reduce the total volume of data you store on-premises and send to the cloud, while also minimizing the traffic back and forth. Remember, cloud storage tier pricing is a complex combination of how much you store and how often you access it.
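On the data security point above, here is a minimal, hypothetical example of requesting encryption at rest when writing a backup object to Amazon S3 with boto3; the bucket, key and KMS key alias are placeholders, and encryption in transit is handled by the HTTPS transport the SDK uses by default.

```python
import boto3

s3 = boto3.client("s3")  # boto3 talks to S3 over HTTPS, covering encryption in transit

# Request server-side encryption at rest for a stored backup object.
# Bucket, key, body and KMS key alias are hypothetical placeholders.
s3.put_object(
    Bucket="example-backup-bucket",
    Key="backups/2024-01.tar.gz",
    Body=b"...",
    ServerSideEncryption="aws:kms",          # or "AES256" for S3-managed keys
    SSEKMSKeyId="alias/backup-archive-key",  # hypothetical KMS key alias
)
```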
Ultimately, using cloud object storage in the appropriate tier can be an extremely cost-effective option for backup and long-term data retention to meet compliance requirements. Having a good understanding of your needs and employing the right backup tools for the job can minimize your risk and reduce your cloud storage costs.