What is object storage? How is it different from file and block storage? What are the pros and cons? This blog post answers these questions and more.
What is object storage? And how is it different from file and block storage?
To better understand object storage, it helps to first understand that there are three primary types of data storage. Let’s take a quick look at each type.
File storage works much like Microsoft Windows, which stores files within folders in a hierarchical tree structure. The Windows file structure, in turn, was designed to resemble how we store physical files: in folders, within a file cabinet, possibly within a warehouse full of file cabinets. You put files inside folders, and when you need to find a file, you navigate the file structure or search for it.
While file storage provides an easy way to navigate files and folders, it adds a lot of overhead for file access privileges, locking, copying and other file operations. File storage systems can scale fairly well to accommodate hundreds of thousands of files, but not the billions of files and petabytes of data that many organizations have in their backup repositories.
Block storage breaks down files into distinct blocks of data, which are then stored separately. Because each block of data has a unique address, a block storage system doesn’t require a hierarchical tree structure like file storage systems have. This permits a block storage system to distribute smaller chunks of data to the most efficient location inside the repository. When a file is accessed, the storage system software reassembles the necessary blocks to reconstruct the file.
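To make the split-and-reassemble idea concrete, here is a minimal Python sketch. The `BlockStore` class, its method names and the tiny 4-byte block size are all assumptions made for illustration; real block storage systems work at the device level with far larger blocks.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockStore:
    """Hypothetical block store: maps a unique address to one block of data."""

    def __init__(self) -> None:
        self._blocks: dict[int, bytes] = {}
        self._next_address = 0

    def write_file(self, data: bytes) -> list[int]:
        """Store each block separately and return the ordered list of addresses."""
        addresses = []
        for block in split_into_blocks(data):
            self._blocks[self._next_address] = block
            addresses.append(self._next_address)
            self._next_address += 1
        return addresses

    def read_file(self, addresses: list[int]) -> bytes:
        """Reassemble the file by reading its blocks back in order."""
        return b"".join(self._blocks[a] for a in addresses)


store = BlockStore()
addresses = store.write_file(b"hello object storage")
restored = store.read_file(addresses)
```

Notice that the store itself knows nothing about the file beyond its block addresses, which is the limited-metadata trade-off of block storage.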
Block storage is well suited for databases and transactional systems. But block storage can be expensive and must be connected to a server, and its limited metadata can slow down the search and retrieval of data.
Object storage is an architecture for data storage that manages data as objects rather than as files nested within a file and folder structure or as blocks within sectors and tracks. Each object is a file bundled with customized metadata and a unique identifier.
Object storage is somewhat similar to block storage in that each object has a unique identifier, but it enables the addition of more metadata that can be customized with detailed information about the files stored in each object. This enhanced metadata makes it easier for searchers to find what they are looking for and it also accelerates the retrieval process.
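A small Python sketch can show what "a file with customized metadata and a unique identifier" might look like in practice. The `StoredObject` and `ObjectStore` names and the metadata keys are hypothetical, and a random UUID stands in for whatever identifier scheme a real system uses.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the data itself, its metadata, and a unique identifier."""
    data: bytes
    metadata: dict[str, str]
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ObjectStore:
    """Hypothetical in-memory store with a flat namespace (no folders)."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, data: bytes, metadata: dict[str, str]) -> str:
        """Wrap the data and metadata in an object and return its identifier."""
        obj = StoredObject(data=data, metadata=dict(metadata))
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id: str) -> StoredObject:
        """Fetch an object directly by its identifier."""
        return self._objects[object_id]


store = ObjectStore()
object_id = store.put(
    b"...image bytes...",
    {"content_type": "image/jpeg", "department": "marketing", "retention": "7y"},
)
photo = store.get(object_id)
```

There is no folder path anywhere in this sketch: the identifier alone locates the object, and the metadata travels with it.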
What are the use cases for object storage?
Object storage, also called object-based storage, is not new. It’s been around quite a while. Its use originated primarily in supercomputers but grew to market prominence around 20 years ago with its use in on-premises archiving systems. Object storage was deemed to be a great fit for backup and recovery, as well as long-term data retention and disaster recovery, due to its immutability, limitless scalability and low cost.
Interestingly, object storage in the cloud has become the approach of choice for most cloud-based services. It is used in many popular online consumer services like Facebook, Spotify and Dropbox and forms the foundation for commercial cloud services like Amazon Simple Storage Service (S3), Microsoft Azure Blob Storage and Google Cloud Storage.
In addition to archiving, object storage is well suited for large sets of unstructured data like images, sound and video files as well as log files and Internet of Things (IoT) sensor data whether they are housed on-premises or in the cloud.
How scalable is it?
Returning to our warehouse metaphor, picture your files sitting in a warehouse. With file storage, eventually you get to a point where you fill all the folders, file cabinets and space within the warehouse until you don’t have any room to continue growing. At this point, you’d have to go and build another warehouse and start filling it up with files, folders and file cabinets.
A good metaphor for object storage is a bucket, which is the terminology used by Amazon S3. With object storage, you put all your information into this bucket. And if you consider a warehouse designed for object storage, you replace the file structure, folders and file cabinets with buckets.
If you then compare the file-based and object-based warehouses, the object storage warehouse has the same kind of outer boundaries as the one with file storage, but it doesn’t have a roof. Your warehouse can continue scaling and adding buckets as your data continues to grow. And that’s one of the main considerations for object storage.
How searchable is it?
Object storage can make it easier and faster for you to find and retrieve data. This is possible mostly because in addition to having the unique identifiers for each object, object storage enables you to add detailed and customized metadata to your objects. You can put in all sorts of information about the files in the metadata.
When you use object storage, you have three methods for searching for files. You can search on the metadata that you have put in, or the metadata that’s been pulled out of the file and populated, or you could search for the unique identifier. Finding what you are looking for in a large amount of data becomes a lot easier and faster. And when we’re talking about petabytes of data, and we have customers with tens and hundreds of petabytes worth of data, finding what you are searching for quickly becomes significantly less of a challenge than with other methods.
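The three search methods can be sketched in a few lines of Python. The catalog below, with its object IDs, its `user` and `extracted` metadata groups and its keys, is invented for illustration. A real system would typically answer these queries from an index rather than scanning every object as this sketch does.

```python
from typing import Optional

# Hypothetical catalog: object ID -> metadata added by the user ("user")
# and metadata pulled out of the file itself ("extracted").
CATALOG = {
    "obj-001": {
        "user": {"project": "apollo", "owner": "finance"},
        "extracted": {"content_type": "application/pdf"},
    },
    "obj-002": {
        "user": {"project": "apollo", "owner": "legal"},
        "extracted": {"content_type": "image/png"},
    },
    "obj-003": {
        "user": {"project": "zeus", "owner": "finance"},
        "extracted": {"content_type": "application/pdf"},
    },
}


def find_by_id(object_id: str) -> Optional[dict]:
    """Method 3: look an object up directly by its unique identifier."""
    return CATALOG.get(object_id)


def find_by_metadata(kind: str, key: str, value: str) -> list[str]:
    """Methods 1 and 2: match on user-supplied or extracted metadata."""
    return sorted(
        object_id
        for object_id, meta in CATALOG.items()
        if meta[kind].get(key) == value
    )


apollo_objects = find_by_metadata("user", "project", "apollo")
pdf_objects = find_by_metadata("extracted", "content_type", "application/pdf")
one_object = find_by_id("obj-002")
```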
What are the limitations?
Object storage, unlike file or block storage, does not allow you to edit only one portion of a file. Objects are treated as complete units that can only be read or written as a whole, so editing a file requires creating a new object. While this behavior is beneficial for long-term retention and compliance, it makes object storage unsuitable as a back end for transactional systems such as databases. It can also grow your repositories much faster than file or block storage would if you’re frequently editing files.
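The following Python sketch illustrates why frequent edits inflate a repository. It assumes a store that keeps every previous version of an object, which suits retention but is not how every service is configured. The `VersionedObjectStore` class and its methods are hypothetical.

```python
from typing import Optional


class VersionedObjectStore:
    """Hypothetical store: objects are written whole, and every version is kept."""

    def __init__(self) -> None:
        self._versions: dict[str, list[bytes]] = {}

    def put(self, key: str, data: bytes) -> int:
        """Write a complete object; return its 1-based version number."""
        versions = self._versions.setdefault(key, [])
        versions.append(bytes(data))
        return len(versions)

    def get(self, key: str, version: Optional[int] = None) -> bytes:
        """Read a complete object (the latest version unless one is given)."""
        versions = self._versions[key]
        return versions[-1] if version is None else versions[version - 1]

    def stored_bytes(self) -> int:
        """Total bytes held across all versions of all objects."""
        return sum(len(v) for vs in self._versions.values() for v in vs)


store = VersionedObjectStore()
original = b"quarterly report: draft"
store.put("report.txt", original)

# There is no partial write: changing one word means uploading everything again.
edited = original.replace(b"draft", b"final")
store.put("report.txt", edited)
```

A five-byte change doubled the space used, because the store now holds two full copies of the report.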
How can you maximize the value of object storage in the cloud?
If you are using, or intend to use, object storage in the cloud, there are several ways you can maximize the value it produces for your organization. Data deduplication and compression are the primary methods for reducing the quantity of backup data stored, and those methods, along with cloud tiering, can produce substantial cost savings in the cloud.
Data deduplication uses algorithms to scan data as it is ingested and if it finds any data that matches data already in the repository, it replaces it with a pointer to the already stored data. In a nutshell, it ensures that only unique data is stored. Data deduplication is typically measured in ratios. For instance, a 10:1 reduction ratio means that you’re reducing the storage capacity requirements by 90%.
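Here is a minimal Python sketch of the pointer idea, along with the arithmetic behind the ratio. It assumes fixed-size chunks identified by their SHA-256 hash; production deduplication engines often use variable-size chunking and other refinements, and the `DedupStore` class is hypothetical.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; deliberately tiny for illustration


class DedupStore:
    """Hypothetical store that keeps each unique chunk exactly once."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    def ingest(self, data: bytes) -> list[str]:
        """Scan incoming data and return pointers (hashes) to its chunks."""
        pointers = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only store the chunk if this content has not been seen before.
            self._chunks.setdefault(digest, chunk)
            pointers.append(digest)
        return pointers

    def restore(self, pointers: list[str]) -> bytes:
        """Rebuild the original data by following the pointers in order."""
        return b"".join(self._chunks[p] for p in pointers)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self._chunks.values())


def percent_saved(ratio: float) -> float:
    """Convert a reduction ratio such as 10 (for 10:1) into percent saved."""
    return 100 * (ratio - 1) / ratio


store = DedupStore()
backup = b"ABCDEFGH" * 10          # 80 bytes made of one repeated chunk
pointers = store.ingest(backup)
ratio = len(backup) / store.stored_bytes()
```

In this contrived case 80 bytes are ingested but only 8 are stored, a 10:1 ratio, and `percent_saved(10)` works out to the 90% quoted above.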
Data compression supplements data deduplication by encoding data more compactly so that it takes up less storage space. It squeezes out redundancy, such as repeated patterns and filler, while retaining all of the information, so the original data can be restored exactly.
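As a quick illustration, the sketch below uses Python’s standard `zlib` module to compress a repetitive log-style payload and restore it exactly. The sample log line is made up, and real backup products may use different compression algorithms.

```python
import zlib

# A made-up, highly repetitive payload, similar in spirit to log data.
original = b"2024-01-01 INFO backup job completed successfully\n" * 200

compressed = zlib.compress(original, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)

print(f"original:   {len(original)} bytes")
print(f"compressed: {len(compressed)} bytes")
```

The decompressed bytes are identical to the input, which is what makes this kind of compression safe for backup data. How much space you save depends heavily on how repetitive the data is.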
Cloud tiering enables organizations to optimize their costs of object storage in the cloud based on their use of that data. Most cloud storage providers offer different storage levels at varying costs based on how often the data is accessed and how much is moved back and forth. While there are some other factors involved, these levels are typically categorized as hot, warm, cool and cold, with hot providing the quickest access to files and cold having the slowest access. The goal is to choose the slowest tier that still yields the performance needed.
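The rule of thumb in the last sentence can be expressed as a small selection function. The tier names follow the categories above, but the retrieval times and prices in this Python sketch are placeholders, not any provider’s actual figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    retrieval_seconds: float   # placeholder: typical time to first byte
    cost_per_gb_month: float   # placeholder price, not a real quote


# Hypothetical tiers, ordered from fastest and priciest to slowest and cheapest.
TIERS = [
    Tier("hot", 0.1, 0.020),
    Tier("warm", 1.0, 0.012),
    Tier("cool", 30.0, 0.008),
    Tier("cold", 4 * 3600.0, 0.002),
]


def choose_tier(max_retrieval_seconds: float) -> Tier:
    """Pick the slowest tier that still meets the required retrieval time."""
    eligible = [t for t in TIERS if t.retrieval_seconds <= max_retrieval_seconds]
    if not eligible:
        raise ValueError("no tier is fast enough for this requirement")
    return max(eligible, key=lambda t: t.retrieval_seconds)


archive_tier = choose_tier(max_retrieval_seconds=24 * 3600)  # a day is fine
restore_tier = choose_tier(max_retrieval_seconds=60)         # need it in a minute
```

In practice the decision also involves retrieval and data transfer charges and minimum storage durations, which this sketch leaves out.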
Combining data deduplication, compression and cloud tiering can enable you to drive down storage costs while accelerating the access to your repository of backup data. Deduplicating and compressing your data is the most effective way to reduce the size of your data to be backed up before it goes into storage. And when you shrink your storage requirements, you open up the possibility of selecting a faster tier of cloud storage while still saving costs.
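A little arithmetic shows how shrinking the data can pay for a faster tier. The prices and the 10:1 reduction ratio in this Python sketch are assumptions chosen to make the numbers easy to follow; your own ratios and provider pricing will differ.

```python
def monthly_cost(raw_gb: float, reduction_ratio: float, price_per_gb: float) -> float:
    """Monthly storage cost after deduplication and compression."""
    stored_gb = raw_gb / reduction_ratio
    return stored_gb * price_per_gb


RAW_GB = 100_000          # 100 TB of backup data before reduction
HOT_PRICE = 0.020         # placeholder price per GB per month
COLD_PRICE = 0.004        # placeholder price per GB per month

# Option A: no reduction, cheapest (slowest) tier.
cold_unreduced = monthly_cost(RAW_GB, reduction_ratio=1, price_per_gb=COLD_PRICE)

# Option B: 10:1 reduction, fastest tier.
hot_reduced = monthly_cost(RAW_GB, reduction_ratio=10, price_per_gb=HOT_PRICE)

print(f"cold tier, unreduced: ${cold_unreduced:,.2f} per month")
print(f"hot tier, 10:1:       ${hot_reduced:,.2f} per month")
```

Under these assumed numbers, the reduced data set costs half as much on the hot tier as the unreduced data set does on the cold tier.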
For more information on optimizing the use of object storage in the cloud, reference our white paper “Cloud tiering and object storage for backup —balancing cost and speed.”
Organizations have been growing increasingly interested in using object storage in the cloud for data backup and you can see why. Object storage is perfect for backup data as it can scale indefinitely, is incredibly cost-effective, works practically anywhere, and isn’t limited by size or format.