Transcript for:
Storage Concepts: Block, File, and Object Storage

Hello, everyone, and welcome. My name is Mike Johnson, but you can call me MJ. I am one of the training content creators and a DevOps engineer here at MinIO, and in this video we will be covering some common storage concepts at a high level and discussing the differences between block, file, and object storage and how you might use those in your organization. So come along with me on this journey and let's jump into the slides.

First off, a little bit about me. My name is Mike Johnson, I'm a MinIO training content creator and DevSecOps engineer. I have over 22 years in IT, ten years in cloud administration, 20 years in sysadmin, and I've been doing containers and Kubernetes for quite a while now. And I have multiple certs in AWS, Azure, Docker, and Kubernetes.

So what are we going to cover today? Today we're going to talk about what is block storage, what is file storage, and what is object storage, because a lot of people don't really know how to differentiate between those, as well as some common storage concepts you're going to see come up time and time again. As we start looking at some of the performance and cost considerations around each of these different storage types, we're also going to look at some common use cases for each and then do a visual side-by-side comparison of them.

So what are some of the common storage types that we're going to cover today? Well, there's block and there's file and there's object. And there are some distinctions about each of these that make them unique, and therefore the use cases for each of them are quite unique.

So let's talk about block storage basics. First off, block storage treats data as a sequence of fixed-size blocks, and each file is spread across multiple blocks. Block sizes can be adjusted to suit the requirements of what's being stored. So you may have seen this if you're working with a certain type of database where the standard block size that the database outputs is maybe 16 KB but your standard disk block size is 4 KB.
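
To make the fixed-size block idea concrete, here is a minimal Python sketch. It is an illustration only, not how any real storage device or MinIO works internally, and the function names (split_into_blocks, reassemble, blocks_per_page) are made up for this example. It cuts data into fixed-size blocks, puts it back together, and counts how many disk blocks a single database page spans.

```python
# Illustrative sketch only: real block devices do this in drivers and firmware.
# All function names here are hypothetical and exist only for this example.

KB = 1024


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size blocks, zero-padding the final block."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


def reassemble(blocks: list[bytes], original_length: int) -> bytes:
    """Join the blocks in order and drop the padding added by the split."""
    return b"".join(blocks)[:original_length]


def blocks_per_page(page_size: int, block_size: int) -> int:
    """Number of disk blocks that one database page spans (ceiling division)."""
    return -(-page_size // block_size)


# A 16 KB "database page" made of repeating byte values.
page = bytes(range(256)) * 64
```

With these assumptions, blocks_per_page(16 * KB, 4 * KB) is 4, so every 16 KB page touches four disk blocks, while blocks_per_page(16 * KB, 16 * KB) is 1. That mismatch is the thing being described here.
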
You can optimize that by matching the two block sizes and getting more throughput on the same hardware. Blocks do not need to be stored together and can be arranged to provide the best performance. Now that is to say that the software is going to arrange it. You're never going to arrange those blocks yourself. This is all done under the hood. You don't ever need to worry about that. There is, however, limited capacity to handle metadata, so you get things like file name and a couple of other things, but there's not really any searchable metadata in here, and that's going to become important later. It has strong consistency and is highly structured at the block level, which means that if you're writing to a single target device and there's nothing wrong with that device, that data is very unlikely to become corrupt in and of itself. Now, if you lose that single storage device, obviously that data is going to be lost. But block storage itself is pretty resilient. And the way that block storage is retrieved is that it's retrieved as blocks using common methods such as iSCSI, Fibre Channel, Serial ATA (SATA), and other things, and then reassembled to be whatever file you wrote. That is, you're never going to read it as blocks, but rather you're going to get it back as the thing you intended to write, a file or something like that.

Next, let's talk about some of the basics of file storage. So files are stored as a whole and accessed in the original format that they were stored in, in a folder and file path manner. So you've probably seen this before where you are in a corporate environment and you're using, let's say, the I drive. And the I drive has all of the corporate files. When you write that file up, it goes across as maybe a PDF and it comes back as a PDF, and you find it through, you know, corporate directory, slash secret file, slash that PDF that you're looking for. There's limited metadata stored in here as well.
So there's create date, modify date, file size, and a few other attributes. But again, it's not entirely searchable based on the metadata alone. Files can be locked, however, to a single writer to prevent corruption. And once the operation is complete, another writer can modify that file. So let's say that you're using this file storage as a backing for writing a database. That file can be locked so that another writer cannot write to that same database at the same time. And therefore, you're preventing some sort of data corruption where two different devices or end users are trying to write to the same file at the same time and potentially overlap each other's data. Note that file storage typically sits atop block and object storage and it's accessed via specific protocols like SMB or NFS, so you start thinking about these additional protocols as being overhead, and we'll talk about that in the performance section later.

So let's talk about some of the basics of object storage. In object storage, files are stored on distributed shards with metadata, an object ID, and additional attributes, so things like RBAC: who's allowed to do what to those files, whether they're allowed to read or open them or download them or those kinds of things. And they're reassembled upon request. Unlimited metadata can be attached, which allows for even more advanced search capabilities. There's an asterisk there: unlimited metadata can be attached in MinIO. A lot of the other S3 or object storage providers do limit the amount of metadata that you can attach to a file. Files cannot be locked, but versioning can be turned on to retain data integrity and meet something like regulatory requirements, where you can't have that file change without there being a record of that change and being able to roll back to that previous state of the file.
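
The metadata and versioning ideas above can be sketched with a toy in-memory store in Python. This is an assumption-heavy illustration, not the MinIO or S3 API: the class and method names (ToyObjectStore, put, get, search) are hypothetical, and a real object store would shard and persist the data rather than hold it in a dictionary.

```python
# Toy in-memory illustration of object metadata and versioning.
# Hypothetical names throughout; this is NOT the MinIO or S3 API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectVersion:
    version_id: int            # 1 for the first write, 2 for the next, ...
    data: bytes
    metadata: dict[str, str]   # free-form, user-supplied key/value pairs


class ToyObjectStore:
    def __init__(self) -> None:
        # Each key maps to its full history, oldest version first.
        self._versions: dict[str, list[ObjectVersion]] = {}

    def put(self, key: str, data: bytes,
            metadata: Optional[dict[str, str]] = None) -> int:
        """Store a new version; earlier versions are kept, never overwritten."""
        history = self._versions.setdefault(key, [])
        version = ObjectVersion(len(history) + 1, data, dict(metadata or {}))
        history.append(version)
        return version.version_id

    def get(self, key: str, version_id: Optional[int] = None) -> ObjectVersion:
        """Return the latest version, or a specific earlier one."""
        history = self._versions[key]
        if version_id is None:
            return history[-1]
        return history[version_id - 1]

    def search(self, **wanted: str) -> list[str]:
        """Keys whose latest version carries all the requested metadata pairs."""
        return sorted(
            key for key, history in self._versions.items()
            if all(history[-1].metadata.get(k) == v for k, v in wanted.items())
        )
```

In this sketch, writing the same key twice leaves both versions retrievable, which is the "record of that change" and roll-back behavior described above, and search(department="finance") finds objects purely by their attached metadata.
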
There's also this concept of unlimited scalability in object storage, where you just keep adding back ends regardless of where they are, and the front end interface sort of takes care of all of that and obfuscates that away from the end user. So to the end user, it doesn't seem like you've made any changes or had to do anything else. But on the back end you're just continuing to add common off-the-shelf hardware to extend this cluster and really extend it out as far as you need it to go. And this is accessed over a REST API using the common HTTP protocol, which means that it's pretty ubiquitous in where it can be used. It can be used from client workstations or servers or cloud nodes and anything like that, for application storage and all those kinds of things. So it's a really powerful tool and a great way to store a lot of data very efficiently.
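
As a rough illustration of what "accessed over a REST API using HTTP" means, the Python sketch below builds (but does not send) an HTTP PUT request for an object using only the standard library. The endpoint, bucket, and key are made-up placeholders, and build_put_request is a hypothetical helper. The x-amz-meta- header prefix is the S3 convention for user-defined metadata; a real S3-compatible service would also require signed authentication headers, which this sketch leaves out.

```python
# Sketch only: constructs an HTTP PUT request for an object without sending it.
# The endpoint, bucket, and key are placeholders, and authentication is omitted.
from urllib.parse import quote
from urllib.request import Request


def build_put_request(endpoint: str, bucket: str, key: str,
                      body: bytes, metadata: dict[str, str]) -> Request:
    """Build a PUT to <endpoint>/<bucket>/<key> with user metadata as headers."""
    url = f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"
    # S3-style user-defined metadata travels as x-amz-meta-* headers.
    headers = {f"x-amz-meta-{name}": value for name, value in metadata.items()}
    headers["Content-Length"] = str(len(body))
    return Request(url, data=body, headers=headers, method="PUT")


request = build_put_request(
    "https://objects.example.com",   # hypothetical endpoint
    "reports",                       # hypothetical bucket
    "2024/q1.pdf",                   # hypothetical object key
    b"%PDF-1.7 example bytes",
    {"department": "finance"},
)
```

Because this is plain HTTP, any client that can make a web request, whether a workstation, a server, or a cloud node, can talk to the store in the same way.
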