Hello, everyone, and welcome. My name is Mike Johnson,
but you can call me MJ. I am one of the training content creators
and a DevOps engineer here at MinIO, and in this video we will be covering some
common storage concepts at a high level and discussing the differences
between block, file and object storage and how you might use those
in your organization. So come along with me on this journey
and let's jump into the slides. First off, a little bit about me. My name is Mike Johnson, and I'm a MinIO training
content creator and DevSecOps engineer. I have over 22 years in IT, ten years
in cloud administration, 20 years in sysadmin, and I've been doing containers
and Kubernetes for quite a while now. I also have multiple certs in
AWS, Azure, Docker and Kubernetes. So what are we going to cover today? Today we're going to talk about
what is block storage, what is file storage,
and what is object storage. A lot of people don't really know how to differentiate
between those. We'll cover some common storage concepts you're going to see
come up time and time again as we start looking at some of the performance
and cost considerations around each of these different storage
types. We're also going to look at some common
use cases for each and then do a visual side
by side comparison of them. So what are some of the common
storage types that we're going to cover today? Well, there's block and there's file
and there's object. And there are some distinctions
about each of these that make them unique, and therefore the use cases
for each of them are quite unique. So let's talk about block storage basics. First off, block storage
treats data as a sequence of fixed-size blocks, and each file is spread
across multiple blocks. Block sizes can be adjusted to suit
the requirements of what's being stored. You may have seen this if you're
working with a certain type of database where the standard block
size that the database writes is maybe 16 KB
but your standard disk block size is 4 KB. You can optimize that
by matching the two and get more throughput on the same
hardware. Blocks do not need to be stored together
and can be arranged to provide the best performance. Now that is to say that the software
is going to arrange it. You're never going to arrange
those blocks yourself. This is all done under the hood. You don't ever need to worry about that. There is, however, limited capacity to handle metadata,
so you get things like file name and a couple other things, but there's not really
any searchable metadata in here and that's going to become
important later. It has strong consistency
and is highly structured at the block level, which means that if you're writing
to a single target device and there's nothing wrong
with that device, that data is very unlikely
to become corrupt in and of itself. Now, if you lose that single storage device,
obviously that data is going to be lost. But block storage itself
is pretty resilient. And the way that block
storage is retrieved is as blocks, using common methods
such as iSCSI, Fibre Channel, SATA and other things,
and then reassembled into whatever file you wrote.
That is, you're never going to read it as blocks,
but rather you're going to get it back as the thing you intended
to write: a file or something like that. Next, let's talk
about some of the basics of file storage. Files are stored as a whole
and accessed in the original format that they were stored in, in a folder and file
path manner. You've probably seen this before
where you are in a corporate environment and you're using, let's
say, the I drive, and
the I drive has all of the corporate files. When you write a file up,
it goes across as, maybe, a PDF and it comes back as a PDF,
and you find it through a path like corporate directory, slash secret files,
slash the PDF that you're looking for. There's limited
metadata stored in here as well. So there's create date, modify date,
file size and a few other attributes. But again, it's not entirely searchable
based on the metadata alone. Files can be locked, however,
to a single writer to prevent corruption. And once the operation is complete,
another writer can modify that file. So let's say that you're using this file storage
as a backing for writing a database. That file can be locked so that another writer cannot write
to that same database at the same time. And therefore, you're preventing
some sort of data corruption where two different devices
or end users are trying to write to the same file at the same time
and potentially overlap each other's data. Note that file storage typically sits atop
block or object storage and it's accessed
via specific protocols like SMB or NFS. You can start thinking about these
additional protocols as being overhead, and we'll talk about that
in the performance section later. So let's talk about some of the basics
of object storage. In object storage, files are stored
on distributed shards with metadata, an object ID and additional attributes, and reassembled upon request. Those attributes are things like RBAC: who's allowed to do what
to those files, whether they're allowed to read
or open them or download them or those kinds of things.
Unlimited metadata can be attached, which allows for
even more advanced search capabilities. Asterisk there: unlimited
metadata can be attached in MinIO. A lot of the other S3 or object
storage providers do limit the amount of metadata
that you can attach to a file. Files cannot be locked, but versioning
can be turned on to retain data integrity and meet something like regulatory
requirements where you can't have that file change without there
being a record of that change and being able to roll back
to that previous state of the file. There's also this concept
of unlimited scalability in object storage,
where you just keep adding back ends regardless of where they are, and the front
end interface sort of takes care of all of that and abstracts
that away from the end user. So to the end user, it doesn't seem like
you've made any changes or had to do anything else. But on the back end you're just continuing to add common off-the-shelf
hardware to extend this cluster and really extend it out
as far as you need it to go. And this is accessed over a REST API using the common HTTP protocol,
which means that it's pretty ubiquitous in where it can be used.
It can be used from client workstations or servers or cloud nodes
and anything like that, for application storage
and all those kinds of things. So it's a really powerful tool
and a great way to store a lot of data very efficiently.
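
As a rough illustration of the object storage ideas above (metadata attached to each object, versioning instead of locking, lookup by key rather than by folder path), here is a minimal Python sketch. The ToyObjectStore class and its put, get and search methods are hypothetical names made up for this example. It is an in-memory toy under those assumptions, not the MinIO or S3 API, and it leaves out sharding, RBAC and the REST layer entirely.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ObjectVersion:
    data: bytes
    metadata: dict
    etag: str  # content hash, used here as a simple integrity check


@dataclass
class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, not a real API)."""
    versions: dict = field(default_factory=dict)  # key -> list of ObjectVersion

    def put(self, key, data, metadata=None):
        """Append a new version instead of overwriting; return its 1-based number."""
        etag = hashlib.sha256(data).hexdigest()
        version = ObjectVersion(data, dict(metadata or {}), etag)
        self.versions.setdefault(key, []).append(version)
        return len(self.versions[key])

    def get(self, key, version=None):
        """Return the latest version, or a specific earlier one for rollback."""
        history = self.versions[key]
        return history[-1] if version is None else history[version - 1]

    def search(self, **wanted):
        """Return keys whose latest metadata matches every wanted key/value pair."""
        return sorted(
            key for key, history in self.versions.items()
            if all(history[-1].metadata.get(k) == v for k, v in wanted.items())
        )


store = ToyObjectStore()
store.put("reports/q1.pdf", b"draft", {"department": "finance", "status": "draft"})
store.put("reports/q1.pdf", b"final", {"department": "finance", "status": "final"})
store.put("logos/brand.png", b"png-bytes", {"department": "marketing"})

latest = store.get("reports/q1.pdf")               # most recent write
original = store.get("reports/q1.pdf", version=1)  # earlier state is still there
finance_keys = store.search(department="finance")  # lookup by metadata
```

The second put does not destroy the first write. Both versions stay retrievable, which is the record-of-change and rollback behavior described above, and the search goes through the attached metadata rather than a directory tree.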