Transcript for:
Exploring Apache Hadoop Ozone Architecture

Hi, I would like to discuss a few interesting technical details of Apache Hadoop Ozone, but before we can talk about the technical details we should be on the same page at a high level about Apache Hadoop Ozone. So in this video I would like to talk about the main components of Ozone. Ozone is a Hadoop sub-project: it's a distributed object store, it can scale to store billions of objects, and it provides an S3 protocol, a Hadoop file system interface, and CSI.

If we talk about the storage, there are multiple approaches to store files. Let's say you have one file. One very common way to store it, which is followed by both HDFS and Ozone, is just to split this one file into smaller blocks, and in this case we store the blocks instead of the files. There are multiple advantages of this approach: we can copy just the blocks between datanodes, and it is easier to erasure-code the data. So let's use this pattern.

Now, if we have these blocks, which are just the parts of the file, the next question is how we can replicate the blocks between datanodes. Block 1 should be copied to datanode 1, and ideally not just to datanode 1 but also to datanode 3: we need multiple instances of the same blocks to avoid any kind of data loss. So let's copy block 3 to datanode 1, to datanode 2, and maybe to datanode 4. And block 2 has just one instance, so it also should be copied, maybe to datanode 2. That's the storage model which is used by HDFS and Ozone. Let's also use datanode 4 as a replica for block 1.

So we have two kinds of mappings here: one mapping is that the file is mapped to blocks, and the other one is where the blocks are mapped to real locations. These two mappings can be named the key space and the block space. Actually, in HDFS we have the same kind of mappings, but in HDFS we have one master node, the NameNode, which manages both of these maps. In Ozone there is a different structure, because we just cut this NameNode in two: these two
functionalities are separated into two master servers. So we have one specific service which does the first mapping, the key-to-block mapping, and the other one does only the replication part: the binaries should be replicated and stored somewhere in a safe way.

So this is the full view of the Ozone components. All of the other components are just additional helpers, like the web UI, a monitoring service, or an S3-compatible REST service, and they use the two main master services and the datanodes. One advantage of this approach is that on top of this lower-level layer, which does the replication between the datanodes (the only responsibility of this Storage Container Manager is to copy huge binary blocks), we can provide an object store. We started with an object store, that is Ozone, but we could provide additional services: maybe a mountable storage, or maybe HDFS could use this lower-level service. That was the original vision, but currently this is what we have: a lower-level block space management layer, with all of the datanodes reporting back to the Storage Container Manager which manages the replication, and the Ozone Manager which manages the key and volume space, and on top of it all of the other services, the clients with different protocols.

If we check the Storage Container Manager, the SCM, which has the responsibility to replicate the data, we can find these network protocols: a pipeline is just some kind of replication group of multiple servers, and a container is the replication unit, a huge binary which is replicated between multiple datanodes. These are the services which are provided by this low-level SCM service, and there is one additional thing: there is a heartbeat from the datanodes to the SCM.

If we check the Ozone Manager, this is another layer which uses the lower-level replication layer, but the only responsibility of this layer is the key space management. It manages
volumes, buckets, and keys, the real business objects, with the help of the SCM. It also provides some kind of indexes to provide better services for file-system-based clients, like the Hadoop-compatible file system, and some security-related services. These are the main network services inside the Ozone Manager.

And last but not least, we have the datanodes, but they are very similar to the storage nodes in any other storage service. So we have the Ozone Manager for the key space management and the Storage Container Manager for the block space management: those are the master services, and the datanodes are just storing the data and reporting back the current status to the SCM via heartbeats.

So that's about Apache Hadoop Ozone. In the next video we will check the Storage Container Manager in more detail, and check what are the huge binaries which are replicated between the different services.
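The two mappings discussed in the video, the key space (key to blocks) and the block space (block to datanode locations), can be sketched as two plain dictionaries. This is a minimal illustrative model, not Ozone's real classes or placement policy; all names and the round-robin placement are made up for the sketch.

```python
key_space = {}    # key -> list of block IDs (the Ozone Manager's side of the model)
block_space = {}  # block ID -> set of datanodes (the SCM's side of the model)

def put_key(key, num_blocks, datanodes, replication=3):
    """Split a logical object into blocks and place each on `replication` nodes."""
    blocks = [f"{key}-block-{i}" for i in range(num_blocks)]
    key_space[key] = blocks
    for i, block in enumerate(blocks):
        # round-robin placement, purely illustrative
        block_space[block] = {datanodes[(i + r) % len(datanodes)]
                              for r in range(replication)}

def read_key(key):
    """Resolve a key first to its blocks, then each block to its replica locations."""
    return {block: sorted(block_space[block]) for block in key_space[key]}

put_key("vol1/bucket1/file1", num_blocks=3, datanodes=["dn1", "dn2", "dn3", "dn4"])
print(read_key("vol1/bucket1/file1"))
```

Splitting the two dictionaries between two independent masters is exactly the structural difference from HDFS described above, where a single NameNode holds both.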
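The SCM concepts of a pipeline and a container can also be sketched in a few lines: a pipeline is a fixed group of datanodes, and every container (a huge replicated binary) is assigned to one pipeline, so all of its replicas live on those nodes. The names and the round-robin selection are assumptions for illustration only; the real SCM considers racks, load, and node health.

```python
import itertools

datanodes = ["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"]

# A pipeline is a replication group: here, simply three consecutive datanodes.
pipelines = [datanodes[i:i + 3] for i in range(0, len(datanodes), 3)]

container_to_pipeline = {}            # container ID -> its pipeline (replica set)
_next_pipeline = itertools.cycle(pipelines)

def allocate_container(container_id):
    """Assign a new container to a pipeline; every replica lives on those nodes."""
    pipeline = next(_next_pipeline)
    container_to_pipeline[container_id] = pipeline
    return pipeline

print(allocate_container("container-1"))  # ['dn1', 'dn2', 'dn3']
print(allocate_container("container-2"))  # ['dn4', 'dn5', 'dn6']
```

Grouping replication by container rather than by individual key is what lets the SCM stay a dumb, low-level layer that only moves huge binary blocks.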
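Finally, the heartbeat flow from the datanodes to the SCM can be sketched as follows; the class and method names are invented for this example. Each datanode periodically reports the container replicas it holds, and from those reports the SCM can spot under-replicated containers.

```python
import time

REPLICATION_FACTOR = 3

class Scm:
    """Toy SCM: tracks replica reports and heartbeat times per datanode."""

    def __init__(self):
        self.replicas = {}   # container ID -> set of datanodes holding a replica
        self.last_seen = {}  # datanode -> timestamp of its last heartbeat

    def heartbeat(self, datanode, container_ids):
        """Record a heartbeat carrying the datanode's container report."""
        self.last_seen[datanode] = time.time()
        for container_id in container_ids:
            self.replicas.setdefault(container_id, set()).add(datanode)

    def under_replicated(self):
        """Containers with fewer replicas than the target replication factor."""
        return sorted(c for c, nodes in self.replicas.items()
                      if len(nodes) < REPLICATION_FACTOR)

scm = Scm()
scm.heartbeat("dn1", ["container-1"])
scm.heartbeat("dn2", ["container-1"])
scm.heartbeat("dn3", ["container-1", "container-2"])
print(scm.under_replicated())  # both containers are below 3 replicas so far
```

This is the passive side of the replication loop described above: the masters never poll the datanodes; the datanodes push their state up via heartbeats.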