Exploring Apache Hadoop Ozone Architecture
Aug 22, 2024
Notes on Apache Hadoop Ozone
Overview of Apache Hadoop Ozone
Ozone is a sub-project of Hadoop.
It's a distributed object store that can scale to store billions of objects.
Exposes three access interfaces: an S3-compatible protocol, a Hadoop-compatible file system interface, and CSI (Container Storage Interface).
Storage Approaches
Common method: Split files into smaller blocks (similar to HDFS).
Advantages:
Easier to replicate just the blocks between data nodes.
More efficient erasure coding.
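The block-splitting idea above can be sketched in a few lines. This is only an illustration: the function name `split_into_blocks` and the tiny 4-byte block size are assumptions chosen for readability, not anything Ozone or HDFS defines.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


payload = b"abcdefghij"  # 10 bytes standing in for a large file
blocks = split_into_blocks(payload, block_size=4)
print(blocks)  # [b'abcd', b'efgh', b'ij']

# Concatenating the blocks in order restores the original file.
assert b"".join(blocks) == payload
```

Because each block is an independent unit, a storage layer can copy or re-encode one block without touching the rest of the file.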
Block Replication
Each block needs multiple replicas on different data nodes to prevent data loss.
Example:
Block 1 should be on Data Node 1, Data Node 3, etc.
Block replication management ensures availability.
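As a rough sketch of what replication management has to track, the snippet below places each block on several distinct nodes and then reports which blocks fall below the target once a node is lost. The round-robin placement, the node names, and both function names are hypothetical; real placement policies are more involved.

```python
def place_replicas(block_id: int, nodes: list[str], factor: int) -> list[str]:
    """Pick `factor` distinct nodes for a block, round-robin from block_id."""
    if factor > len(nodes):
        raise ValueError("not enough data nodes for the requested factor")
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(factor)]


def under_replicated(placement: dict[int, list[str]],
                     live: set[str], factor: int) -> list[int]:
    """Return ids of blocks with fewer than `factor` replicas on live nodes."""
    return [block for block, locations in placement.items()
            if sum(1 for node in locations if node in live) < factor]


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = {b: place_replicas(b, nodes, factor=3) for b in range(4)}
print(placement[0])  # ['dn1', 'dn2', 'dn3']

# Simulate losing dn4: every block that had a replica there needs repair.
print(under_replicated(placement, live={"dn1", "dn2", "dn3"}, factor=3))  # [1, 2, 3]
```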
Mapping Structures
Key Space Mapping: Mapping files to blocks.
Block Space Mapping: Mapping blocks to their storage locations.
In HDFS, a single master node (NameNode) manages both mappings.
In Ozone, this functionality is split across two master servers:
Key to Block Mapping Service
Replication Management Service
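The two mappings can be pictured as two separate lookup tables, resolved one after the other. The sketch below uses plain dictionaries with made-up keys, block ids, and node names; it shows only the two-step lookup, not how either service stores its data.

```python
# Key space: file (key) name -> ordered block ids. Hypothetical sample data.
key_to_blocks = {
    "logs/app.log": ["b1", "b2"],
    "img/cat.png": ["b3"],
}

# Block space: block id -> data nodes holding a replica. Also hypothetical.
block_to_nodes = {
    "b1": ["dn1", "dn3"],
    "b2": ["dn2", "dn3"],
    "b3": ["dn1", "dn2"],
}


def locate(key: str) -> list[tuple[str, list[str]]]:
    """Resolve a key to its blocks, then each block to its replica locations."""
    return [(block, block_to_nodes[block]) for block in key_to_blocks[key]]


print(locate("logs/app.log"))
# [('b1', ['dn1', 'dn3']), ('b2', ['dn2', 'dn3'])]
```

Keeping the tables apart is what lets each one live in its own service: the first step needs no knowledge of data nodes, and the second needs no knowledge of file names.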
Ozone Components
Core components include:
Storage Container Manager (SCM) - for block replication.
Ozone Manager - for key space management.
Data Nodes - responsible for storing data.
Additional components: Web UI, prediction service, and an S3-compatible REST service.
Storage Container Manager (SCM)
Responsible for replicating data.
Creates replication pipelines: groups of Data Nodes that replicate data among themselves over the network.
Data Nodes send periodic heartbeats to SCM to report their status.
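A heartbeat scheme of this kind can be sketched as a table of last-seen timestamps plus a timeout. The class name `HeartbeatTracker`, the 30-second timeout, and the explicit `now` arguments (used here so the example is deterministic) are all assumptions for illustration and do not reflect SCM's actual implementation.

```python
class HeartbeatTracker:
    """Remember when each data node last reported in."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat from `node` at time `now` (seconds)."""
        self.last_seen[node] = now

    def stale_nodes(self, now: float) -> list[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)


tracker = HeartbeatTracker(timeout=30.0)
tracker.heartbeat("dn1", now=0.0)
tracker.heartbeat("dn2", now=0.0)
tracker.heartbeat("dn1", now=25.0)    # dn1 keeps reporting, dn2 goes silent

print(tracker.stale_nodes(now=40.0))  # ['dn2']
```

A node flagged as stale is the natural trigger for the replication layer to check which blocks have lost a replica.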
Ozone Manager
Manages volumes, buckets, keys, and provides indexing for file system clients.
Ensures integration with the lower-level replication layer (SCM).
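The volume, bucket, and key hierarchy can be modelled as nested dictionaries, with prefix listing standing in for the directory-style view a file system client expects. The `Namespace` class and its method names are hypothetical and only mirror the hierarchy described above, not the Ozone Manager API.

```python
class Namespace:
    """Toy volume -> bucket -> key index; each key maps to its block ids."""

    def __init__(self) -> None:
        self.volumes: dict[str, dict[str, dict[str, list[str]]]] = {}

    def create_volume(self, volume: str) -> None:
        self.volumes.setdefault(volume, {})

    def create_bucket(self, volume: str, bucket: str) -> None:
        self.volumes[volume].setdefault(bucket, {})

    def put_key(self, volume: str, bucket: str, key: str,
                block_ids: list[str]) -> None:
        self.volumes[volume][bucket][key] = list(block_ids)

    def list_keys(self, volume: str, bucket: str, prefix: str = "") -> list[str]:
        """List keys under a prefix, which behaves like listing a directory."""
        return sorted(k for k in self.volumes[volume][bucket]
                      if k.startswith(prefix))


ns = Namespace()
ns.create_volume("vol1")
ns.create_bucket("vol1", "bucket1")
ns.put_key("vol1", "bucket1", "dir/a.txt", ["b1"])
ns.put_key("vol1", "bucket1", "dir/b.txt", ["b2", "b3"])
ns.put_key("vol1", "bucket1", "other/c.txt", ["b4"])

print(ns.list_keys("vol1", "bucket1", prefix="dir/"))  # ['dir/a.txt', 'dir/b.txt']
```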
Data Nodes
Function like the storage nodes in other distributed storage systems (for example, DataNodes in HDFS): they hold the actual data.
Responsible for reporting data status to SCM.
Future Topics
Next discussions will focus on the Storage Container Manager in more detail and the management of large binary objects that are replicated.