Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, Google
Abstract
GFS is a scalable, distributed file system designed for large, data-intensive applications.
Provides fault tolerance on inexpensive hardware with high aggregate performance.
Design departs from earlier file system assumptions, driven by observed application workloads and Google's technological environment.
Deployed widely at Google for data generation, processing, and research.
Design Motivations
Component Failures: Expected to be frequent; fault tolerance and recovery are critical.
Large Files: Multi-GB files are common; traditional assumptions about I/O operation and block sizes must be revisited.
Mutation Patterns: Files are mostly appended, rarely overwritten; focus on append performance.
Application Co-design: Designing applications and the file system API together benefits the whole system, e.g., the relaxed consistency model and the atomic record append operation.
Cluster Scale: The largest clusters provide hundreds of terabytes of storage across thousands of disks on over a thousand machines, accessed concurrently by hundreds of clients.
Design Overview
Assumptions
Built from inexpensive components; must handle frequent failures.
Stores a modest number of large files; multi-GB files are the common case and must be managed efficiently.
Two types of reads: large streaming reads and small random reads.
Workloads are dominated by large, sequential writes that append to files; small writes at arbitrary positions are supported but need not be efficient.
Well-defined semantics for multiple clients appending concurrently to the same file, with minimal synchronization overhead.
High sustained bandwidth prioritized over low latency.
Interface
Familiar file system interface (create, delete, open, close, read, write), though not a standard API such as POSIX, plus snapshot and record append operations.
Architecture
Single master, multiple chunkservers, multiple clients.
Files are divided into fixed-size 64 MB chunks, each replicated on multiple chunkservers (three by default).
The master holds all metadata; clients exchange file data directly with chunkservers, so no file data ever flows through the master (see the sketch below).
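A minimal sketch of this client read path, assuming the 64 MB chunk size from the paper; the class and method names (find_chunk, read_chunk) are hypothetical stand-ins, not the actual GFS client library interface, and the sketch assumes the read stays within one chunk.

    CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks, as in the paper

    class GFSClientSketch:
        def __init__(self, master):
            self.master = master
            self.cache = {}  # (filename, chunk index) -> (chunk handle, replica locations)

        def read(self, filename, offset, length):
            chunk_index = offset // CHUNK_SIZE            # translate byte offset to chunk index
            key = (filename, chunk_index)
            if key not in self.cache:
                # One metadata round trip to the master; file data never flows through it.
                self.cache[key] = self.master.find_chunk(filename, chunk_index)
            handle, replicas = self.cache[key]
            closest = replicas[0]                         # e.g. the nearest replica
            return closest.read_chunk(handle, offset % CHUNK_SIZE, length)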
Metadata Management
Three types: namespace, file-to-chunk mapping, and chunk locations.
All metadata is kept in the master's memory for fast operations; the namespace and file-to-chunk mappings are made persistent through an operation log and checkpoints, while chunk locations are not persisted but re-learned from chunkservers at startup and via heartbeats (see the sketch below).
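A rough sketch of these three metadata types and how they are (or are not) persisted; the class, field, and method names are illustrative, not the real master implementation.

    class MasterMetadataSketch:
        def __init__(self, operation_log):
            self.operation_log = operation_log
            self.namespace = {}        # full pathname -> file metadata (persisted via the operation log)
            self.file_chunks = {}      # full pathname -> ordered list of chunk handles (also logged)
            self.chunk_locations = {}  # chunk handle -> set of chunkservers (in memory only)

        def log_mutation(self, record):
            # Namespace and mapping changes are flushed to the operation log
            # (and replicated remotely) before the master replies to the client.
            self.operation_log.append(record)
            self.operation_log.flush()

        def on_heartbeat(self, chunkserver, reported_handles):
            # Chunk locations are never persisted: the master re-learns them at
            # startup and keeps them current through regular HeartBeat messages.
            for handle in reported_handles:
                self.chunk_locations.setdefault(handle, set()).add(chunkserver)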
Consistency Model
Relaxed consistency optimized for append operations.
File namespace mutations (e.g., file creation) are atomic and handled exclusively by the master.
After a successful serial write, the affected region is defined (consistent, and clients see exactly what was written); concurrent successful writes leave it consistent but undefined, failed mutations leave it inconsistent, and successful record appends leave defined regions possibly interspersed with inconsistent padding.
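The paper notes that applications live with this model by relying on appends, checkpoints, and self-validating, self-identifying records. A small illustrative sketch of the reader-side discipline follows; the record framing (id, length, checksum) is invented for the example, not a GFS format.

    import struct
    import zlib

    def parse_records(blob):
        """Skip padding, corrupt fragments, and duplicate records while reading."""
        seen_ids, offset = set(), 0
        while offset + 12 <= len(blob):
            record_id, length, checksum = struct.unpack_from(">III", blob, offset)
            offset += 12
            if length == 0:
                continue                               # zero padding between records
            payload = blob[offset:offset + length]
            offset += length
            if len(payload) < length or zlib.crc32(payload) != checksum:
                continue                               # truncated, padded, or failed region: skip
            if record_id in seen_ids:
                continue                               # duplicate left by a retried record append
            seen_ids.add(record_id)
            yield payload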
System Interactions
Leases and Mutation Order
The master grants a chunk lease to one replica (the primary); the primary assigns serial numbers to all mutations on the chunk, and every replica applies them in that order.
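A condensed sketch of how the lease holder serializes mutations, assuming the data has already been pushed to all replicas (see Data Flow below); the names and the simple ack protocol shown are simplified stand-ins.

    class PrimaryReplicaSketch:
        """The replica currently holding the chunk lease granted by the master."""

        def __init__(self, local_store, secondaries):
            self.local_store = local_store
            self.secondaries = secondaries
            self.next_serial = 0

        def apply_mutation(self, chunk_handle, mutation):
            # The primary picks one serial order for all (possibly concurrent)
            # mutations; every replica applies them in exactly this order.
            serial = self.next_serial
            self.next_serial += 1
            self.local_store.apply(chunk_handle, serial, mutation)
            acks = [s.apply(chunk_handle, serial, mutation) for s in self.secondaries]
            # Success is reported only if every replica applied the mutation;
            # otherwise the client retries and the region may be left inconsistent.
            return all(acks)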
Data Flow
Data flow is decoupled from control flow: data is pushed linearly along a chain of chunkservers, pipelined over TCP, to make full use of each machine's outbound bandwidth (see the calculation below).
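The paper quantifies this: pushing B bytes to R replicas over a pipelined chain ideally takes B/T + R*L, where T is the per-link throughput and L is the per-hop latency. A quick check of the paper's example (1 MB over 100 Mbps links with roughly 1 ms hops comes out near the quoted 80 ms):

    def ideal_push_time(bytes_pushed, link_throughput_bps, replicas, hop_latency_s):
        # B/T + R*L from the paper: pipelining means total time is dominated by
        # one link's transfer time plus a small per-hop startup latency.
        return bytes_pushed * 8 / link_throughput_bps + replicas * hop_latency_s

    print(ideal_push_time(1_000_000, 100e6, 3, 1e-3))  # ~0.083 s, the paper's "about 80 ms"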
Atomic Record Appends
Record append lets many clients append to the same file concurrently; GFS appends the data atomically at least once, at an offset of its own choosing, which it returns to the client.
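A sketch of the primary's decision logic for record append. The 64 MB chunk size and the cap of one quarter of the chunk size per record are from the paper; the chunk object and the return convention are illustrative.

    CHUNK_SIZE = 64 * 1024 * 1024
    MAX_RECORD = CHUNK_SIZE // 4   # keeps worst-case padding at a chunk boundary small

    def record_append(chunk, data):
        assert len(data) <= MAX_RECORD
        if chunk.used + len(data) > CHUNK_SIZE:
            chunk.pad_to_boundary()        # pad the rest of the chunk at all replicas
            return "RETRY_ON_NEXT_CHUNK"   # client retries; the append lands in a new chunk
        offset = chunk.used                # GFS, not the client, chooses the offset
        chunk.write_at(offset, data)       # applied at the primary and every secondary
        chunk.used += len(data)
        return offset                      # returned to the client on success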
Snapshots
Snapshots copy a file or directory tree almost instantly while minimizing interruption to ongoing mutations: the master revokes outstanding leases, duplicates the metadata, and chunks are copied (locally, on the chunkservers that already hold them) only when first written after the snapshot.
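A simplified sketch of the copy-on-write bookkeeping behind snapshots, assuming per-chunk reference counts kept by the master as described in the paper; the data structures and the make_local_copy callback are illustrative.

    class SnapshotSketch:
        def __init__(self):
            self.file_chunks = {}    # pathname -> list of chunk handles
            self.ref_count = {}      # chunk handle -> reference count

        def snapshot(self, src_path, dst_path):
            # Leases on the affected chunks are revoked first, then only the
            # metadata is duplicated; no chunk data is copied at snapshot time.
            handles = self.file_chunks[src_path]
            self.file_chunks[dst_path] = list(handles)
            for h in handles:
                self.ref_count[h] = self.ref_count.get(h, 1) + 1

        def before_write(self, handle, make_local_copy):
            # First write after a snapshot: each chunkserver holding the chunk
            # copies it locally, and the writer proceeds against the new handle.
            if self.ref_count.get(handle, 1) > 1:
                new_handle = make_local_copy(handle)
                self.ref_count[handle] -= 1
                self.ref_count[new_handle] = 1
                return new_handle
            return handle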
Fault Tolerance and Diagnosis
High Availability
Fast recovery, chunk replication across chunkservers and racks, and replication of the master's state (with shadow masters providing read-only access).
Data Integrity
Each chunkserver keeps a 32-bit checksum for every 64 KB block of each chunk and verifies it before returning or propagating data, so corruption is detected independently of other replicas.
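A small sketch of the per-block verification described above, with chunks divided into 64 KB blocks as in the paper; zlib.crc32 stands in for whatever 32-bit checksum GFS actually uses.

    import zlib

    BLOCK_SIZE = 64 * 1024  # each chunk is checksummed in 64 KB blocks

    def verified_read(chunk_data, block_checksums, offset, length):
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            data = chunk_data[block * BLOCK_SIZE:(block + 1) * BLOCK_SIZE]
            if zlib.crc32(data) != block_checksums[block]:
                # The chunkserver returns an error, reports to the master, and the
                # client reads another replica while this one is re-replicated.
                raise IOError(f"checksum mismatch in block {block}")
        return chunk_data[offset:offset + length]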
Diagnostic Tools
Extensive logging for monitoring and problem diagnosis.
Measurements
Micro-benchmarks and measurements of production clusters show GFS sustaining high aggregate read and write rates under real workloads.
The single master is not a bottleneck in practice, and the system handles large-scale, concurrent data operations efficiently.
Experiences and Challenges
Initial focus on production systems; evolved to support research and development.
Encountered silent data corruption from disk/IDE protocol mismatches (which motivated checksumming) and Linux kernel problems such as the cost of fsync() and locking issues.
Most such problems were absorbed by GFS's fault tolerance and checksumming, or fixed in the kernel and contributed back.
Related Work
Compared with AFS, xFS, Frangipani, and similar systems, GFS spreads a file's data across chunkservers, uses large (64 MB) chunks, offers a relaxed consistency model, provides no caching below its interface, and keeps the design simple with a single, replicated master.
Conclusion
GFS effectively supports large-scale data processing on commodity hardware.
Focus on component failures, file size, and append operations led to a unique and efficient system design.
Continues to play a crucial role in Google's data processing infrastructure.