Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, Google
Abstract
GFS is a scalable, distributed file system designed for large, data-intensive applications.
Provides fault tolerance on inexpensive hardware with high aggregate performance.
Design departs from earlier file system assumptions, driven by observed application workloads and Google's technological environment.
Deployed widely at Google for data generation, processing, and research.
Design Motivations
Component Failures: Expected to be frequent; fault tolerance and recovery are critical.
Large Files: Multi-GB files are common; traditional assumptions about I/O operation and block sizes must be revisited.
Mutation Patterns: Files are mostly appended, rarely overwritten; focus on append performance.
Application Co-design: Designing applications and the file system API together benefits the whole system, e.g., the relaxed consistency model and the atomic record append operation.
Cluster Scale: The largest clusters provide hundreds of terabytes of storage across thousands of disks on over a thousand machines, accessed concurrently by hundreds of clients.
Design Overview
Assumptions
Built from inexpensive components; must handle frequent failures.
Stores a modest number of large files; multi-GB files are the common case and must be managed efficiently.
Two types of reads: large streaming reads and small random reads.
Workloads are dominated by large, sequential writes that append to files; small writes at arbitrary positions are supported but need not be efficient.
Well-defined semantics for multiple clients appending concurrently to the same file, with minimal synchronization overhead.
High sustained bandwidth prioritized over low latency.
Interface
Familiar file system interface (create, delete, open, close, read, write), though not a standard API such as POSIX, plus snapshot and record append operations.
Architecture
Single master, multiple chunkservers, multiple clients.
Files are divided into fixed-size 64 MB chunks, each replicated on multiple chunkservers (three by default).
The master holds all metadata; clients exchange file data directly with chunkservers, so no file data ever flows through the master (see the sketch below).
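A minimal sketch of this client read path, assuming the 64 MB chunk size from the paper; the class and method names (find_chunk, read_chunk) are hypothetical stand-ins, not the actual GFS client library interface, and the sketch assumes the read stays within one chunk.

    CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks, as in the paper

    class GFSClientSketch:
        def __init__(self, master):
            self.master = master
            self.cache = {}  # (filename, chunk index) -> (chunk handle, replica locations)

        def read(self, filename, offset, length):
            chunk_index = offset // CHUNK_SIZE            # translate byte offset to chunk index
            key = (filename, chunk_index)
            if key not in self.cache:
                # One metadata round trip to the master; file data never flows through it.
                self.cache[key] = self.master.find_chunk(filename, chunk_index)
            handle, replicas = self.cache[key]
            closest = replicas[0]                         # e.g. the nearest replica
            return closest.read_chunk(handle, offset % CHUNK_SIZE, length)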
Metadata Management
Three types: namespace, file-to-chunk mapping, and chunk locations.
All metadata is kept in the master's memory for fast operations; the namespace and file-to-chunk mappings are made persistent through an operation log and checkpoints, while chunk locations are not persisted but re-learned from chunkservers at startup and via heartbeats (see the sketch below).
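A rough sketch of these three metadata types and how they are (or are not) persisted; the class, field, and method names are illustrative, not the real master implementation.

    class MasterMetadataSketch:
        def __init__(self, operation_log):
            self.operation_log = operation_log
            self.namespace = {}        # full pathname -> file metadata (persisted via the operation log)
            self.file_chunks = {}      # full pathname -> ordered list of chunk handles (also logged)
            self.chunk_locations = {}  # chunk handle -> set of chunkservers (in memory only)

        def log_mutation(self, record):
            # Namespace and mapping changes are flushed to the operation log
            # (and replicated remotely) before the master replies to the client.
            self.operation_log.append(record)
            self.operation_log.flush()

        def on_heartbeat(self, chunkserver, reported_handles):
            # Chunk locations are never persisted: the master re-learns them at
            # startup and keeps them current through regular HeartBeat messages.
            for handle in reported_handles:
                self.chunk_locations.setdefault(handle, set()).add(chunkserver)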
Consistency Model
Relaxed consistency optimized for append operations.
File namespace mutations (e.g., file creation) are atomic and handled exclusively by the master.
After a successful serial write, the affected region is defined (consistent, and clients see exactly what was written); concurrent successful writes leave it consistent but undefined, failed mutations leave it inconsistent, and successful record appends leave defined regions possibly interspersed with inconsistent padding.
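The paper notes that applications live with this model by relying on appends, checkpoints, and self-validating, self-identifying records. A small illustrative sketch of the reader-side discipline follows; the record framing (id, length, checksum) is invented for the example, not a GFS format.

    import struct
    import zlib

    def parse_records(blob):
        """Skip padding, corrupt fragments, and duplicate records while reading."""
        seen_ids, offset = set(), 0
        while offset + 12 <= len(blob):
            record_id, length, checksum = struct.unpack_from(">III", blob, offset)
            offset += 12
            if length == 0:
                continue                               # zero padding between records
            payload = blob[offset:offset + length]
            offset += length
            if len(payload) < length or zlib.crc32(payload) != checksum:
                continue                               # truncated, padded, or failed region: skip
            if record_id in seen_ids:
                continue                               # duplicate left by a retried record append
            seen_ids.add(record_id)
            yield payload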
System Interactions
Leases and Mutation Order
The master grants a chunk lease to one replica (the primary); the primary assigns serial numbers to all mutations on the chunk, and every replica applies them in that order.
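A condensed sketch of how the lease holder serializes mutations, assuming the data has already been pushed to all replicas (see Data Flow below); the names and the simple ack protocol shown are simplified stand-ins.

    class PrimaryReplicaSketch:
        """The replica currently holding the chunk lease granted by the master."""

        def __init__(self, local_store, secondaries):
            self.local_store = local_store
            self.secondaries = secondaries
            self.next_serial = 0

        def apply_mutation(self, chunk_handle, mutation):
            # The primary picks one serial order for all (possibly concurrent)
            # mutations; every replica applies them in exactly this order.
            serial = self.next_serial
            self.next_serial += 1
            self.local_store.apply(chunk_handle, serial, mutation)
            acks = [s.apply(chunk_handle, serial, mutation) for s in self.secondaries]
            # Success is reported only if every replica applied the mutation;
            # otherwise the client retries and the region may be left inconsistent.
            return all(acks)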
Data Flow
Data flow is decoupled from control flow: data is pushed linearly along a chain of chunkservers, pipelined over TCP, to make full use of each machine's outbound bandwidth (see the calculation below).
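The paper quantifies this: pushing B bytes to R replicas over a pipelined chain ideally takes B/T + R*L, where T is the per-link throughput and L is the per-hop latency. A quick check of the paper's example (1 MB over 100 Mbps links with roughly 1 ms hops comes out near the quoted 80 ms):

    def ideal_push_time(bytes_pushed, link_throughput_bps, replicas, hop_latency_s):
        # B/T + R*L from the paper: pipelining means total time is dominated by
        # one link's transfer time plus a small per-hop startup latency.
        return bytes_pushed * 8 / link_throughput_bps + replicas * hop_latency_s

    print(ideal_push_time(1_000_000, 100e6, 3, 1e-3))  # ~0.083 s, the paper's "about 80 ms"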
Atomic Record Appends
Record append lets many clients append to the same file concurrently; GFS appends the data atomically at least once, at an offset of its own choosing, which it returns to the client.
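A sketch of the primary's decision logic for record append. The 64 MB chunk size and the cap of one quarter of the chunk size per record are from the paper; the chunk object and the return convention are illustrative.

    CHUNK_SIZE = 64 * 1024 * 1024
    MAX_RECORD = CHUNK_SIZE // 4   # keeps worst-case padding at a chunk boundary small

    def record_append(chunk, data):
        assert len(data) <= MAX_RECORD
        if chunk.used + len(data) > CHUNK_SIZE:
            chunk.pad_to_boundary()        # pad the rest of the chunk at all replicas
            return "RETRY_ON_NEXT_CHUNK"   # client retries; the append lands in a new chunk
        offset = chunk.used                # GFS, not the client, chooses the offset
        chunk.write_at(offset, data)       # applied at the primary and every secondary
        chunk.used += len(data)
        return offset                      # returned to the client on success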
Snapshots
Snapshots copy a file or directory tree almost instantly while minimizing interruption to ongoing mutations: the master revokes outstanding leases, duplicates the metadata, and chunks are copied (locally, on the chunkservers that already hold them) only when first written after the snapshot.
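A simplified sketch of the copy-on-write bookkeeping behind snapshots, assuming per-chunk reference counts kept by the master as described in the paper; the data structures and the make_local_copy callback are illustrative.

    class SnapshotSketch:
        def __init__(self):
            self.file_chunks = {}    # pathname -> list of chunk handles
            self.ref_count = {}      # chunk handle -> reference count

        def snapshot(self, src_path, dst_path):
            # Leases on the affected chunks are revoked first, then only the
            # metadata is duplicated; no chunk data is copied at snapshot time.
            handles = self.file_chunks[src_path]
            self.file_chunks[dst_path] = list(handles)
            for h in handles:
                self.ref_count[h] = self.ref_count.get(h, 1) + 1

        def before_write(self, handle, make_local_copy):
            # First write after a snapshot: each chunkserver holding the chunk
            # copies it locally, and the writer proceeds against the new handle.
            if self.ref_count.get(handle, 1) > 1:
                new_handle = make_local_copy(handle)
                self.ref_count[handle] -= 1
                self.ref_count[new_handle] = 1
                return new_handle
            return handle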
Fault Tolerance and Diagnosis
High Availability
Fast recovery, chunk replication across chunkservers and racks, and replication of the master's state (with shadow masters providing read-only access).
Data Integrity
Each chunkserver keeps a 32-bit checksum for every 64 KB block of each chunk and verifies it before returning or propagating data, so corruption is detected independently of other replicas.
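A small sketch of the per-block verification described above, with chunks divided into 64 KB blocks as in the paper; zlib.crc32 stands in for whatever 32-bit checksum GFS actually uses.

    import zlib

    BLOCK_SIZE = 64 * 1024  # each chunk is checksummed in 64 KB blocks

    def verified_read(chunk_data, block_checksums, offset, length):
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            data = chunk_data[block * BLOCK_SIZE:(block + 1) * BLOCK_SIZE]
            if zlib.crc32(data) != block_checksums[block]:
                # The chunkserver returns an error, reports to the master, and the
                # client reads another replica while this one is re-replicated.
                raise IOError(f"checksum mismatch in block {block}")
        return chunk_data[offset:offset + length]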
Diagnostic Tools
Extensive logging for monitoring and problem diagnosis.
Measurements
Micro-benchmarks and measurements of production clusters show GFS sustaining high aggregate read and write rates under real workloads.
The single master is not a bottleneck in practice, and the system handles large-scale, concurrent data operations efficiently.
Experiences and Challenges
Initial focus on production systems; evolved to support research and development.
Encountered silent data corruption from disk/IDE protocol mismatches (which motivated checksumming) and Linux kernel problems such as the cost of fsync() and locking issues.
Most such problems were absorbed by GFS's fault tolerance and checksumming, or fixed in the kernel and contributed back.
Related Work
Compared with AFS, xFS, Frangipani, and similar systems, GFS spreads a file's data across chunkservers, uses large (64 MB) chunks, offers a relaxed consistency model, provides no caching below its interface, and keeps the design simple with a single, replicated master.
Conclusion
GFS effectively supports large-scale data processing on commodity hardware.
Focus on component failures, file size, and append operations led to a unique and efficient system design.
Continues to play a crucial role in Google's data processing infrastructure.