Lecture Notes on Sharding and Partitioning in Databases
Jul 28, 2024
Notes on Sharding and Partitioning for Database Scaling
Key Concepts
Sharding: Distributing data across multiple machines (at the database level).
Partitioning: Splitting data into smaller pieces (at the data level).
Both techniques are used to improve throughput and availability in database systems.
Course Introduction
Cohort-based course on system design, running since March 2021.
Encourages collaborative learning through problem statements and brainstorming sessions.
Aims to cover trade-offs in design decisions.
Course highlights:
Week 1: Core foundations and design of online/offline indicators.
Week 2: Database logging and scalability examples.
Week 3: Distributed systems, such as designing load balancers.
Week 4: Designing social networks.
Week 5: Building storage engines.
Week 6: High throughput systems.
Week 7: Information retrieval systems and designing message brokers and task schedulers.
Week 8: Advanced algorithms for system efficacy.
Understanding Scaling in Databases
Initial Setup
Begin with a small server and MySQL database.
Serve a limited number of writes per second (e.g., 100 writes/sec).
Vertical Scaling
Increase server resource capacity (CPU, RAM, Disk) as traffic grows (e.g., to 200 writes/sec).
Introduce read replicas to absorb increased read traffic without affecting writes. Replicas offload reads from the primary but do not raise its write capacity.
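The read-replica idea above can be sketched as a small router that sends writes to the primary and spreads reads across replicas. This is a minimal illustration, not how any particular MySQL proxy works: the class name `ReplicatedRouter`, the server labels, and the "anything that is not a SELECT is a write" rule are all assumptions made for the example.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the primary and spread reads across replicas (hypothetical helper)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through replicas round-robin; fall back to the primary if there are none.
        self._read_targets = itertools.cycle(replicas or [primary])

    def route(self, statement):
        # Naive classification (an assumption): anything that is not a SELECT is a write.
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


router = ReplicatedRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM users"))           # goes to a replica
print(router.route("INSERT INTO users VALUES (1)"))  # goes to the primary
```

Every write still lands on the single primary, which is why this setup helps with reads only.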
Limitations of Vertical Scaling
A single machine can only be upgraded so far, so hardware eventually caps throughput (e.g., the server is unable to exceed 1000 writes/sec).
Horizontal Scaling
Distributing load across multiple servers (sharding) to handle higher throughput (e.g., 1500 writes/sec).
Each shard can handle a portion of the data, allowing for parallel processing of requests.
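As a rough sketch of the numbers above: if one server tops out at 1000 writes/sec and the target is 1500 writes/sec, at least two shards are needed, provided the load spreads evenly. The snippet also shows one common way to pick a shard for a key (hash, then modulo). The function names and the choice of hash-modulo routing are assumptions for illustration; the notes do not say which routing scheme is used.

```python
import hashlib
import math


def shards_needed(target_writes_per_sec, max_writes_per_server):
    """Minimum shard count, assuming writes spread evenly across shards."""
    return math.ceil(target_writes_per_sec / max_writes_per_server)


def shard_for(key, num_shards):
    """Map a key to a shard index deterministically using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


num_shards = shards_needed(1500, 1000)  # 2
for user_id in ("user-1", "user-2", "user-3"):
    print(user_id, "-> shard", shard_for(user_id, num_shards))
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings. Note that plain modulo routing remaps most keys when the shard count changes.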
Sharding vs. Partitioning
Sharding: Refers to distributing the database across multiple servers.
Partitioning: Refers to splitting data into segments.
Example: Managing Large Data Sets
Split a 100 GB data set into smaller partitions (e.g., pieces of 30 GB, 20 GB, and so on) so that each piece is easier to handle.
Partitions can reside on different shards or the same shard.
Logical partitions enhance data management and allow for load balancing.
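The example above can be sketched as a placement problem: given partition sizes, decide which shard holds each partition so that the shards end up roughly balanced. The partition names, the full list of sizes (only 30 GB and 20 GB appear in the notes), the shard names, and the greedy "largest first, least-loaded shard" rule are all assumptions chosen for illustration.

```python
def place_partitions(partition_sizes_gb, shard_names):
    """Greedy placement: put each partition, largest first, on the least-loaded shard."""
    load = {name: 0 for name in shard_names}
    placement = {}
    by_size = sorted(partition_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for partition, size in by_size:
        target = min(load, key=load.get)  # shard with the least data so far
        placement[partition] = target
        load[target] += size
    return placement, load


# Hypothetical split of 100 GB into five partitions.
sizes = {"p1": 30, "p2": 20, "p3": 20, "p4": 20, "p5": 10}

placement, load = place_partitions(sizes, ["shard-a", "shard-b"])
print(placement)
print(load)  # {'shard-a': 50, 'shard-b': 50}

# With a single shard, all partitions live on the same server.
print(place_partitions(sizes, ["only-shard"])[1])  # {'only-shard': 100}
```

Because partitions are logical units, moving one from a busy shard to a quieter one is a change to the placement map rather than to the partitioning itself.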
Strategies for Partitioning
Horizontal Partitioning: Split data by rows/documents.
Vertical Partitioning: Split data by columns/tables.
The partitioning strategy must be deterministic, so that the same key always maps to the same partition, and it should be chosen based on access patterns and load.
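To make the two strategies concrete, here is a small sketch on an in-memory table represented as a list of dictionaries. The column names, the sample rows, and the `id % num_partitions` rule are assumptions for illustration; a real system would choose the rule from its access patterns.

```python
def horizontal_partition(rows, num_partitions, key="id"):
    """Split by rows: each partition has every column but only some rows."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Deterministic rule: the same key always lands in the same partition.
        partitions[row[key] % num_partitions].append(row)
    return partitions


def vertical_partition(rows, column_groups, key="id"):
    """Split by columns: each partition has every row but only some columns (plus the key)."""
    return [
        [{col: row[col] for col in (key, *columns)} for row in rows]
        for columns in column_groups
    ]


rows = [
    {"id": 1, "name": "Asha", "bio": "Writes about databases"},
    {"id": 2, "name": "Ben", "bio": "Builds storage engines"},
    {"id": 3, "name": "Chen", "bio": "Runs load tests"},
    {"id": 4, "name": "Dana", "bio": "Designs schedulers"},
]

print(horizontal_partition(rows, 2))
print(vertical_partition(rows, [("name",), ("bio",)]))
```

The key column is repeated in every vertical partition so the pieces of a row can be joined back together.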
Visualization of Sharding and Partitioning
No Sharding/No Partitioning: Basic local database setup.
Partitioned with No Sharding: Logical partitioning on a single database server.
Sharded with No Partitioning: Multiple servers holding identical, unpartitioned data (e.g., read replicas).
Sharded and Partitioned: Both strategies applied for efficient data handling across multiple servers.
Advantages of Sharding and Partitioning
Improved Throughput: More servers means handling more reads/writes.
Increased Storage Capacity: Storing large amounts of data across multiple servers.
High Availability: Backup servers can take over when one goes down.
Disadvantages of Sharding and Partitioning
Operational Complexity: Management of multiple servers can be complicated.
Cross-Shard Queries: Joining tables across shards can be slow and resource-intensive, affecting throughput and increasing latency.
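The cost of cross-shard queries can be seen in a small sketch. A lookup by the shard key touches one shard, while a query on any other attribute has to ask every shard and merge the partial results (often called scatter-gather). The in-memory "shards", the `id % number_of_shards` routing, and the function names are assumptions for illustration only.

```python
def lookup_by_id(shards, row_id):
    """Point lookup on the shard key: exactly one shard is contacted."""
    shard_rows = shards[row_id % len(shards)]
    return next((row for row in shard_rows if row["id"] == row_id), None)


def scatter_gather(shards, predicate):
    """Query on a non-key attribute: every shard is contacted and results are merged."""
    results = []
    shards_contacted = 0
    for shard_rows in shards.values():
        shards_contacted += 1
        results.extend(row for row in shard_rows if predicate(row))
    return sorted(results, key=lambda row: row["id"]), shards_contacted


# Two hypothetical shards, with rows placed by id % 2.
shards = {
    0: [{"id": 2, "city": "Pune"}, {"id": 4, "city": "Delhi"}],
    1: [{"id": 1, "city": "Pune"}, {"id": 3, "city": "Goa"}],
}

print(lookup_by_id(shards, 3))
matches, contacted = scatter_gather(shards, lambda row: row["city"] == "Pune")
print(matches, "shards contacted:", contacted)
```

In a real deployment each shard contact is a network round trip, so the fan-out grows with the number of shards and the slowest shard sets the latency.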
Conclusion
Understanding these concepts is crucial for practical applications in real-world systems.
The right approach combines both sharding and partitioning to achieve scalable and efficient database architectures.
Call to Action
If you enjoyed the content, give feedback and subscribe for more in-depth engineering videos!