Overview
This lecture explains how to design a distributed priority queue for large-scale systems, highlighting practical data structures, partitioning, replication, and scalability concerns.
Introduction to Distributed Priority Queues
- Distributed priority queues are needed for scalable, fault-tolerant, and prioritized asynchronous job processing at companies like Facebook or Google.
- Each job is assigned a priority so important tasks can be processed faster.
Problem Requirements & Functionalities
- The system should maintain an ordered list with assigned priorities.
- Core operations: enqueue, dequeue, and modify (change a job's priority), with at-least-once processing guarantees.
- System must be scalable and fault-tolerant, handling billions of messages.
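As a single-node baseline for these operations, the priority semantics can be sketched with Python's heapq (the job names and the lower-number-is-more-urgent convention are illustrative, not from the lecture):

```python
import heapq

# Convention for this sketch: lower number = more urgent job.
pq = []
heapq.heappush(pq, (2, "send-newsletter"))
heapq.heappush(pq, (0, "charge-payment"))
heapq.heappush(pq, (1, "resize-image"))

# Dequeue always yields the most urgent job first.
order = [heapq.heappop(pq)[1] for _ in range(3)]
assert order == ["charge-payment", "resize-image", "send-newsletter"]
```

The rest of the lecture is about what breaks when this heap must survive crashes, span machines, and feed many consumers.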
Ordered Consumption & Approximations
- Perfect ordering of message consumption is only possible with one consumer; multiple consumers lead to approximate ordering.
- Race conditions can cause later messages to be processed before earlier ones.
Data Structures & Storage Approaches
- In-memory heaps offer efficient access to the top-priority element, but their random-access patterns make them hard to persist and scale on disk.
- Disk-based B-trees or LSM trees maintain sorted order durably, but reads and writes are slower than in-memory access because of disk I/O.
- Hybrid solution: append raw message data to disk (O(1) sequential writes) while maintaining a sorted index or heap of IDs and priorities in memory.
- Modifying a job's priority updates the in-memory structure; pairing it with a hashmap keyed by job ID gives constant-time lookup of any entry.
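The heap-plus-hashmap idea can be sketched as follows; the class name, lazy-deletion approach, and tie-breaking sequence number are illustrative choices, not details from the lecture:

```python
import heapq
import itertools

class IndexedPriorityQueue:
    """In-memory index over durable append-only storage: a min-heap of
    (priority, seq, entry) plus a hashmap for O(1) lookup by job ID.
    Priority changes use lazy deletion: the stale heap entry is marked
    invalid and skipped when popped."""

    def __init__(self):
        self._heap = []                 # (priority, seq, entry)
        self._index = {}                # job_id -> entry = [priority, job_id, valid]
        self._seq = itertools.count()   # tie-breaker for equal priorities

    def enqueue(self, job_id, priority):
        entry = [priority, job_id, True]
        self._index[job_id] = entry
        heapq.heappush(self._heap, (priority, next(self._seq), entry))

    def modify(self, job_id, new_priority):
        # Invalidate the old heap entry, then push a fresh one.
        self._index[job_id][2] = False
        self.enqueue(job_id, new_priority)

    def dequeue(self):
        while self._heap:
            _, _, entry = heapq.heappop(self._heap)
            if entry[2]:                # skip lazily-deleted entries
                del self._index[entry[1]]
                return entry[1]
        raise IndexError("queue is empty")
```

For example, `modify` can promote a job past earlier arrivals: enqueueing "a" at priority 5 and "b" at 1, then modifying "a" to 0, makes "a" dequeue first.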
Replication Strategies
- Single-leader replication: fast read/write but risk of data loss on leader crash before replication.
- Multi-leader replication improves write availability but introduces write conflicts that must be resolved, making it a poor fit for this workload.
- Leaderless replication with quorum reads and writes tolerates node failures without a single point of failure.
- Use consensus protocols (e.g., Paxos, Raft) for synchronous message replication and reliability.
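The quorum intersection property behind leaderless replication can be illustrated with a small sketch (the N=3/W=2/R=2 configuration and the version-tagged values are illustrative assumptions):

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """A write quorum W and read quorum R always intersect when W + R > N,
    so any read is guaranteed to touch at least one up-to-date replica."""
    return w + r > n

# Typical configuration: N=3 replicas, W=2, R=2 tolerates one failed node.
assert quorum_overlaps(3, 2, 2)      # every read overlaps the latest write
assert not quorum_overlaps(3, 1, 1)  # W=1, R=1 permits stale reads

def read_latest(replicas, read_set):
    """Return the highest-versioned (version, value) among replicas read."""
    return max(replicas[i] for i in read_set)

# A write with W=2 reached replicas 0 and 2; replica 1 still holds version 1.
replicas = [(2, "charge-payment"), (1, "stale"), (2, "charge-payment")]

# Every possible R=2 read set intersects the write set {0, 2},
# so the reader always observes version 2.
for read_set in [(0, 1), (1, 2), (0, 2)]:
    assert read_latest(replicas, read_set) == (2, "charge-payment")
```

Consensus protocols such as Paxos or Raft go further, ensuring replicas agree on a single ordered log rather than just overlapping reads and writes.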
Partitioning & Scalability
- Partitioning distributes read/write load across nodes to support scalability.
- Round-robin assignment of messages to partitions is preferred over partitioning by priority, which would concentrate load on a few hot partitions.
- Consumers may read from multiple partitions and aggregate top priorities across them via local in-memory heaps.
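Aggregating across partitions can be sketched as a k-way merge, assuming each partition already serves its messages in priority order (the partition contents and lower-is-more-urgent convention here are illustrative):

```python
import heapq

# Each partition serves its own messages sorted by priority
# (lower number = more urgent). Round-robin enqueueing spread the jobs,
# so no single partition holds the globally most urgent work.
partition_0 = [(1, "job-a"), (4, "job-d")]
partition_1 = [(2, "job-b"), (3, "job-c")]
partition_2 = [(5, "job-e")]

# A consumer merges the partition heads with a local heap to recover an
# approximate global priority order.
merged = list(heapq.merge(partition_0, partition_1, partition_2))
assert [job for _, job in merged] == ["job-a", "job-b", "job-c", "job-d", "job-e"]
```

The ordering is only approximate in practice because partitions are read concurrently and new messages arrive while the merge is in flight.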
Consumer Design
- Consumers may fetch and process jobs from multiple partitions using long polling, so the broker responds as soon as work arrives instead of consumers repeatedly polling empty partitions.
- Locking within partitions prevents duplicate work when multiple consumers share the same partition.
- Events are marked as processed/deleted after acknowledgment by consumers.
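A consumer loop combining these three ideas can be sketched with stdlib primitives; the threading lock and blocking `get` stand in for the broker's distributed lock and long-poll API, which are assumptions of this sketch:

```python
import queue
import threading

def consume(partition: "queue.Queue", lock: threading.Lock, processed: list) -> None:
    """Consumer sketch: long-poll one partition, hold a lock while handling
    a job so co-consumers of the same partition cannot duplicate work, and
    ack only after the work succeeds (at-least-once semantics)."""
    while True:
        try:
            # Long poll: block briefly waiting for work instead of busy-polling.
            job = partition.get(timeout=0.5)
        except queue.Empty:
            return  # idle timeout; a real consumer would re-issue the poll
        with lock:                    # exclusive claim on this job
            processed.append(job)     # the actual processing step
            partition.task_done()     # ack: broker marks job processed/deleted

# Drive the sketch with one partition and one consumer.
partition = queue.Queue()
for job_id in ["job-1", "job-2", "job-3"]:
    partition.put(job_id)

done: list = []
consume(partition, threading.Lock(), done)
assert done == ["job-1", "job-2", "job-3"]
```

Note that acking after processing (rather than before) is what makes delivery at-least-once: a crash between processing and ack causes a redelivery, never a lost job.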
Key Terms & Definitions
- Priority Queue — A data structure where each element has a priority; higher priority elements are served before lower ones.
- Heap — A tree-based structure for efficiently retrieving the highest (or lowest) priority item.
- B-Tree/LSM Tree — Disk-based sorted index structures supporting efficient range queries and insertions.
- Partitioning — Dividing data among nodes or shards to spread load and increase throughput.
- Replication — Copying data across nodes to ensure reliability and fault tolerance.
- Quorum Consistency — Requiring overlapping majorities of nodes to acknowledge writes and reads (W + R > N) so that every read observes the latest acknowledged write.
- Consensus Algorithm (e.g., Paxos, Raft) — Protocols used to achieve agreement on system state among distributed nodes.
Action Items / Next Steps
- Review consensus protocols (Paxos, Raft) for replication reliability.
- Study in-memory and disk-based index structures for priority queues.
- Understand partitioning strategies and their impact on load fairness and system scalability.