Distributed Priority Queue Design

Jun 9, 2025

Overview

This lecture explains how to design a distributed priority queue for large-scale systems, highlighting practical data structures, partitioning, replication, and scalability concerns.

Introduction to Distributed Priority Queues

  • Distributed priority queues are needed for scalable, fault-tolerant, and prioritized asynchronous job processing at companies like Facebook or Google.
  • Each job is assigned a priority so important tasks can be processed faster.

Problem Requirements & Functionalities

  • The system should maintain messages ordered by their assigned priorities.
  • Core operations: enqueue, modify (change a message's priority), and dequeue; delivery must be at-least-once.
  • System must be scalable and fault-tolerant, handling billions of messages.
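The four operations above can be sketched as a minimal in-memory API (names and the lazy-update scheme are illustrative assumptions, not the lecture's exact design). Lower numbers mean higher priority; dequeue() hands a message out but keeps it until ack() confirms processing, which is what makes delivery at-least-once.

```python
import heapq
import itertools

class PriorityQueueAPI:
    """Minimal sketch of enqueue / modify / dequeue / ack (hypothetical names)."""

    def __init__(self):
        self._heap = []                 # (priority, seq, msg_id)
        self._entries = {}              # msg_id -> [priority, payload, state]
        self._seq = itertools.count()   # tie-breaker for equal priorities
        self._inflight = {}             # delivered but not yet acknowledged

    def enqueue(self, msg_id, payload, priority):
        self._entries[msg_id] = [priority, payload, "queued"]
        heapq.heappush(self._heap, (priority, next(self._seq), msg_id))

    def modify(self, msg_id, new_priority):
        # Lazy update: record the new priority and push a fresh heap entry;
        # the old entry is skipped when popped.
        self._entries[msg_id][0] = new_priority
        heapq.heappush(self._heap, (new_priority, next(self._seq), msg_id))

    def dequeue(self):
        while self._heap:
            priority, _, msg_id = heapq.heappop(self._heap)
            entry = self._entries.get(msg_id)
            if entry and entry[2] == "queued" and entry[0] == priority:
                entry[2] = "inflight"
                self._inflight[msg_id] = entry[1]
                return msg_id, entry[1]
        return None  # queue empty

    def ack(self, msg_id):
        # Only an acknowledged message is deleted; unacked messages
        # remain eligible for redelivery (at-least-once).
        self._inflight.pop(msg_id, None)
        self._entries.pop(msg_id, None)
```

A real system would persist this state and redeliver expired in-flight messages; the sketch only shows the operation contract.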

Ordered Consumption & Approximations

  • Strict global ordering of message consumption is only achievable with a single consumer; with multiple consumers, ordering becomes approximate.
  • Race conditions between consumers can cause a later or lower-priority message to finish processing before an earlier one.
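A deterministic toy illustration of the point above: even when consumers dequeue in exact priority order, completion order depends on how long each consumer takes, so the lower-priority job can finish first. The names and durations are invented for illustration.

```python
# (name, priority): lower number = higher priority
events = [("routine", 1), ("urgent", 0)]
dequeued = sorted(events, key=lambda e: e[1])     # consumers pull in priority order

# Consumer A claims "urgent" but is slow; consumer B claims "routine" and is fast.
processing_ms = {"urgent": 30, "routine": 5}      # hypothetical durations
completion = sorted(dequeued, key=lambda e: processing_ms[e[0]])

print([name for name, _ in completion])           # "routine" completes first
```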

Data Structures & Storage Approaches

  • In-memory heaps offer efficient access to the top-priority item, but they are hard to persist at scale because heap operations require random access, which performs poorly on disk.
  • Disk-based B-trees or LSM trees maintain sorted order but have slower reads and writes due to disk latency.
  • Hybrid solution: append raw message payloads to a log on disk (O(1) sequential writes) while keeping a sorted index or heap of IDs and priorities in memory.
  • Priority modifications are applied to the in-memory index; pairing a sorted structure with a hashmap gives constant-time lookup by message ID.
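The hybrid layout can be sketched as follows. A bytearray stands in for the append-only log on disk; the class and method names are assumptions for illustration. Payloads only ever append (sequential O(1) writes), while a heap of (priority, id) pairs and an id-to-offset hashmap live in memory, so reprioritizing never touches the "disk".

```python
import heapq

class HybridStore:
    """Sketch: append-only payload log + in-memory heap and hashmap index."""

    def __init__(self):
        self.log = bytearray()   # stand-in for an append-only file on disk
        self.index = {}          # msg_id -> (offset, length, current priority)
        self.heap = []           # (priority, msg_id); stale entries skipped lazily

    def append(self, msg_id, payload: bytes, priority: int):
        offset = len(self.log)
        self.log += payload      # sequential write only, O(1) amortized
        self.index[msg_id] = (offset, len(payload), priority)
        heapq.heappush(self.heap, (priority, msg_id))

    def reprioritize(self, msg_id, new_priority: int):
        # In-memory update only; the on-disk payload is untouched.
        offset, length, _ = self.index[msg_id]
        self.index[msg_id] = (offset, length, new_priority)
        heapq.heappush(self.heap, (new_priority, msg_id))  # old entry goes stale

    def pop_top(self):
        # Skip heap entries whose priority no longer matches the index.
        while self.heap:
            priority, msg_id = heapq.heappop(self.heap)
            meta = self.index.get(msg_id)
            if meta and meta[2] == priority:
                offset, length, _ = self.index.pop(msg_id)
                return msg_id, bytes(self.log[offset:offset + length])
        return None
```

The lazy-deletion trick (push a new heap entry, skip stale ones on pop) is one common way to support priority modification without an O(n) heap search.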

Replication Strategies

  • Single-leader replication: fast reads and writes, but messages can be lost if the leader crashes before replicating them.
  • Multi-leader replication: introduces write conflicts that must be resolved, and offers little benefit for this asynchronous-task workload.
  • Leaderless replication with quorum consistency provides strong fault tolerance.
  • Use consensus protocols (e.g., Paxos, Raft) for synchronous message replication and reliability.
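The quorum idea behind leaderless replication reduces to a simple overlap condition: with N replicas, a write acknowledged by W nodes and a read contacting R nodes are guaranteed to intersect whenever W + R > N, so every read sees at least one up-to-date copy. A minimal check:

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True if every write quorum of size w must intersect every read
    quorum of size r among n replicas, i.e. w + r > n."""
    return w + r > n

# n = 3 replicas: majority writes and majority reads always overlap...
print(quorum_overlaps(3, 2, 2))   # True
# ...but w = 1, r = 1 can miss the latest write entirely.
print(quorum_overlaps(3, 1, 1))   # False
```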

Partitioning & Scalability

  • Partitioning distributes read/write load across nodes to support scalability.
  • Assigning messages to partitions round-robin is preferred over partitioning by priority, since it spreads load evenly across partitions.
  • Consumers may read from multiple partitions and aggregate top priorities across them via local in-memory heaps.
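The two bullets above can be sketched together: a producer sprays messages across partitions round-robin, and a consumer picks the global top by comparing the head of each partition's local heap. Partition count and helper names are illustrative assumptions.

```python
import heapq
from itertools import cycle

# Round-robin producer: spread messages evenly across 3 partitions.
partitions = [[] for _ in range(3)]   # each partition keeps a local min-heap
rr = cycle(range(3))
for priority, msg in [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]:
    heapq.heappush(partitions[next(rr)], (priority, msg))

def dequeue_global(parts):
    """Peek each non-empty partition's top and pop the best one."""
    best = min((p for p in parts if p), key=lambda p: p[0], default=None)
    return heapq.heappop(best) if best is not None else None

order = [dequeue_global(partitions) for _ in range(5)]
```

Because each partition is independently sorted, the consumer recovers an approximate global order by merging partition tops, without any cross-partition coordination on the write path.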

Consumer Design

  • Consumers may fetch and process jobs from multiple partitions, using long polling to avoid repeatedly issuing empty requests.
  • Locking within partitions prevents duplicate work when multiple consumers share the same partition.
  • Events are marked as processed/deleted after acknowledgment by consumers.
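A single partition's consumer-facing behavior can be sketched with a condition variable: poll() blocks up to a timeout instead of busy-looping (long polling), the lock ensures two consumers never claim the same job, and jobs are deleted only after ack(). Class and method names are assumptions for illustration.

```python
import threading
import time

class Partition:
    """Sketch: long polling, duplicate-claim protection, ack-then-delete."""

    def __init__(self):
        self._jobs = {}                  # job_id -> payload (pending or in-flight)
        self._ready = []                 # job_ids available to claim
        self._cond = threading.Condition()

    def enqueue(self, job_id, payload):
        with self._cond:
            self._jobs[job_id] = payload
            self._ready.append(job_id)
            self._cond.notify()          # wake one long-polling consumer

    def poll(self, timeout):
        # Long poll: block up to `timeout` seconds rather than returning
        # immediately and forcing the consumer to retry in a tight loop.
        deadline = time.monotonic() + timeout
        with self._cond:                 # lock prevents duplicate claims
            while not self._ready:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                self._cond.wait(remaining)
            job_id = self._ready.pop(0)
            return job_id, self._jobs[job_id]

    def ack(self, job_id):
        # Only after acknowledgment is the job actually deleted;
        # a real system would also re-queue jobs whose ack never arrives.
        with self._cond:
            self._jobs.pop(job_id, None)
```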

Key Terms & Definitions

  • Priority Queue — A data structure where each element has a priority; higher priority elements are served before lower ones.
  • Heap — A tree-based structure for efficiently retrieving the highest (or lowest) priority item.
  • B-Tree/LSM Tree — Disk-based sorted index structures supporting efficient range queries and insertions.
  • Partitioning — Dividing data among nodes or shards to spread load and increase throughput.
  • Replication — Copying data across nodes to ensure reliability and fault tolerance.
  • Quorum Consistency — A scheme requiring reads and writes to each reach enough replicas that they overlap (typically a majority), so reads observe the latest acknowledged write.
  • Consensus Algorithm (e.g., Paxos, Raft) — Protocols used to achieve agreement on system state among distributed nodes.

Action Items / Next Steps

  • Review consensus protocols (Paxos, Raft) for replication reliability.
  • Study in-memory and disk-based index structures for priority queues.
  • Understand partitioning strategies and their impact on load fairness and system scalability.