Distributed Priority Queue Design

Jun 9, 2025

Overview

This lecture explains how to design a distributed priority queue for large-scale systems, highlighting practical data structures, partitioning, replication, and scalability concerns.

Introduction to Distributed Priority Queues

  • Distributed priority queues are needed for scalable, fault-tolerant, and prioritized asynchronous job processing at companies like Facebook or Google.
  • Each job is assigned a priority so important tasks can be processed faster.

Problem Requirements & Functionalities

  • The system should maintain messages ordered by their assigned priorities.
  • Core operations: enqueue, modify (change a message's priority), and dequeue; delivery must be at-least-once.
  • System must be scalable and fault-tolerant, handling billions of messages.
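The four operations above can be sketched as a minimal in-memory API (names and the lazy-update scheme are illustrative assumptions, not the lecture's exact design). Lower numbers mean higher priority; dequeue() hands a message out but keeps it until ack() confirms processing, which is what makes delivery at-least-once.

```python
import heapq
import itertools

class PriorityQueueAPI:
    """Minimal sketch of enqueue / modify / dequeue / ack (hypothetical names)."""

    def __init__(self):
        self._heap = []                 # (priority, seq, msg_id)
        self._entries = {}              # msg_id -> [priority, payload, state]
        self._seq = itertools.count()   # tie-breaker for equal priorities
        self._inflight = {}             # delivered but not yet acknowledged

    def enqueue(self, msg_id, payload, priority):
        self._entries[msg_id] = [priority, payload, "queued"]
        heapq.heappush(self._heap, (priority, next(self._seq), msg_id))

    def modify(self, msg_id, new_priority):
        # Lazy update: record the new priority and push a fresh heap entry;
        # the old entry is skipped when popped.
        self._entries[msg_id][0] = new_priority
        heapq.heappush(self._heap, (new_priority, next(self._seq), msg_id))

    def dequeue(self):
        while self._heap:
            priority, _, msg_id = heapq.heappop(self._heap)
            entry = self._entries.get(msg_id)
            if entry and entry[2] == "queued" and entry[0] == priority:
                entry[2] = "inflight"
                self._inflight[msg_id] = entry[1]
                return msg_id, entry[1]
        return None  # queue empty

    def ack(self, msg_id):
        # Only an acknowledged message is deleted; unacked messages
        # remain eligible for redelivery (at-least-once).
        self._inflight.pop(msg_id, None)
        self._entries.pop(msg_id, None)
```

A real system would persist this state and redeliver expired in-flight messages; the sketch only shows the operation contract.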

Ordered Consumption & Approximations

  • Strict global ordering of message consumption is only achievable with a single consumer; with multiple consumers, ordering becomes approximate.
  • Race conditions between consumers can cause a later or lower-priority message to finish processing before an earlier one.
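A deterministic toy illustration of the point above: even when consumers dequeue in exact priority order, completion order depends on how long each consumer takes, so the lower-priority job can finish first. The names and durations are invented for illustration.

```python
# (name, priority): lower number = higher priority
events = [("routine", 1), ("urgent", 0)]
dequeued = sorted(events, key=lambda e: e[1])     # consumers pull in priority order

# Consumer A claims "urgent" but is slow; consumer B claims "routine" and is fast.
processing_ms = {"urgent": 30, "routine": 5}      # hypothetical durations
completion = sorted(dequeued, key=lambda e: processing_ms[e[0]])

print([name for name, _ in completion])           # "routine" completes first
```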

Data Structures & Storage Approaches

  • In-memory heaps offer efficient access to the top-priority item, but they are hard to persist at scale because heap operations require random access, which performs poorly on disk.
  • Disk-based B-trees or LSM trees maintain sorted order but have slower reads and writes due to disk latency.
  • Hybrid solution: append raw message payloads to a log on disk (O(1) sequential writes) while keeping a sorted index or heap of IDs and priorities in memory.
  • Priority modifications are applied to the in-memory index; pairing a sorted structure with a hashmap gives constant-time lookup by message ID.
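The hybrid layout can be sketched as follows. A bytearray stands in for the append-only log on disk; the class and method names are assumptions for illustration. Payloads only ever append (sequential O(1) writes), while a heap of (priority, id) pairs and an id-to-offset hashmap live in memory, so reprioritizing never touches the "disk".

```python
import heapq

class HybridStore:
    """Sketch: append-only payload log + in-memory heap and hashmap index."""

    def __init__(self):
        self.log = bytearray()   # stand-in for an append-only file on disk
        self.index = {}          # msg_id -> (offset, length, current priority)
        self.heap = []           # (priority, msg_id); stale entries skipped lazily

    def append(self, msg_id, payload: bytes, priority: int):
        offset = len(self.log)
        self.log += payload      # sequential write only, O(1) amortized
        self.index[msg_id] = (offset, len(payload), priority)
        heapq.heappush(self.heap, (priority, msg_id))

    def reprioritize(self, msg_id, new_priority: int):
        # In-memory update only; the on-disk payload is untouched.
        offset, length, _ = self.index[msg_id]
        self.index[msg_id] = (offset, length, new_priority)
        heapq.heappush(self.heap, (new_priority, msg_id))  # old entry goes stale

    def pop_top(self):
        # Skip heap entries whose priority no longer matches the index.
        while self.heap:
            priority, msg_id = heapq.heappop(self.heap)
            meta = self.index.get(msg_id)
            if meta and meta[2] == priority:
                offset, length, _ = self.index.pop(msg_id)
                return msg_id, bytes(self.log[offset:offset + length])
        return None
```

The lazy-deletion trick (push a new heap entry, skip stale ones on pop) is one common way to support priority modification without an O(n) heap search.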

Replication Strategies

  • Single-leader replication: fast reads and writes, but messages can be lost if the leader crashes before replicating them.
  • Multi-leader replication: introduces write conflicts that must be resolved, and offers little benefit for this asynchronous-task workload.
  • Leaderless replication with quorum consistency provides strong fault tolerance.
  • Use consensus protocols (e.g., Paxos, Raft) for synchronous message replication and reliability.
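The quorum idea behind leaderless replication reduces to a simple overlap condition: with N replicas, a write acknowledged by W nodes and a read contacting R nodes are guaranteed to intersect whenever W + R > N, so every read sees at least one up-to-date copy. A minimal check:

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True if every write quorum of size w must intersect every read
    quorum of size r among n replicas, i.e. w + r > n."""
    return w + r > n

# n = 3 replicas: majority writes and majority reads always overlap...
print(quorum_overlaps(3, 2, 2))   # True
# ...but w = 1, r = 1 can miss the latest write entirely.
print(quorum_overlaps(3, 1, 1))   # False
```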

Partitioning & Scalability

  • Partitioning distributes read/write load across nodes to support scalability.
  • Assigning messages to partitions round-robin is preferred over partitioning by priority, since it spreads load evenly across partitions.
  • Consumers may read from multiple partitions and aggregate top priorities across them via local in-memory heaps.
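The two bullets above can be sketched together: a producer sprays messages across partitions round-robin, and a consumer picks the global top by comparing the head of each partition's local heap. Partition count and helper names are illustrative assumptions.

```python
import heapq
from itertools import cycle

# Round-robin producer: spread messages evenly across 3 partitions.
partitions = [[] for _ in range(3)]   # each partition keeps a local min-heap
rr = cycle(range(3))
for priority, msg in [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]:
    heapq.heappush(partitions[next(rr)], (priority, msg))

def dequeue_global(parts):
    """Peek each non-empty partition's top and pop the best one."""
    best = min((p for p in parts if p), key=lambda p: p[0], default=None)
    return heapq.heappop(best) if best is not None else None

order = [dequeue_global(partitions) for _ in range(5)]
```

Because each partition is independently sorted, the consumer recovers an approximate global order by merging partition tops, without any cross-partition coordination on the write path.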

Consumer Design

  • Consumers may fetch and process jobs from multiple partitions, using long polling to avoid repeatedly issuing empty requests.
  • Locking within partitions prevents duplicate work when multiple consumers share the same partition.
  • Events are marked as processed/deleted after acknowledgment by consumers.
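A single partition's consumer-facing behavior can be sketched with a condition variable: poll() blocks up to a timeout instead of busy-looping (long polling), the lock ensures two consumers never claim the same job, and jobs are deleted only after ack(). Class and method names are assumptions for illustration.

```python
import threading
import time

class Partition:
    """Sketch: long polling, duplicate-claim protection, ack-then-delete."""

    def __init__(self):
        self._jobs = {}                  # job_id -> payload (pending or in-flight)
        self._ready = []                 # job_ids available to claim
        self._cond = threading.Condition()

    def enqueue(self, job_id, payload):
        with self._cond:
            self._jobs[job_id] = payload
            self._ready.append(job_id)
            self._cond.notify()          # wake one long-polling consumer

    def poll(self, timeout):
        # Long poll: block up to `timeout` seconds rather than returning
        # immediately and forcing the consumer to retry in a tight loop.
        deadline = time.monotonic() + timeout
        with self._cond:                 # lock prevents duplicate claims
            while not self._ready:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                self._cond.wait(remaining)
            job_id = self._ready.pop(0)
            return job_id, self._jobs[job_id]

    def ack(self, job_id):
        # Only after acknowledgment is the job actually deleted;
        # a real system would also re-queue jobs whose ack never arrives.
        with self._cond:
            self._jobs.pop(job_id, None)
```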

Key Terms & Definitions

  • Priority Queue — A data structure where each element has a priority; higher priority elements are served before lower ones.
  • Heap — A tree-based structure for efficiently retrieving the highest (or lowest) priority item.
  • B-Tree/LSM Tree — Disk-based sorted index structures supporting efficient range queries and insertions.
  • Partitioning — Dividing data among nodes or shards to spread load and increase throughput.
  • Replication — Copying data across nodes to ensure reliability and fault tolerance.
  • Quorum Consistency — A scheme requiring reads and writes to each reach enough replicas that they overlap (typically a majority), so reads observe the latest acknowledged write.
  • Consensus Algorithm (e.g., Paxos, Raft) — Protocols used to achieve agreement on system state among distributed nodes.

Action Items / Next Steps

  • Review consensus protocols (Paxos, Raft) for replication reliability.
  • Study in-memory and disk-based index structures for priority queues.
  • Understand partitioning strategies and their impact on load fairness and system scalability.