Hashing Lecture Notes

Jul 16, 2024

Hashing Lecture Notes

Introduction

  • Topic: Hashing
  • Previous lecture: Set data structures (storing items by key, unsorted array, looking up in log n time).
  • Today: Discussing the set interface (dynamic find, insert, delete) and hashing for faster operations.

Static Find

  • Aim: Can we do faster than log n?
  • Introduced new model: Comparison Model (only comparisons between keys are allowed).
  • Comparison Sort Algorithms:
    • Based on key comparisons (equal, less than, greater than).
    • Represented using a Decision Tree, where internal nodes are comparisons and leaves are outputs.
  • Goal: Minimum height of binary tree with n+1 leaves (output n items or not found) is log(n).

Direct Access Arrays

  • Store item in an array at the index of its key.
  • Performance: Constant time for find, insert, delete.
  • Problem: Prohibitively large space requirement if key range is large (e.g., MIT IDs).
  • To overcome: Use more space-efficient structures.

Hashing

  • Concept: Map a large key space to a smaller array using a hash function.
  • Function: h(key) maps key to an array index (0 to m-1).
  • Problem: Collisions (multiple keys mapping to the same index).
  • Solutions:
    • Open Addressing: Find another place in the array (complicated).
    • Chaining: Store a list or other data structure at each array index that manages collisions.

Choosing a Hash Function

  • Division Method: h(k) = k mod m.
    • Works well if keys are uniformly distributed.
  • Universal Hash Function: More generalized approach to ensure better performance.
    • Function: h_a,b(k) = ((a * k + b) mod p) mod m (with random a and b).
    • Ensures key distribution with less probability of clustering.
  • Universality: Ensures probability of two keys colliding is ≤ 1/m.
    • Implication: Expected chain length is constant (1 + (n-1)/m) if m ≈ n.

Making Hash Tables Dynamic

  • As the number of stored keys (n) grows, increase m dynamically.
  • Rebuild: Recreate hash table with larger m to maintain efficiency, similar to dynamic arrays.

Conclusion

  • Hashing allows efficient dynamic set operations even when keys are large integers.
  • Next lecture will focus on faster sorting techniques.