🔑

Basic Hashing in DSA

Jul 14, 2024

Stripers A to Z DSA Course: Basic Hashing

Introduction

  • Most in-depth Data Structures and Algorithms (DSA) course in India
  • Previous lectures covered till step 1.5
  • Today’s lecture focuses on Basic Hashing
  • Importance of hashing in DSA for solving a variety of problems
  • Language used: Pseudocode for universal understanding

Coding Round Preparation

  • Importance of practice: Mention of Go Rush X contest by Newton School
    • India’s biggest competitive programming contest
    • Prizes worth 10 lakh rupees
    • Contest on January 28, involving 8 questions to be solved in 3 hours

Understanding Hashing

  • Hashing definition: A technique to pre-store and fetch required data efficiently.
  • Example problem: Counting occurrences of elements in an array.
  • Brute force approach: Iterating through the array multiple times (Time complexity: O(n) for each query)

Efficient Approach Using Hashing

  1. Create a hash array to store frequencies.
  2. Initialize a hash array to zeros.
  3. Iterate through the array and update hash values.
  4. Queries can then be answered in O(1) time complexity using the precomputed hash array.
  • Example pseudocode provided for better understanding.

Hashing with Constraints

  • For large maximum values (e.g., 10^9), creating an array is impractical.
  • Global declaration of arrays to handle up to 10^7 elements.

Character Hashing

  • String hashing: Counting occurrences of each character in a string.
  • Use an array or map for character hashing.
  • Pseudocode: Use ASCII values for indexing.
    • Example: hash[c - 'a']++ for lowercase letters
  • For mixed-case: Use an array of size 256 (for ASCII)

Limitations and Solutions

  • When elements are very large, like 10^12 or more, arrays cannot be used effectively.
  • Use maps and unordered maps (C++ STL and Java Collections).
  • Map: Stores elements in sorted order. Time complexity for insertion and access is O(log n).
  • Unordered Map: Average case O(1) for insertion and access, worst-case O(n).

Internal Hashing Mechanisms

  • Division Method: Simple and explained using modulo operation.
  • Collisions: Explained with examples and concepts like linear chaining.
  • Handling Collisions: Linear chaining to store multiple values at the same hash index.

Practical Considerations

  • In C++, unordered_map usually preferred unless worst-case performance hits.
  • Global declaration for large arrays
  • Variety of keys supported in map and unordered_map. Limited key types in unordered_map (no complex keys allowed).

Homework and Next Steps

  • Task: Find the highest and lowest frequency of elements using hashing techniques.
  • Mention of upcoming module steps in the next lecture.

Conclusion

  • Encouragement to type 'understood' in comments
  • Reminder to like and subscribe for more content
  • Follow on social media for updates