Recreating Shazam's Recognition Algorithm

Mar 3, 2025

Lecture Notes: Recreating Shazam's Song Recognition Algorithm

Introduction

  • Motivation: Curiosity, and desperation to land a junior developer job.
  • Fascination with Shazam's technology: It recognizes songs instantly from a short audio sample.
  • Objective: Reverse engineer Shazam to demonstrate engineering skills.

Understanding Shazam's Technology

  • Core concept: Audio Fingerprinting
    • Creates a unique 'DNA profile' for every song.
  • Main Steps:
    1. Spectrogram Creation: Converts the song into a representation of its frequency content over time.
    2. Peak Identification: Pinpoints the highest-intensity frequencies, or 'Peaks'.
    3. Hash Encoding: Encodes relationships between peaks into hashes linked to the song's metadata.
    4. Database Storage: Stores the song's thousands of hashes as its fingerprint.
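Steps 3 and 4 can be sketched in Go, the language the project uses. This is a minimal sketch under assumptions, not Shazam's or the author's actual format: the 32-bit layout (9 bits per frequency bin, 14 bits for the anchor-to-target time delta in milliseconds) and the names `Peak`, `Couple`, `createHash`, and `addFingerprints` are hypothetical. The `fanOut` parameter reflects the five targets per anchor described under Fingerprint Creation.

```go
package main

// Peak is one point of the constellation map: when it occurs (ms) and which
// FFT frequency bin it sits in.
type Peak struct {
	TimeMs int
	Bin    int
}

// Couple is what the database keeps for each hash: the song it came from and
// the anchor's time within that song. (Hypothetical type.)
type Couple struct {
	SongID     uint32
	AnchorTime int
}

// createHash packs an anchor-target pair into 32 bits using a hypothetical
// layout: 9 bits anchor bin | 9 bits target bin | 14 bits time delta (ms).
// Values wider than their field are truncated by the masks.
func createHash(anchor, target Peak) uint32 {
	delta := uint32(target.TimeMs-anchor.TimeMs) & 0x3FFF
	return uint32(anchor.Bin&0x1FF)<<23 | uint32(target.Bin&0x1FF)<<14 | delta
}

// addFingerprints pairs every anchor with up to fanOut later peaks and appends
// a Couple under each resulting hash. peaks must be sorted by TimeMs so that
// every time delta is non-negative.
func addFingerprints(db map[uint32][]Couple, songID uint32, peaks []Peak, fanOut int) {
	for i, anchor := range peaks {
		for j := i + 1; j < len(peaks) && j <= i+fanOut; j++ {
			h := createHash(anchor, peaks[j])
			db[h] = append(db[h], Couple{SongID: songID, AnchorTime: anchor.TimeMs})
		}
	}
}
```

Because each hash stores only a relative time delta, the same anchor-target pair produces the same hash wherever it occurs in a recording; the absolute position survives in `AnchorTime`, which the matching step needs. Hashes are not guaranteed to be unique across songs, which is why each key maps to a slice of `Couple` values rather than a single entry.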

Process of Song Identification

  • App records a snippet and creates its fingerprint.
  • Uses fingerprints to query database for matches.
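  • Evaluates time coherence for each candidate song: genuine matches share a consistent time offset between snippet and song, and the song with the most consistent matches is chosen.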

Implementation

  • Language: Written in Go, with the core algorithm coded from scratch (no libraries) as a learning exercise.
  • Audio handling: Relied on libraries for the heavy-lifting tasks.

Key Functions in Algorithm

  1. Spectrogram Creation:

    • Converts raw audio into a spectrogram using a sliding-window technique.
    • Mitigates spectral leakage by applying a Hamming window function to each window.
    • Uses the FFT (Fast Fourier Transform) to move each window into the frequency domain.
    • Downsamples the audio and limits the frequency range (20 Hz to 5 kHz).
  2. Peak Extraction:

    • Applies a filter of six logarithmically spaced frequency bands.
    • Identifies the loudest frequency in each band.
    • Uses the average magnitude of these strongest frequencies as a dynamic threshold for keeping peaks.
  3. Fingerprint Creation:

    • Treats each peak as an 'Anchor' and pairs it with five nearby 'Targets'.
    • Creates a hash encoding each anchor-target relationship.
    • Stores the hashes in a hashmap, along with timestamps and song IDs.
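The first two key functions can be sketched together in Go. This is a minimal, from-scratch sketch under stated assumptions: it uses a textbook recursive radix-2 FFT (so the window length must be a power of two), assumes the samples are already mono and downsampled, and the six band edges in `bands` are hypothetical bin ranges that roughly double in width, not the author's exact values. The names `fft`, `spectrogram`, `Point`, and `extractPeaks` are illustrative.

```go
package main

import (
	"math"
	"math/cmplx"
)

// fft is a textbook recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
func fft(x []complex128) []complex128 {
	n := len(x)
	if n == 1 {
		return []complex128{x[0]}
	}
	even := make([]complex128, n/2)
	odd := make([]complex128, n/2)
	for i := 0; i < n/2; i++ {
		even[i], odd[i] = x[2*i], x[2*i+1]
	}
	e, o := fft(even), fft(odd)
	out := make([]complex128, n)
	for k := 0; k < n/2; k++ {
		// Twiddle factor e^(-2*pi*i*k/n) applied to the odd-indexed half.
		t := cmplx.Exp(complex(0, -2*math.Pi*float64(k)/float64(n))) * o[k]
		out[k], out[k+n/2] = e[k]+t, e[k]-t
	}
	return out
}

// spectrogram slides a window of win samples forward by hop samples, tapers
// each frame with a Hamming window to reduce spectral leakage, and keeps the
// magnitudes of the first win/2 FFT bins (the non-negative frequencies).
func spectrogram(samples []float64, win, hop int) [][]float64 {
	var frames [][]float64
	for start := 0; start+win <= len(samples); start += hop {
		buf := make([]complex128, win)
		for i := range buf {
			w := 0.54 - 0.46*math.Cos(2*math.Pi*float64(i)/float64(win-1))
			buf[i] = complex(samples[start+i]*w, 0)
		}
		spec := fft(buf)
		mags := make([]float64, win/2)
		for k := range mags {
			mags[k] = cmplx.Abs(spec[k])
		}
		frames = append(frames, mags)
	}
	return frames
}

// Point is a kept peak: its frame index and FFT bin.
type Point struct{ Frame, Bin int }

// bands holds hypothetical, roughly logarithmic bin ranges [lo, hi).
var bands = [][2]int{{0, 10}, {10, 20}, {20, 40}, {40, 80}, {80, 160}, {160, 512}}

// extractPeaks takes the loudest bin of each band in every frame, then keeps
// only those whose magnitude exceeds the frame's average band maximum.
func extractPeaks(frames [][]float64) []Point {
	var peaks []Point
	for f, mags := range frames {
		var cands []Point
		var vals []float64
		for _, b := range bands {
			lo, hi := b[0], b[1]
			if hi > len(mags) {
				hi = len(mags) // clamp for smaller windows
			}
			if lo >= hi {
				continue
			}
			best := lo
			for k := lo + 1; k < hi; k++ {
				if mags[k] > mags[best] {
					best = k
				}
			}
			cands = append(cands, Point{f, best})
			vals = append(vals, mags[best])
		}
		avg := 0.0
		for _, v := range vals {
			avg += v
		}
		if len(vals) > 0 {
			avg /= float64(len(vals))
		}
		for i, p := range cands {
			if vals[i] > avg { // dynamic threshold
				peaks = append(peaks, p)
			}
		}
	}
	return peaks
}
```

With sample rate fs and window length N, bin k corresponds to k·fs/N Hz, so restricting the analysis to 20 Hz to 5 kHz amounts to ignoring bins outside that range. Computing the threshold per frame lets quiet passages still contribute peaks instead of being drowned out by a single global cutoff.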

Front-End Development

  • Used React for the user interface.
  • Front end and back end communicate via WebSockets.
  • Integrated Spotify to retrieve song data.
  • Process:
    • Users upload songs or provide Spotify links.
    • The song's audio is then fetched from YouTube and processed.

Song Matching Process

  • User records audio snippet.
  • Snippet encoded and processed to create fingerprint.
  • Queries fingerprint database for matches.
  • Evaluates time coherence to find best match.
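Time coherence can be scored by voting on time offsets: if the snippet really comes from a song, every correct hash match implies the same offset (time in the song minus time in the snippet), while chance matches scatter across many offsets. The Go sketch below shows that idea under assumptions: `Couple`, `SampleHash`, and `bestMatch` are hypothetical names, and the database is assumed to map each hash to the songs and anchor times where it occurs.

```go
package main

// Couple is a hypothetical database entry: the song a hash came from and the
// anchor's time (ms) within that song.
type Couple struct {
	SongID     uint32
	AnchorTime int
}

// SampleHash is a hash computed from the recorded snippet, plus its anchor
// time (ms) within the snippet.
type SampleHash struct {
	Hash       uint32
	AnchorTime int
}

// bestMatch builds, for every candidate song, a histogram of implied offsets
// (song time minus snippet time) and returns the song with the tallest bin.
// Ties go to the lower song ID so the result does not depend on map order.
func bestMatch(sample []SampleHash, db map[uint32][]Couple) (songID uint32, score int) {
	offsets := map[uint32]map[int]int{}
	for _, s := range sample {
		for _, c := range db[s.Hash] {
			if offsets[c.SongID] == nil {
				offsets[c.SongID] = map[int]int{}
			}
			offsets[c.SongID][c.AnchorTime-s.AnchorTime]++
		}
	}
	for id, hist := range offsets {
		for _, n := range hist {
			if n > score || (n == score && id < songID) {
				songID, score = id, n
			}
		}
	}
	return songID, score
}
```

A song that shares many hashes with the snippet by coincidence spreads its votes over many offsets, so it scores low even when its raw match count is high. The sketch returns `(0, 0)` when nothing matches; in practice one would likely also require a minimum score before reporting a match.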

Conclusion

  • The most challenging project the author has undertaken.
  • Provides a demonstration of technical capability.
  • Links to demo and GitHub repository for further exploration.

Additional Notes

  • Copyright laws prevent a full demo.
  • Demo video and code available via provided links.