Lecture Notes: Recreating Shazam's Song Recognition Algorithm

Introduction

Core concept: Audio Fingerprinting
- Creates a unique 'DNA profile' for every song.
Main Steps:
1. Spectrogram Creation: Converts song to frequency content over time.
2. Peak Identification: Pinpoints high-intensity frequencies or 'Peaks'.
3. Hash Encoding: Encodes the relationships between peaks into hashes linked to song metadata.
4. Database Storage: Stores thousands of hashes as a song's fingerprint.

Spectrogram Creation:
- Converts raw audio into a spectrogram using a sliding-window technique.
- Applies a Hamming window to each frame to reduce spectral leakage.
- Uses the FFT to transform each windowed frame to the frequency domain.
- Downsamples the audio to limit the frequency range (20 Hz to 5 kHz).
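The spectrogram steps above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the names downsample, hamming, fft and spectrogram, the window size of 1024, the hop of 512, and the averaging-based downsampler are all assumptions made for the example. It uses only the standard library; in practice a library routine such as numpy.fft.rfft would normally replace the hand-written FFT.

```python
import cmath
import math


def downsample(samples, factor):
    """Crude decimation (assumed approach): average each block of `factor` samples."""
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


def hamming(n):
    """Hamming window of length n, used to reduce spectral leakage."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrogram(samples, window_size=1024, hop=512):
    """Slide a window over the samples; return one magnitude list per frame.

    Each frame holds bins 0..window_size/2 (the non-redundant half of the FFT).
    """
    window = hamming(window_size)
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop):
        chunk = samples[start:start + window_size]
        windowed = [s * w for s, w in zip(chunk, window)]
        spectrum = fft(windowed)
        frames.append([abs(c) for c in spectrum[:window_size // 2 + 1]])
    return frames
```

For example, downsampling 44.1 kHz audio by a factor of 4 gives a sample rate of 11025 Hz, whose Nyquist limit of about 5.5 kHz covers the 5 kHz upper bound mentioned above. Bin k of a frame then corresponds to roughly k * 11025 / 1024 Hz.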
Peak Extraction:
- Applies filter of six logarithmic frequency bands.
- Identifies loudest frequencies in each band.
- Uses the average magnitude of those strongest frequencies as a dynamic threshold; only those above it are kept as peaks.
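A minimal sketch of this peak-extraction step, under stated assumptions: the band edges in BANDS are hypothetical bin ranges chosen to be roughly logarithmic for a 513-bin frame, the threshold is computed per frame, and a band maximum must be strictly above the threshold to count as a peak. The notes do not specify any of these details.

```python
# Hypothetical band edges (in FFT bin indices), roughly doubling in width.
BANDS = [(0, 10), (10, 20), (20, 40), (40, 80), (80, 160), (160, 512)]


def extract_peaks(frames, bands=BANDS):
    """Return (frame_index, bin_index) pairs for the peaks of a spectrogram.

    `frames` is a list of magnitude lists, one per time frame.
    """
    peaks = []
    for t, frame in enumerate(frames):
        candidates = []  # (magnitude, bin_index) of the loudest bin per band
        for lo, hi in bands:
            band = frame[lo:hi]
            if not band:
                continue
            local = max(range(len(band)), key=band.__getitem__)
            candidates.append((band[local], lo + local))
        if not candidates:
            continue
        # Dynamic threshold: mean magnitude of the band maxima in this frame.
        threshold = sum(mag for mag, _ in candidates) / len(candidates)
        # Strict comparison, so a silent frame (all zeros) yields no peaks.
        peaks.extend((t, b) for mag, b in candidates if mag > threshold)
    return peaks
```

Because the threshold adapts to each frame, quiet passages still contribute peaks while bands that hold only low-level noise are dropped.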
Fingerprint Creation:
- Treats each peak as an 'Anchor' and pairs it with five nearby peaks, its 'Targets'.
- Creates a hash for each anchor-target pair that encodes the relationship between the two peaks.
- Stores hashes in a hashmap along with timestamps and song IDs.
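The anchor-target pairing can be sketched as below. This is an illustration under assumptions rather than the lecture's exact scheme: the 32-bit layout (10 bits for each frequency bin, 12 bits for the time difference), the choice of the next five peaks in time order as targets, and the names make_hash and fingerprint are all hypothetical. Peaks are assumed to be (frame_index, bin_index) pairs.

```python
from collections import defaultdict

TARGETS_PER_ANCHOR = 5  # five targets per anchor, as in the notes


def make_hash(anchor_bin, target_bin, delta):
    """Pack two frequency bins and a time difference into one integer.

    Assumed layout: 10 bits anchor bin | 10 bits target bin | 12 bits delta.
    """
    return (anchor_bin & 0x3FF) << 22 | (target_bin & 0x3FF) << 12 | (delta & 0xFFF)


def fingerprint(peaks, song_id, index=None):
    """Add one song's hashes to `index` (hash -> list of (anchor_time, song_id))."""
    index = defaultdict(list) if index is None else index
    peaks = sorted(peaks)  # order by time, then by frequency bin
    for i, (t1, f1) in enumerate(peaks):
        # Targets: the next few peaks after the anchor.
        for t2, f2 in peaks[i + 1:i + 1 + TARGETS_PER_ANCHOR]:
            index[make_hash(f1, f2, t2 - t1)].append((t1, song_id))
    return index
```

Passing the same index to fingerprint for several songs builds one shared hashmap, so a hash may map to entries from more than one song.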

Used React for the user interface.
Client-server communication via WebSockets.
Integrated Spotify for song data retrieval.
Process:
- Users upload songs or provide Spotify links.
- Song audio is fetched from YouTube, then processed into fingerprints.