Transcribing Speech with Whisper AI

Sep 24, 2024

Lecture Notes: Turning Speech into Text with Whisper AI

Introduction

  • Speaker: Kevin
  • Topic: Transcribing speech to text using AI (specifically Whisper)
  • Highlights:
    • AI transcribes better than most humans
    • Supports 96 languages
    • Effective with background noise and thick accents
    • Free and open source

Overview of Whisper

  • Developed by OpenAI (the company behind ChatGPT and DALL·E 2)
  • Can be installed on a local computer, but is easier to use through Google Colaboratory (Colab)

Setting Up Google Colaboratory

  1. Accessing Colaboratory: Basic Steps

    • Use Google Drive and click on 'New'
    • Go to 'More' > 'Connect more apps'
    • Search for and install Google Colaboratory
  2. Starting Google Colaboratory

    • Click 'New' > 'More' > 'Google Colaboratory'
    • Name your file (e.g., 'Transcribe Audio')
  3. Configuring Runtime

    • Go to 'Runtime' > 'Change Runtime Type'
    • Select 'GPU' for optimal performance
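
To confirm that the runtime change took effect, a quick check can be run in a notebook cell. This is a minimal sketch, not something shown in the lecture: it assumes PyTorch is available (it typically is on Colab) and falls back to looking for the nvidia-smi tool. The function name gpu_available is made up for illustration.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if an NVIDIA GPU appears to be attached to this runtime."""
    try:
        # Assumption: PyTorch is installed, as it usually is on Colab.
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        pass
    # Fallback when PyTorch is missing: look for the NVIDIA driver tool.
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        return False
    result = subprocess.run([nvidia_smi], capture_output=True)
    return result.returncode == 0


print("GPU runtime active:", gpu_available())
```

If this reports False after selecting 'GPU', the runtime may need to be reconnected before continuing.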

Installing Whisper AI

  • Code Entry: Enter code in a Colab cell to install Whisper from its GitHub repository, along with FFmpeg
  • Run Installation: Click the run icon to install (takes about 23 seconds)
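
The notes do not record the exact installation code, so the following is only a sketch of what such a cell might do, written in Python rather than notebook shell commands. The repository URL is the public openai/whisper project; the helper names are hypothetical, and the apt-get step assumes a Debian/Ubuntu machine with root access, as on Colab.

```python
import shutil
import subprocess
import sys

# Public GitHub repository for Whisper, in the form pip accepts.
WHISPER_REPO = "git+https://github.com/openai/whisper.git"


def pip_install_command(package: str = WHISPER_REPO) -> list:
    """Build the pip command for the interpreter that runs this notebook."""
    return [sys.executable, "-m", "pip", "install", package]


def ffmpeg_present() -> bool:
    """Whisper relies on the ffmpeg program to decode audio and video."""
    return shutil.which("ffmpeg") is not None


def install_whisper() -> None:
    """Install Whisper, then install ffmpeg through apt only if it is missing."""
    subprocess.run(pip_install_command(), check=True)
    if not ffmpeg_present():
        # Assumption: Debian/Ubuntu with root access (true on Colab).
        subprocess.run(["apt-get", "update"], check=True)
        subprocess.run(["apt-get", "install", "-y", "ffmpeg"], check=True)
```

Calling install_whisper() in a cell performs the installation; nothing is installed just by defining the functions.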

Uploading and Transcribing Audio/Video Files

  1. Uploading Files:

    • Click the folder icon and drag in an audio or video file
    • Note: Files are temporary and will be deleted after runtime ends
  2. Transcribing Audio:

    • Insert code that calls Whisper and specifies the filename
    • Choose a model size (tiny to large); the medium model is recommended as a balance of speed and accuracy
    • Run the transcription command
  3. Output Files:

    • Text file (TXT): Contains full transcript
    • Caption files (SRT, VTT): Include timestamps
    • Download these files if needed
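
The transcription and output steps above can also be sketched with Whisper's Python interface. This is an illustrative example, not the command used in the lecture: whisper.load_model and model.transcribe come from the openai-whisper package, while transcribe_to_files, segments_to_srt, and srt_timestamp are hypothetical helpers written here to show how a timestamped SRT caption file relates to the plain transcript.

```python
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (start, end, text) into SRT caption text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive caption blocks.
    return "\n".join(blocks)


def transcribe_to_files(media_path: str, model_name: str = "medium") -> dict:
    """Transcribe one file and write a .txt and an .srt file next to it."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    base = Path(media_path).with_suffix("")
    Path(f"{base}.txt").write_text(result["text"].strip() + "\n", encoding="utf-8")
    Path(f"{base}.srt").write_text(segments_to_srt(result["segments"]), encoding="utf-8")
    return result
```

For an uploaded file named interview.mp3 (a placeholder name), transcribe_to_files("interview.mp3") would leave interview.txt and interview.srt in the same folder, ready to download before the runtime ends.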

Additional Features

  • Whisper Parameters:
    • Run whisper -h to view all available parameters
    • Specify output location, translation options, language, etc.
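
As a rough illustration of how these parameters combine, the sketch below assembles a whisper command line in Python. The flags --model, --language, --task, and --output_dir are part of the Whisper command-line tool; the whisper_command helper and the file and folder names are hypothetical.

```python
import shlex


def whisper_command(media_path, model="medium", language=None,
                    translate=False, output_dir=None):
    """Build the argument list for the whisper command-line tool."""
    args = ["whisper", media_path, "--model", model]
    if language:
        # Naming the spoken language skips automatic language detection.
        args += ["--language", language]
    if translate:
        # The translate task produces English text from non-English speech.
        args += ["--task", "translate"]
    if output_dir:
        args += ["--output_dir", output_dir]
    return args


cmd = whisper_command("talk.mp4", language="Japanese", translate=True,
                      output_dir="captions")
print(shlex.join(cmd))
# prints: whisper talk.mp4 --model medium --language Japanese --task translate --output_dir captions
```

The resulting list can be passed to subprocess.run, or the printed string can be pasted into a Colab cell with a leading exclamation mark.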

Practical Use Cases

  • Personal Application:
    • Used for YouTube video captions
    • More accurate than Google's auto-generated captions

Conclusion

  • Whisper offers high-quality transcription with additional capabilities