Embedding Subtitles into a Video Using Python

Overview

This lecture walks through the steps to download a YouTube video, extract its audio, transcribe it using OpenAI's Whisper, preprocess the transcript, and embed the subtitles back into the video using Python libraries.

Key Libraries

pytube:
- Used for downloading YouTube videos
ffmpeg-python:
- Used for extracting audio and embedding subtitles (a Python wrapper around the FFmpeg command-line tool)
faster-whisper:
- Used for transcribing audio to text

Step-by-Step Process

1. Download Video

URL of Interest:
- Specify video URL
pytube:
- Download the video from the given YouTube URL
- Save locally

2. Extract Audio

Use FFmpeg to extract the audio stream
Save the audio stream as a .wav file
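
As a rough sketch of what this step amounts to at the FFmpeg command-line level, the helper below builds the argument list for an audio-only .wav export. It calls the ffmpeg executable directly rather than going through ffmpeg-python, and the function name build_extract_audio_cmd is hypothetical. The 16 kHz mono settings are an assumption, not something the lecture specifies: Whisper models operate on 16 kHz mono audio, so exporting in that form keeps the file small.

```python
import subprocess

def build_extract_audio_cmd(video_path, audio_path="output_audio.wav"):
    """Return an ffmpeg command that writes the audio track as 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",   # overwrite the output file if it exists
        "-i", video_path,
        "-vn",            # drop the video stream
        "-ac", "1",       # one audio channel (mono)
        "-ar", "16000",   # 16 kHz sample rate (an assumption, not from the lecture)
        audio_path,
    ]

cmd = build_extract_audio_cmd("your_video.mp4")
# subprocess.run(cmd, check=True)  # uncomment to run; requires ffmpeg on PATH
```

The command is only built here, not executed, so the snippet can be inspected without FFmpeg installed.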

3. Transcribe Audio Using Whisper

Whisper Model:
- Use the local, open-source Whisper model for transcription (not the API)
- Transcribe the audio to get a sequence of segments, each with a start time, an end time, and text
Reason for Using Faster Whisper:
- Faster transcription without significant loss in accuracy
- Implemented using ctranslate2 for faster computation

4. Preprocess Transcript

Time Formatting:
- Convert transcript times to specific formats for subtitles
- Normalize time format as HH:MM:SS,mmm (hours, minutes, seconds, then a comma and three-digit milliseconds)

5. Generate Subtitle File

SubRip Subtitles (SRT) Format:
- Generate .srt file with indices, time codes, and text
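
To make the file layout concrete, here is a minimal sketch of a single SRT entry: a 1-based index, a time range joined by " --> ", the subtitle text, and a blank line that separates it from the next entry. The helper name srt_block and the sample text are made up for illustration.

```python
def srt_block(index, start, end, text):
    """Build one SRT entry: index, time range, text, then a blank separator line."""
    return f"{index}\n{start} --> {end}\n{text.strip()}\n\n"

# Sample values are illustrative only
sample = srt_block(1, "00:00:00,000", "00:00:02,500", " Hello and welcome.")
print(sample, end="")
```

A complete .srt file is just these entries written one after another.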

6. Embed Subtitles Into Video

Hard Burn Subtitles:
- Embed the subtitles as a permanent part of the video using FFmpeg
- Alternatively, you can generate switchable closed captions (not covered in this lecture)
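
For reference only, since the lecture does not implement this alternative, the sketch below shows one common way to add a switchable subtitle track with the FFmpeg command line. It assumes an MP4 output, for which mov_text is the usual subtitle codec, and it calls the ffmpeg executable directly instead of ffmpeg-python. The function name and the output file name are hypothetical.

```python
import subprocess

def build_soft_subtitle_cmd(video_path, srt_path, output_path):
    """Return an ffmpeg command that muxes the SRT file in as a subtitle track."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", srt_path,
        "-c", "copy",        # copy video and audio without re-encoding
        "-c:s", "mov_text",  # subtitle codec used for MP4 containers
        output_path,
    ]

cmd = build_soft_subtitle_cmd("your_video.mp4", "your_subtitle.srt", "output_soft_subs.mp4")
# subprocess.run(cmd, check=True)  # uncomment to run; requires ffmpeg on PATH
```

Because the video is copied rather than re-encoded, this is typically much faster than hard burning, but the viewer's player has to support subtitle tracks.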

Practical Coding Steps

Environment Setup

Create a Python virtual environment
Install required libraries using requirements.txt
Verify installation of faster-whisper, ffmpeg-python, and pytube
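
One possible way to do the verification step is sketched below. The import names (faster_whisper, ffmpeg, pytube) are assumptions based on the imports used in the examples in these notes, and the helper names are hypothetical. The check for the ffmpeg executable is included because ffmpeg-python only wraps the command-line tool and does not ship it.

```python
import importlib.util
import shutil

# Import names are assumptions based on the examples in these notes
REQUIRED_MODULES = ["faster_whisper", "ffmpeg", "pytube"]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def ffmpeg_binary_available():
    """Report whether an ffmpeg executable is on PATH."""
    return shutil.which("ffmpeg") is not None

if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("ffmpeg on PATH:", ffmpeg_binary_available())
```

Run it inside the activated virtual environment; an empty list of missing modules means all three packages are importable.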

Download Video Example

Use pytube to download a YouTube video
Print video details (title, views, description, length)

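from pytube import YouTube

url = 'YOUR_YOUTUBE_VIDEO_URL'
yt = YouTube(url)

# Print the video details
print(yt.title, yt.views, yt.length, sep=' | ')
print(yt.description)

# Download the highest-resolution progressive (video + audio) MP4 stream
stream = yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first()
video_path = stream.download()  # returns the path of the saved file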

Extract Audio Example

Convert .mp4 to .wav

import ffmpeg

def extract_audio(video_path):
    audio_path = 'output_audio.wav'
    stream = ffmpeg.input(video_path)
    stream = ffmpeg.output(stream, audio_path)
    ffmpeg.run(stream, overwrite_output=True)
    return audio_path

audio_file = extract_audio('your_video.mp4')

Transcribe Audio Example

Use Faster Whisper to transcribe audio

from faster_whisper import WhisperModel

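def transcribe_audio(audio_path):
    model = WhisperModel('small')
    segments, info = model.transcribe(audio_path)
    # info is an object, not a dict, so use attribute access
    language = info.language
    # segments is a lazy generator; list() actually runs the transcription
    segments = list(segments)
    return language, segments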

language, segments = transcribe_audio('output_audio.wav')

Preprocess and Format Time Example

Helper function to format time

def format_time_for_srt(seconds):
    # Work in whole milliseconds so rounding can never produce ",1000"
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, milliseconds = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"

Generate Subtitle File Example

Generate .srt file from segments


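def generate_srt_file(language, segments, output_file):
    # language is accepted for convenience but is not written to the file
    with open(output_file, 'w', encoding='utf-8') as f:
        for index, segment in enumerate(segments, start=1):
            # Segments expose start, end, and text as attributes, not dict keys
            start = format_time_for_srt(segment.start)
            end = format_time_for_srt(segment.end)
            text = segment.text.strip()
            f.write(f"{index}\n{start} --> {end}\n{text}\n\n")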

srt_file = 'your_subtitle.srt'
generate_srt_file(language, segments, srt_file)

Embed Subtitles Example

Hard burn subtitles onto the video


def add_subtitles_to_video(video_path, srt_path, output_path):
    source = ffmpeg.input(video_path)
    # Burn the subtitles into the video frames; the audio passes through unfiltered
    video = source.video.filter('subtitles', srt_path)
    audio = source.audio
    ffmpeg.output(video, audio, output_path).run(overwrite_output=True)

output_video = 'output_with_subtitles.mp4'
add_subtitles_to_video('your_video.mp4', 'your_subtitle.srt', output_video)

Conclusion

We downloaded a YouTube video, extracted and transcribed its audio, and burned the resulting subtitles into the video using Python libraries.
Switchable closed captions were mentioned as an alternative but not implemented.
The full code will be available on GitHub for further reference.

Thank you!