Video Automation System Overview

Jul 27, 2025

Summary

  • This video provides a detailed, step-by-step guide for building an automated AI system to convert long-form podcast or YouTube videos into multiple short, captioned clips suitable for platforms like Instagram Reels, TikTok, and YouTube Shorts.
  • The system uses Airtable, Make (formerly Integromat), and a self-hosted, open-source NCA Toolkit server on DigitalOcean to manage video uploads, transcription, clip selection, cropping, and captioning.
  • The tutorial covers the full technical setup, including database structure, AI-driven and deterministic clip segmentation, video processing, and final automation of video publishing.
  • The approach aims to reduce costs relative to commercial services and leaves room to adapt or resell the workflow for business or client use.

Action Items

  • None indicated in the transcript. All instructions pertain to individual users setting up their own system as demonstrated.

System Overview & Workflow

  • The automation process is named "content clip magic" and consists of multiple stages:
    • Upload video info (description, video link, dimensions) into Airtable.
    • Trigger automation to transcribe video using the self-hosted NCA Toolkit.
    • Populate Airtable with transcript and SRT files for precise clipping.
    • Run automation to analyze transcript (using Claude or GPT) to determine engaging short clips.
    • Structure AI output into JSON for easier downstream processing and entry into Airtable.
    • Parse SRT files to extract precise start and end times of each selected clip.
    • Insert new clips into Airtable for further processing.
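
The SRT parsing stage above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's actual Make modules: the names `Cue`, `parse_srt`, and `clip_bounds` are hypothetical, and it assumes the AI step returns the first and last SRT cue numbers of each chosen clip.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int    # cue number as written in the SRT file
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Return the cues of an SRT document in file order."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue  # skip malformed blocks rather than guessing
        match = TIMING.search(lines[1])
        if not match:
            continue
        g = match.groups()
        text = " ".join(lines[2:]).strip()
        cues.append(Cue(int(lines[0]), _seconds(*g[:4]), _seconds(*g[4:]), text))
    return cues


def clip_bounds(cues, first_index, last_index):
    """Return (start, duration) in seconds spanning two cue numbers, inclusive."""
    by_index = {cue.index: cue for cue in cues}
    start = by_index[first_index].start
    end = by_index[last_index].end
    if end <= start:
        raise ValueError("last cue must end after the first cue starts")
    return start, round(end - start, 3)
```

Reading the times from the SRT file, rather than asking the model to produce them, keeps the cut points tied to the transcript's own timing data.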

Technical Setup

  • Users are provided with a free template Airtable base to duplicate and use.
  • The NCA Toolkit is deployed as a container on DigitalOcean for cost efficiency and flexibility, with credentials and storage set up via DigitalOcean Spaces.
  • Test runs are recommended with short videos before scaling up to long-form content.
  • Postman is used to confirm that the NCA Toolkit API is set up and responding correctly before proceeding.
  • All API endpoints, credentials, and environment variables must be entered exactly as demonstrated to ensure successful operation.
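
The same check that Postman performs can be scripted. The sketch below is an assumption-laden illustration: the path `/v1/toolkit/authenticate` and the `x-api-key` header are assumed here and should be confirmed against the documentation of the deployed NCA Toolkit version. The helper names `build_auth_check` and `send` are hypothetical.

```python
import urllib.request


def build_auth_check(base_url, api_key):
    """Describe the GET request that checks whether the API key is accepted."""
    return {
        "method": "GET",
        # Assumed endpoint path; verify it against your deployment's docs.
        "url": base_url.rstrip("/") + "/v1/toolkit/authenticate",
        # Assumed header name for the API key.
        "headers": {"x-api-key": api_key},
    }


def send(request, timeout=30):
    """Send a request built above and return (status code, body text)."""
    req = urllib.request.Request(
        request["url"], headers=request["headers"], method=request["method"]
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Calling `send(build_auth_check(base_url, api_key))` with the server's real URL and key should return a success status if the deployment and credentials are correct; `urllib` raises an `HTTPError` on a 4xx or 5xx response.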

Automation Build Steps

  • Transcription Automation:
    • Set up Make to detect new unprocessed video records in Airtable, trigger the NCA Toolkit for transcription, and wait for the webhook callback to save the transcript and SRT URLs.
  • Clip Identification Automation:
    • Once transcripts are ready, download them, use AI to find strong segments, and format output as JSON.
    • Run deterministic modules to parse SRTs, match clips to transcript segments, and extract precise segment numbers/times.
    • Create new clip records in Airtable linked to the source video.
  • Video Clipping Automation:
    • Search for clips needing processing, use the NCA Toolkit endpoint to cut the video based on start timestamp and duration, and receive the final clip links via webhook.
  • Cropping & Scaling Automation:
    • For each clip, analyze the thumbnail with GPT Vision to locate the face (X,Y coordinates).
    • Update Airtable with cropping data, then call NCA Toolkit API to crop and scale the video to vertical format.
  • Captioning Automation:
    • For cropped clips, use NCA Toolkit API to automatically add styled captions, receiving final output URLs via webhook.
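
The cropping step above reduces to a small piece of arithmetic once the face coordinates are known. The following is a minimal sketch under stated assumptions: `vertical_crop` is a hypothetical helper, the coordinates are taken to be pixels measured from the top-left corner of the source frame, and the target is a 9:16 window. The source does not describe the exact formula used in the tutorial.

```python
def vertical_crop(src_w, src_h, face_x, face_y, aspect_w=9, aspect_h=16):
    """Return (x, y, width, height) of the largest aspect_w:aspect_h window
    that fits in the frame, centered on the face as closely as the edges allow."""
    if src_w * aspect_h >= src_h * aspect_w:
        # Source is wider than the target shape: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * aspect_w // aspect_h
    else:
        # Source is narrower than the target shape: keep full width, trim top/bottom.
        crop_w = src_w
        crop_h = src_w * aspect_h // aspect_w

    # Round down to even numbers, which common H.264 pixel formats expect.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2

    # Center on the face, then clamp so the window stays inside the frame.
    x = min(max(face_x - crop_w // 2, 0), src_w - crop_w)
    y = min(max(face_y - crop_h // 2, 0), src_h - crop_h)
    return x, y, crop_w, crop_h
```

For a 1920x1080 source with a face at (1400, 400), this yields a 606x1080 window starting at x = 1097, which can then be scaled to the final vertical resolution. Clamping matters when the speaker sits near the edge of the frame, since an unclamped window would extend past the video.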

Usage Considerations

  • The workflow demonstrates the combination of AI and deterministic processing: using AI for creative selection and deterministic tools for accuracy in timecodes and file processing.
  • Users are advised to check server logs and Make execution results when troubleshooting.
  • Scaling up involves careful server selection (CPU/memory), consideration of concurrent processing limits, and ensuring automations are enabled at each step.

Decisions

  • Use NCA Toolkit for end-to-end video processing and transcription — Chosen for cost effectiveness and control over processing compared to commercial SaaS solutions.
  • AI for clip finding, deterministic tools for SRT parsing — AI models were found to be effective for choosing engaging content, while traditional parsing offered more accuracy for start/end time extraction.

Open Questions / Follow-Ups

  • None identified; all instructions and decision rationales are provided in the context of building an individual automation as a technical tutorial.