Video Automation System Overview

Summary

This video provides a detailed, step-by-step guide for building an automated AI system to convert long-form podcast or YouTube videos into multiple short, captioned clips suitable for platforms like Instagram Reels, TikTok, and YouTube Shorts.
The system uses Airtable, Make (formerly Integromat), and a self-hosted, open-source NCA Toolkit server on DigitalOcean to manage video uploads, transcription, clip selection, cropping, and captioning.
The tutorial covers the full technical setup, including database structure, AI-driven and deterministic clip segmentation, video processing, and final automation of video publishing.
The approach aims to cut costs relative to commercial services and leaves room to adapt or resell the workflow for business or client use.

None indicated in the transcript. All instructions pertain to individual users setting up their own system as demonstrated.

Users are provided with a free template Airtable base to duplicate and use.
The NCA Toolkit is deployed as a container on DigitalOcean for cost efficiency and flexibility, with credentials and storage set up via DigitalOcean Spaces.
Test runs are recommended with short videos before scaling up to long-form content.
Postman is used to confirm that the NCA Toolkit API is set up and responding correctly before proceeding.
All API endpoints, credentials, and environment variables must be entered exactly as demonstrated to ensure successful operation.
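The Postman check described above can also be scripted. The following is a minimal sketch, not the tutorial's own method: the base URL, the `/v1/toolkit/test` path, and the `x-api-key` header name are assumptions and should be replaced with the values shown for your own deployment. The function only builds the request, so it can be inspected before anything is sent.

```python
from urllib.request import Request


def build_test_request(base_url: str, api_key: str,
                       path: str = "/v1/toolkit/test") -> Request:
    """Build (but do not send) a GET request for an API health check.

    The default path and the x-api-key header name are assumptions;
    use the values from your own NCA Toolkit deployment.
    """
    url = base_url.rstrip("/") + path
    return Request(url, headers={"x-api-key": api_key}, method="GET")


# Hypothetical host and key, for illustration only.
req = build_test_request("https://toolkit.example.com/", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` would perform the actual call. A non-2xx response raises `urllib.error.HTTPError`, which is a reasonable signal that the key or endpoint was entered incorrectly.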

Transcription Automation:
- Set up Make to detect new unprocessed video records in Airtable, trigger the NCA Toolkit for transcription, and wait for the webhook callback to save the transcript and SRT URLs.
Clip Identification Automation:
- Once transcripts are ready, download them, use AI to find strong segments, and format output as JSON.
- Run deterministic modules to parse SRTs, match clips to transcript segments, and extract precise segment numbers/times.
- Create new clip records in Airtable linked to the source video.
Video Clipping Automation:
- Search for clips needing processing, use the NCA Toolkit endpoint to cut the video based on timestamp and duration, and receive final clip links via webhook.
Cropping & Scaling Automation:
- For each clip, analyze the thumbnail with GPT Vision to locate the face (X,Y coordinates).
- Update Airtable with cropping data, then call NCA Toolkit API to crop and scale the video to vertical format.
Captioning Automation:
- For cropped clips, use NCA Toolkit API to automatically add styled captions, receiving final output URLs via webhook.
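For the cropping and scaling step above, the face coordinates have to be turned into a crop rectangle before the video can be cut to vertical format. The sketch below shows one way to do that arithmetic, assuming the face position is given in source-frame pixels and the target is 9:16. The function name and parameters are hypothetical, and the tutorial may compute the window differently.

```python
def crop_window(src_w: int, src_h: int, face_x: int, face_y: int,
                ratio_w: int = 9, ratio_h: int = 16) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of the largest ratio_w:ratio_h window
    centred on the face and clamped to the frame edges."""
    if src_w * ratio_h > src_h * ratio_w:
        # Source is wider than the target: keep full height, trim width.
        h = src_h
        w = src_h * ratio_w // ratio_h
    else:
        # Source is narrower than the target: keep full width, trim height.
        w = src_w
        h = src_w * ratio_h // ratio_w
    # Many video encoders expect even dimensions, so round down to even.
    w -= w % 2
    h -= h % 2
    # Centre on the face, then clamp so the window stays inside the frame.
    x = min(max(face_x - w // 2, 0), src_w - w)
    y = min(max(face_y - h // 2, 0), src_h - h)
    return x, y, w, h


print(crop_window(1920, 1080, 960, 400))  # (657, 0, 606, 1080)
```

Clamping matters when the speaker sits near the edge of the frame: a face at x=100 in a 1920-pixel-wide source yields a window starting at x=0 rather than a negative offset.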

The workflow demonstrates the combination of AI and deterministic processing: using AI for creative selection and deterministic tools for accuracy in timecodes and file processing.
Users are advised to check server logs and Make execution results when troubleshooting.
Scaling up involves careful server selection (CPU/memory), consideration of concurrent processing limits, and ensuring automations are enabled at each step.

Use NCA Toolkit for end-to-end video processing and transcription: chosen for cost effectiveness and control over processing compared to commercial SaaS solutions.
AI for clip finding, deterministic tools for SRT parsing: AI models were found to be effective at choosing engaging content, while traditional parsing was more accurate for extracting start and end times.
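The deterministic half of that split can be illustrated with a short SRT parser. This is a sketch under stated assumptions, not the Make modules used in the tutorial: it assumes the AI step returns the first and last SRT segment numbers of each chosen clip, and the names `Segment`, `parse_srt`, and `clip_bounds` are hypothetical. It returns a start time and duration, matching the timestamp-and-duration inputs the clipping step expects.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    h, m, s, ms = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text: str) -> list[Segment]:
    """Parse SRT text into segments, skipping malformed blocks."""
    segments = []
    text = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):
        lines = block.strip().splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        segments.append(Segment(int(lines[0].strip()), _seconds(start),
                                _seconds(end), " ".join(lines[2:])))
    return segments


def clip_bounds(segments: list[Segment], first: int, last: int) -> tuple[float, float]:
    """Return (start_seconds, duration_seconds) spanning segments first..last."""
    by_index = {seg.index: seg for seg in segments}
    start = by_index[first].start
    return start, round(by_index[last].end - start, 3)


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:03,500 --> 00:00:07,250
This is a test.
"""

print(clip_bounds(parse_srt(SAMPLE), 1, 2))  # (1.0, 6.25)
```

Because the times come straight from the subtitle file rather than from the model's output, a clip's boundaries stay exact even if the AI paraphrases or slightly misquotes the transcript text.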

None identified; all instructions and decision rationales are provided in the context of building an individual automation as a technical tutorial.