This video provides a detailed, step-by-step guide for building an automated AI system to convert long-form podcast or YouTube videos into multiple short, captioned clips suitable for platforms like Instagram Reels, TikTok, and YouTube Shorts.
The system uses Airtable, Make (formerly Integromat), and a self-hosted, open-source NCA Toolkit server on DigitalOcean to manage video uploads, transcription, clip selection, cropping, and captioning.
The tutorial covers the full technical setup, including database structure, AI-driven and deterministic clip segmentation, video processing, and final automation of video publishing.
The approach aims to cut costs relative to commercial clipping services and leaves room to adapt or resell the workflow for business or client use.
Action Items
None indicated in the transcript. All instructions pertain to individual users setting up their own system as demonstrated.
System Overview & Workflow
The automation process is named "content clip magic" and consists of multiple stages:
Upload video info (description, video link, dimensions) into Airtable.
Trigger automation to transcribe video using the self-hosted NCA Toolkit.
Populate Airtable with transcript and SRT files for precise clipping.
Run automation to analyze the transcript (using Claude or GPT) and identify engaging short clips.
Structure AI output into JSON for easier downstream processing and entry into Airtable.
Parse SRT files to extract precise start and end times of each selected clip.
Insert new clips into Airtable for further processing.
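The step that structures AI output into JSON can be illustrated with a small validation sketch. The field names (clips, title, start_text, end_text) are assumptions made for illustration; the summary does not specify the schema used in the video. The idea is to reject malformed model output before anything is written to Airtable.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker that models sometimes wrap JSON in


@dataclass
class ClipSuggestion:
    title: str       # hypothetical field: short label for the clip
    start_text: str  # hypothetical field: first words of the clip, quoted from the transcript
    end_text: str    # hypothetical field: last words of the clip, quoted from the transcript


def parse_clip_suggestions(raw: str) -> list[ClipSuggestion]:
    """Parse model output into clip suggestions, raising ValueError on bad input."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    clips = []
    for item in data["clips"]:
        missing = {"title", "start_text", "end_text"} - set(item)
        if missing:
            raise ValueError(f"clip is missing fields: {sorted(missing)}")
        clips.append(ClipSuggestion(item["title"], item["start_text"], item["end_text"]))
    return clips
```

Quoting the opening and closing words, rather than asking the model for timestamps, leaves the timing work to the SRT parsing step.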
Technical Setup
Users are provided with a free template Airtable base to duplicate and use.
The NCA Toolkit is deployed as a container on DigitalOcean for cost efficiency and flexibility, with credentials and storage set up via DigitalOcean Spaces.
Test runs are recommended with short videos before scaling up to long-form content.
Postman is used to confirm that the NCA Toolkit API is set up and responding correctly before proceeding.
All API endpoints, credentials, and environment variables must be entered exactly as demonstrated to ensure successful operation.
Automation Build Steps
Transcription Automation:
Set up Make to detect new unprocessed video records in Airtable, trigger the NCA Toolkit for transcription, and wait for the webhook callback to save the transcript and SRT URLs.
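The submit-then-wait-for-webhook pattern in this automation can be modeled in a few lines. This is only a sketch of the bookkeeping: the status values, the record fields, and the callback payload keys (record_id, transcript_url, srt_url) are all hypothetical and are not taken from Airtable, Make, or the NCA Toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoRecord:
    record_id: str
    video_url: str
    status: str = "new"  # hypothetical states: new -> transcribing -> transcribed
    transcript_url: Optional[str] = None
    srt_url: Optional[str] = None


def pending(records: list[VideoRecord]) -> list[VideoRecord]:
    """Records that have not yet been sent for transcription."""
    return [r for r in records if r.status == "new"]


def mark_submitted(record: VideoRecord) -> None:
    """Flag a record as in progress so the next run does not submit it again."""
    record.status = "transcribing"


def apply_callback(records_by_id: dict[str, VideoRecord], payload: dict) -> bool:
    """Store the result URLs from a callback; ignore unknown or repeated callbacks."""
    record = records_by_id.get(payload.get("record_id", ""))
    if record is None or record.status != "transcribing":
        return False
    record.transcript_url = payload["transcript_url"]
    record.srt_url = payload["srt_url"]
    record.status = "transcribed"
    return True
```

Marking the record before the job finishes is what keeps a scheduled search from picking up the same video twice while the transcription is still running.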
Clip Identification Automation:
Once transcripts are ready, download them, use AI to find strong segments, and format output as JSON.
Run deterministic modules to parse SRTs, match clips to transcript segments, and extract precise segment numbers/times.
Create new clip records in Airtable linked to the source video.
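The deterministic part of this automation can be sketched as follows: parse the SRT into numbered cues, then find the cues that contain the opening and closing phrases the AI selected. This is a minimal illustration, not the module chain built in Make; the function names are hypothetical, and it assumes each phrase falls entirely inside a single cue.

```python
import re
from dataclasses import dataclass
from typing import Optional

TIME_LINE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


@dataclass
class Cue:
    index: int  # segment number from the SRT file
    start: str  # "HH:MM:SS,mmm"
    end: str
    text: str


def parse_srt(srt: str) -> list[Cue]:
    """Split SRT text into cues, skipping blocks that are not well formed."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue
        match = TIME_LINE.search(lines[1])
        if match is None:
            continue
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), text))
    return cues


def find_cue(cues: list[Cue], phrase: str, start_at: int = 0) -> Optional[int]:
    """Position of the first cue at or after start_at whose text contains phrase."""
    needle = " ".join(phrase.lower().split())
    for pos in range(start_at, len(cues)):
        if needle in " ".join(cues[pos].text.lower().split()):
            return pos
    return None


def locate_clip(cues: list[Cue], start_phrase: str, end_phrase: str):
    """Return (first segment, last segment, start time, end time), or None if not found."""
    first = find_cue(cues, start_phrase)
    if first is None:
        return None
    last = find_cue(cues, end_phrase, first)
    if last is None:
        return None
    return cues[first].index, cues[last].index, cues[first].start, cues[last].end
```

Because the times come straight from the SRT file, the clip boundaries are exactly those produced by transcription rather than values estimated by the model.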
Video Clipping Automation:
Search for clips needing processing, use the NCA Toolkit endpoint to cut the video based on start timestamp and duration, and receive final clip links via webhook.
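Cutting by timestamp and duration means the SRT start and end times have to be converted first. The sketch below shows that arithmetic; the output keys (start, duration) are placeholders chosen for illustration and are not the NCA Toolkit's request schema.

```python
def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as "00:01:05,500" to seconds."""
    hms, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def cut_parameters(start: str, end: str) -> dict:
    """Return the start offset and duration, in seconds, for a clip."""
    start_s = srt_time_to_seconds(start)
    end_s = srt_time_to_seconds(end)
    if end_s <= start_s:
        raise ValueError("clip end must be after clip start")
    # Round to milliseconds to avoid floating-point noise in the duration.
    return {"start": start_s, "duration": round(end_s - start_s, 3)}
```

For example, a clip running from 00:01:05,500 to 00:01:45,250 starts at 65.5 seconds and lasts 39.75 seconds.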
Cropping & Scaling Automation:
For each clip, analyze the thumbnail with GPT Vision to locate the face (X,Y coordinates).
Update Airtable with cropping data, then call NCA Toolkit API to crop and scale the video to vertical format.
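Turning a face position into a vertical crop is simple geometry. The sketch below is one way to do it, assuming a 9:16 target and pixel coordinates for the face center; the function name and the output keys are hypothetical and do not reflect the NCA Toolkit's parameters.

```python
def vertical_crop(src_w: int, src_h: int, face_x: int, face_y: int,
                  aspect_w: int = 9, aspect_h: int = 16) -> dict:
    """Largest aspect_w:aspect_h window inside the frame, centered on the face where possible."""
    # Start from full height and derive the width; fall back to full width if that does not fit.
    crop_h = src_h
    crop_w = crop_h * aspect_w // aspect_h
    if crop_w > src_w:
        crop_w = src_w
        crop_h = crop_w * aspect_h // aspect_w
    # Keep dimensions even, since common encoder settings (H.264 with 4:2:0 chroma) require it.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    # Center on the face, then clamp so the window stays inside the frame.
    left = min(max(face_x - crop_w // 2, 0), src_w - crop_w)
    top = min(max(face_y - crop_h // 2, 0), src_h - crop_h)
    return {"x": left, "y": top, "width": crop_w, "height": crop_h}
```

For a 1920x1080 source with the face at (1500, 400), this yields a 606x1080 window starting at x=1197, which can then be scaled to the final vertical resolution.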
Captioning Automation:
For cropped clips, use NCA Toolkit API to automatically add styled captions, receiving final output URLs via webhook.
Usage Considerations
The workflow combines AI with deterministic processing: AI handles the creative selection of clips, while deterministic tools ensure accuracy in timecodes and file processing.
Users are advised to check server logs and Make execution results when troubleshooting.
Scaling up involves careful server selection (CPU/memory), consideration of concurrent processing limits, and ensuring automations are enabled at each step.
Decisions
Use NCA Toolkit for end-to-end video processing and transcription — Chosen for cost effectiveness and control over processing compared to commercial SaaS solutions.
AI for clip finding, deterministic tools for SRT parsing — AI models were found to be effective for choosing engaging content, while traditional parsing offered more accuracy for start/end time extraction.
Open Questions / Follow-Ups
None identified; all instructions and decision rationales are provided in the context of building an individual automation as a technical tutorial.