Narrating over a Video using Multimodal AI Workflow Solution

Workflow overview

Why this workflow matters

Relevant for managed services and support workflows.

This AlekSystem template takes a video and extracts frames from it which are used with a multimodal LLM to generate a script. The script is then passed to the same multimodal LLM to generate a voiceover clip. This template was inspired by Processing and narrating a video with GPT's visual capabilities and the TTS API How it works Video is downloaded using the HTTP node. Python code node is used to extract the frames using OpenCV. Loop node is used o batch the frames for the LLM to generate partial scripts. All partial scripts are combined to form the full script which is then sent to OpenAI to generate audio from it. The finished voiceover clip is uploaded to Google Drive. Sample the finished product here: https://drive.google.com/file/d/1-XCoii0leGB2MffBMPpCZoxboVyeyeIX/view?usp=sharing Requirements OpenAI for LLM Ideally, a mid-range (16GB RAM) machine for acceptable performance! Customising this workflow For larger videos, consider splitting into smaller clips for better performance Use a multimodal LLM which supports fully video such as Google's Gemini.

Best fit

Services

Edit ImageGoogle DriveBasic LLM ChainOpenAI Chat ModelOpenAI

Use cases

support automation

Need another direction?

Continue a new search Request this workflow

Narrating over a Video using Multimodal AI Workflow Solution

Why this workflow matters

Categories

Services

Use cases

Related AlekSystem workflow ideas

Automated AI Timesheets for Consulting Teams

Executive AI Briefing and Follow-Up Assistant

Automated Multi-Channel Customer Support with Gmail, Telegram, and GPT AI Solution