Deep Dive · May 2, 2026 · 8 min read
Building Cinematic Videos with AI: Our Video Pipeline Explained
A deep dive into how AgenticVexa generates multi-scene videos with narration, music, and visual continuity.
AgenticVexa Team
Engineering Team
AgenticVexa's video generation is not a single text-to-video model. It is a multi-stage cinematic pipeline that produces multi-scene videos with consistent characters, AI narration, generated music, and professional transitions.
The Pipeline
When you submit a video prompt, it passes through seven stages in order:
- Story Bible — An LLM creates a comprehensive story document: characters, settings, visual style, and narrative arc
- Scene Breakdown — The story is divided into individual scenes with specific prompts, camera angles, and timing
- Image Generation — Each scene is rendered as a high-quality image using FLUX, with character and style consistency maintained across scenes
- Voice Narration — Kokoro generates natural narration for each scene with appropriate pacing and emotion
- Music Composition — AI generates a background score that matches the mood and tempo of the video
- Ken Burns Effects — Subtle pan and zoom animations are applied to each scene image for cinematic motion
- Final Composition — Everything is assembled with professional transitions, audio mixing, and 4K output
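To make the Ken Burns stage more concrete, here is a minimal sketch of how a pan-and-zoom can be expressed as a crop window that moves across a still image. It is not AgenticVexa's implementation: the `Crop` type, the `ken_burns_crops` function, and the plain linear interpolation are all assumptions chosen for illustration, and a production renderer would likely add easing and resample each crop to the output resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    """A crop window in source-image pixels (hypothetical helper type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def lerp(a: float, b: float, t: float) -> float:
    """Linearly interpolate from a (t=0) to b (t=1)."""
    return a + (b - a) * t


def ken_burns_crops(start: Crop, end: Crop, frames: int) -> list[Crop]:
    """Return one crop window per frame, moving linearly from start to end."""
    if frames < 2:
        raise ValueError("need at least two frames to animate")
    crops = []
    for i in range(frames):
        t = i / (frames - 1)  # 0.0 on the first frame, 1.0 on the last
        crops.append(
            Crop(
                lerp(start.x, end.x, t),
                lerp(start.y, end.y, t),
                lerp(start.w, end.w, t),
                lerp(start.h, end.h, t),
            )
        )
    return crops


# Slow zoom-in on a 3840x2160 image: full frame to a centered 90% window.
full_frame = Crop(0, 0, 3840, 2160)
zoomed = Crop(192, 108, 3456, 1944)
crops = ken_burns_crops(full_frame, zoomed, frames=3)
print(crops[1])
```

Each crop would then be scaled to the output size to produce one video frame. Keeping the start and end windows at the same aspect ratio, as in this example, avoids stretching the image during the move.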
Credits
Video generation costs 50 base credits plus 8 credits per scene. A typical 5-scene video therefore costs 50 + 5 × 8 = 90 credits and produces a 60 to 90 second cinematic piece.
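The pricing rule is simple enough to estimate before you submit a job. The helper below is a small sketch of that arithmetic. The function name `video_cost` and the one-scene minimum are assumptions made for the example and are not part of the AgenticVexa API.

```python
BASE_CREDITS = 50       # flat cost per video, from the pricing above
CREDITS_PER_SCENE = 8   # additional cost for each scene


def video_cost(scenes: int) -> int:
    """Estimate the credit cost of a video with the given number of scenes."""
    if scenes < 1:
        # Assumption: a video has at least one scene.
        raise ValueError("a video needs at least one scene")
    return BASE_CREDITS + CREDITS_PER_SCENE * scenes


print(video_cost(5))  # 90
```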
API Usage
import time

# `client` is assumed to be an AgenticVexa client created and authenticated elsewhere.
job = client.video.generate(
    prompt="A documentary about the history of space exploration",
    scenes=5,
    style="cinematic",
    narration=True,
    music=True,
)

# Video generation is async, so poll until the job leaves the "processing" state
while job.status == "processing":
    time.sleep(10)
    job.refresh()

print(job.video_url)  # Your finished video
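The loop above polls indefinitely, so in a long-running service you may want an upper bound on the wait. The following is a minimal sketch of a polling helper with a timeout. It relies only on the `status` attribute and `refresh()` method shown above. The `wait_for_job` name, its parameters, and the ten-minute default are assumptions for illustration, not part of the AgenticVexa client.

```python
import time


def wait_for_job(job, poll_interval=10.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `job` until it leaves the "processing" state or `timeout` seconds pass.

    `sleep` and `clock` are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while job.status == "processing":
        if clock() >= deadline:
            raise TimeoutError(f"job still processing after {timeout} seconds")
        sleep(poll_interval)
        job.refresh()
    return job
```

With this helper, the polling loop reduces to `job = wait_for_job(job)` followed by reading `job.video_url`. A monotonic clock is used for the deadline so that changes to the system time do not shorten or extend the wait.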
#video #pipeline #ai #cinematic