Text-to-video pipeline, stage by stage.
direktor turns a text file into a podcast-style video — narrated audio, AI-generated stills, keyword overlays. Six checkpointed stages you can stop, edit and resume. Built for makers who'd rather own the pipeline than rent a black box.
pip install direktor
direktor input.txt
What direktor actually is
It is
- A Python library + CLI (direktor)
- A 6-stage, resumable pipeline with on-disk checkpoints
- Narration + AI stills + FFmpeg composition — podcast-style output
- Model-agnostic by env var: swap BARK, FLUX, GPT-4 for your own
- Hooked to OpenAI + Replicate + S3-compatible storage
It is not
- A motion-video generator. Stills change at segment boundaries; there is no per-frame video model.
- Wired to Sora, Veo or Runway. The default image model is FLUX-schnell on Replicate.
- A SaaS. You bring your own API keys and your own bucket.
- Magic. Bad scripts make bad videos — direktor optimises and narrates, it does not invent the story.
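To make "resumable pipeline with on-disk checkpoints" concrete, here is a minimal sketch of the idea, not direktor's actual implementation. The stage names, the checkpoint file naming, and the choice to key the working directory on a hash of the input's contents are all assumptions made for illustration; `work_dir_for` and `run_pipeline` are hypothetical helpers.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical stage names, in the order the pipeline runs them.
STAGES = ["script", "narration", "transcript", "image_prompts", "images", "compose"]


def work_dir_for(input_path: Path, root: Path) -> Path:
    """Derive a stable working directory from the input file's contents (assumed scheme)."""
    digest = hashlib.sha256(input_path.read_bytes()).hexdigest()[:12]
    path = root / f"run-{digest}"
    path.mkdir(parents=True, exist_ok=True)
    return path


def run_pipeline(input_path: Path, root: Path, stop_after: int = len(STAGES)) -> list[str]:
    """Run stages 1..stop_after, skipping any stage whose checkpoint file exists.

    Returns the names of the stages that actually executed in this call.
    """
    work = work_dir_for(input_path, root)
    executed = []
    for number, name in enumerate(STAGES, start=1):
        if number > stop_after:
            break
        checkpoint = work / f"{number:02d}_{name}.json"
        if checkpoint.exists():
            continue  # finished in an earlier run, so skip it
        # Placeholder for the real work (API calls, FFmpeg, and so on).
        checkpoint.write_text(json.dumps({"stage": name, "status": "done"}))
        executed.append(name)
    return executed
```

With this shape, calling `run_pipeline(src, root, stop_after=3)` and then `run_pipeline(src, root)` executes the first three stages once and only the remaining three on the second call, which is the behaviour a crash-and-resume workflow relies on.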
How a run unfolds
- 01
You write or paste text
A blog post, a research note, a chapter draft. UTF-8 text file in, video out.
- 02
Pipeline runs in stages
Each stage writes to a temp dir keyed off your input. Crash on stage 5? Re-run, skip the first four.
- 03
Stop at any stage
direktor input.txt --stage 3 halts after transcript. Edit image_prompts.json by hand before resuming.
- 04
FFmpeg does the assembly
Images are converted to PNG, stitched with the audio track at 1920×1080, with optional keyword overlays drawn by drawtext.
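The assembly step can be sketched roughly as follows. This is an illustration of the general concat-demuxer plus drawtext technique, not direktor's own code: `write_concat_list` and `build_ffmpeg_command` are hypothetical helpers, and the 25 fps rate, the libx264/aac codecs, and the overlay position are assumptions. The sketch also assumes keywords contain no quotes or colons, which would need escaping for drawtext, and that the local FFmpeg build can find a default font.

```python
from pathlib import Path
from typing import Optional


def write_concat_list(stills: list[tuple[str, float]], list_path: Path) -> None:
    """Write a concat-demuxer list: one still per segment, each with a duration."""
    lines = ["ffconcat version 1.0"]
    for image, seconds in stills:
        lines.append(f"file '{image}'")
        lines.append(f"duration {seconds:.3f}")
    # The final still is commonly repeated so its duration is honoured.
    if stills:
        lines.append(f"file '{stills[-1][0]}'")
    list_path.write_text("\n".join(lines) + "\n")


def build_ffmpeg_command(
    list_path: Path, audio: Path, output: Path, keyword: Optional[str] = None
) -> list[str]:
    """Build (but do not run) an FFmpeg command that muxes stills with narration."""
    filters = ["scale=1920:1080", "fps=25", "format=yuv420p"]
    if keyword:
        # Centered near the bottom edge; position and size are illustrative.
        filters.append(
            f"drawtext=text='{keyword}':fontsize=64:fontcolor=white"
            ":x=(w-text_w)/2:y=h-text_h-80"
        )
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(list_path),
        "-i", str(audio),
        "-vf", ",".join(filters),
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest", str(output),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the composition step easy to inspect when a render goes wrong.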
The stack, named
No hand-waving. These are the exact services and models direktor calls today:
| Stage | Default model / service | Provider |
|---|---|---|
| Script generation | gpt-4-turbo-preview | OpenAI |
| Narration (TTS) | suno-ai/bark | Replicate |
| Transcription | distil-whisper | Replicate |
| Image prompts | gpt-4-turbo-preview | OpenAI |
| Image generation | black-forest-labs/flux-schnell | Replicate |
| Composition | FFmpeg (concat demuxer + drawtext) | Local |
| Intermediate storage | S3-compatible (Cloudflare R2, B2) | Your bucket |
Each model id is overridable. BARK_MODEL, FLUX_MODEL, GPT4_MODEL and DISTIL_MODEL are env vars, not constants.
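A minimal sketch of how env-var overrides of this kind typically work, assuming the four variable names above and the defaults from the table. `resolve_models` is a hypothetical helper for illustration, not a documented direktor function.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above; env var names as listed in the text.
DEFAULT_MODELS = {
    "GPT4_MODEL": "gpt-4-turbo-preview",
    "BARK_MODEL": "suno-ai/bark",
    "DISTIL_MODEL": "distil-whisper",
    "FLUX_MODEL": "black-forest-labs/flux-schnell",
}


def resolve_models(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the model id for each slot, preferring a non-empty env var."""
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in DEFAULT_MODELS.items()}
```

Under that assumption, running with `FLUX_MODEL` set to another Replicate model id in the shell would swap the image model while leaving the other three at their defaults.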
Read the build notes
Three pieces on why pipeline-based video generation tends to beat single-shot prompting for anything longer than a TikTok.