open source · python 3.11+

Text-to-video pipeline,
stage by stage.

direktor turns a text file into a podcast-style video — narrated audio, AI-generated stills, keyword overlays. Six checkpointed stages you can stop, edit and resume. Built for makers who'd rather own the pipeline than rent a black box.

pip install direktor
direktor input.txt

What direktor actually is

It is

  • A Python library + CLI (direktor)
  • A 6-stage, resumable pipeline with on-disk checkpoints
  • Narration + AI stills + FFmpeg composition — podcast-style output
  • Model-agnostic by env var: swap BARK, FLUX, GPT-4 for your own
  • Hooked to OpenAI + Replicate + S3-compatible storage

It is not

  • A motion-video generator. Stills change at segment boundaries; there is no per-frame video model.
  • Wired to Sora, Veo or Runway. The default image model is FLUX-schnell on Replicate.
  • A SaaS. You bring your own API keys and your own bucket.
  • Magic. Bad scripts make bad videos — direktor optimises and narrates, it does not invent the story.

How a run unfolds

  1. You write or paste text

    A blog post, a research note, a chapter draft. UTF-8 text file in, video out.

  2. Pipeline runs in stages

    Each stage writes to a temp dir keyed off your input. Crash on stage 5? Re-run, skip the first four.

  3. Stop at any stage

    direktor input.txt --stage 3 halts after transcript. Edit image_prompts.json by hand before resuming.

  4. FFmpeg does the assembly

    Images are converted to PNG and stitched with the audio track into a 1920×1080 video, with optional keyword overlays drawn by FFmpeg's drawtext filter.

The stack, named

No hand-waving. These are the exact services and models direktor calls today:

Stage                 Default model / service             Provider
Script generation     gpt-4-turbo-preview                 OpenAI
Narration (TTS)       suno-ai/bark                        Replicate
Transcription         distil-whisper                      Replicate
Image prompts         gpt-4-turbo-preview                 OpenAI
Image generation      black-forest-labs/flux-schnell      Replicate
Composition           FFmpeg (concat demuxer + drawtext)  Local
Intermediate storage  S3-compatible (Cloudflare R2, B2)   Your bucket
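
For the Composition row, here is a rough sketch of how a concat-demuxer input and an FFmpeg command line could be built from per-segment stills. The function names and the segment format are hypothetical, and the exact flags direktor passes are not documented here; the FFmpeg options shown are standard ones. Repeating the last still is a commonly recommended workaround so that its duration is honoured.

```python
def concat_list(segments: list[tuple[str, float]]) -> str:
    """Build concat-demuxer input: one still per segment, held for its duration."""
    lines = ["ffconcat version 1.0"]
    for image, seconds in segments:
        lines.append(f"file '{image}'")
        lines.append(f"duration {seconds:.3f}")
    if segments:
        # Repeat the final still so the last `duration` line takes effect.
        lines.append(f"file '{segments[-1][0]}'")
    return "\n".join(lines) + "\n"


def ffmpeg_args(list_file: str, audio: str, output: str) -> list[str]:
    """Argument list that muxes the still sequence with the narration track."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_file,  # stills via concat demuxer
        "-i", audio,                                     # narration audio
        "-vf", "scale=1920:1080,format=yuv420p",         # 1080p, widely playable pixel format
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest", output,
    ]
```

The argument list can be handed to `subprocess.run(...)` once the list file is written to disk. Keyword overlays would add a `drawtext` entry to the `-vf` chain.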

Each model id is overridable. BARK_MODEL, FLUX_MODEL, GPT4_MODEL and DISTIL_MODEL are env vars, not constants.

Read the build notes

Three pieces on why pipeline-based video generation tends to beat single-shot prompting for anything longer than a TikTok.
