Does direktor generate motion video?

No. direktor produces still images from FLUX-schnell and composes them under a narration track with FFmpeg. Cuts happen at ~30-second transcript-segment boundaries. It is not Sora, Veo or Runway and does not produce per-frame motion.

What do I need to run it?

Python 3.11+, FFmpeg installed locally, an OpenAI API key, a Replicate API token, and an S3-compatible bucket (for example Cloudflare R2 or Backblaze B2) to hold the intermediate audio that feeds transcription.

Can I edit the output part-way through?

Yes. Each stage writes a file. You can stop after any stage with --stage, hand-edit image_prompts.json or the script, then re-run — finished stages are skipped.

Is it free and open source?

direktor is MIT-licensed and installable with pip install direktor. You pay only for the third-party model calls (OpenAI, Replicate) on your own accounts.

open source · MIT · python 3.11+

Turn a text file into a narrated video.

direktor is a resumable, six-stage Python pipeline that reads a text file and produces a 1920×1080 MP4 — GPT-4 script, BARK narration, FLUX stills, FFmpeg composition. Own every stage instead of renting a black box.

Quickstart → GitHub ↗

6 stages · 1920×1080 MP4 · 0 vendor lock-in

bash — direktor

$ pip install direktor
$ direktor build script.txt

# 1/6 script      ✓ podcast script written
# 2/6 narration   ✓ bark → narration.wav
# 3/6 transcript  ✓ 14 timestamped chunks
# 4/6 prompts     ✓ 14 image prompts
# 5/6 images      ✓ flux-schnell → 14 stills
# 6/6 compose     ✓ ffmpeg → output.mp4

→ output.mp4  (1920×1080, 06:12)

What is direktor?

A pipeline, not a magic prompt.

direktor is an MIT-licensed Python library and CLI from Skelf Research that turns a plain text file into a podcast-style video. It orchestrates six discrete, checkpointed stages — script generation, narration, transcription, image prompting, still generation, and FFmpeg composition — each writing a file to disk you can inspect, edit, and resume from. It is infrastructure for making narrated explainers you control, not a hosted video model.

Problem → solution

The two ways text-to-video usually breaks

The black-box trap

Single-shot models take a prompt and return an opaque clip — no script, no intermediates, nothing to edit.
One weird frame means re-rendering (and re-paying for) the whole thing.
Length caps out at seconds; long-form narration is out of scope.
You are locked to one vendor's model and pricing.

The direktor way

Six explicit stages, each a file on disk you can read, edit and version-control.
Re-run only the stage that failed — a single FLUX call, not a full re-render.
Designed for multi-minute narrated video from written material.
Every model is an env var; swap TTS, image or LLM as better ones ship.

Features

Built for makers who own the pipeline

Every design choice favours control, editability and cost you can predict.

Six resumable stages

Every stage writes a file to disk. Crash on stage 5 and you keep the script, audio and transcript — re-run skips the finished work.

Editable intermediates

The script, transcript, image prompts and stills are all plain files. Rewrite a weird FLUX prompt by hand, then resume from that stage.

Model-agnostic by env var

BARK_MODEL, FLUX_MODEL, GPT4_MODEL and DISTIL_MODEL are environment variables. Swap the TTS, image model or LLM without touching code.

Bring your own keys & bucket

No SaaS in the middle. You supply an OpenAI key, a Replicate token and an S3-compatible bucket (R2, B2). Spend stays on your accounts.

Podcast-style long-form

Built for 3–20 minute narrated explainers, not five-second clips. Cuts land at ~30s transcript boundaries with optional keyword overlays.
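To make "cuts land at transcript boundaries" concrete, here is one way the on-screen time of each still could be derived from chunk timestamps. It is a sketch under stated assumptions: chunks are taken to be `(start, end)` pairs in seconds, each still is held until the next chunk starts, and `still_durations` is a hypothetical helper rather than direktor's own code.

```python
def still_durations(chunks: list[tuple[float, float]], audio_seconds: float) -> list[float]:
    """Return how long each still stays on screen, one value per chunk.

    The first still starts at 0 and the last one runs to the end of the
    audio, so the stills cover the narration with no gaps.
    """
    if not chunks:
        return []
    # Each cut happens where the next chunk begins.
    starts = [0.0] + [start for start, _ in chunks[1:]]
    ends = starts[1:] + [audio_seconds]
    return [end - start for start, end in zip(starts, ends)]
```

Holding each image until the next chunk begins means short pauses between chunks are absorbed by the preceding still instead of showing a blank frame.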

Scriptable CLI + Python API

Drive it from a Makefile, a GitHub Action, or a Python script: `direktor build input.txt` in the shell, or `generate_video()` in code.

See all features →

How it works

Text in. Six stages. MP4 out.

Each stage checkpoints to disk, so you can stop, hand-edit, and resume at any point.

stage 1

Script

GPT-4 turns your text into a single-narrator podcast script.

openai · gpt-4-turbo

stage 2

Narration

BARK synthesises the spoken audio track.

replicate · suno-ai/bark

stage 3

Transcript

Distil-Whisper produces timestamped chunks.

replicate · distil-whisper

stage 4

Prompts

GPT-4 writes one image prompt per ~30s segment.

openai · gpt-4-turbo

stage 5

Images

FLUX-schnell renders the 16:9 stills.

replicate · flux-schnell

stage 6

Compose

FFmpeg stitches audio + stills + overlays into MP4.

local · ffmpeg

Walk through every stage →

Code showcase

Drive it from the CLI or from Python

# stop after prompts, hand-edit them, resume

$ direktor build talk.txt --stage 4
  → halted after prompts

# hand-edit image_prompts.json ...
$ direktor build talk.txt --resume
  → skips stages 1–4, renders stills, composes

# the same pipeline from Python

from direktor import generate_video

generate_video(
    input_path="talk.txt",
    output_path="talk.mp4",
    overlays=True,          # keyword drawtext
)

Model ids are overridable via BARK_MODEL, FLUX_MODEL, GPT4_MODEL and DISTIL_MODEL — config, not code.

Honest numbers

What direktor is, in figures

6

resumable pipeline stages

1080p

1920×1080 MP4 output

3

providers: OpenAI · Replicate · your S3

MIT

open-source licence

No throughput or quality benchmarks are claimed — output quality depends on your script and the models you configure.

Explore direktor

Everything, one click deep

Jump straight into any part of the project — the pipeline, real use cases, honest comparisons and the reference material.

Ship your first video in minutes

Install the package, set your OpenAI key, Replicate token and bucket credentials, and point it at a text file. The quickstart walks through the whole run end to end.

Read the quickstart → Browse use cases

Features →

The pipeline →

Quickstart →

Use cases →

Compare →

Blog →

FAQ →

Glossary →

About →
