calesthio / OpenMontage

OpenMontage: The First Open-Source, Agentic Video Production System

Turn your AI coding assistant into a full video production studio. Describe what you want in plain language, and your agent handles research, scripting, asset generation, editing, and final composition. An important distinction: OpenMontage can make image-based videos, but it can also make real videos using only free and open-source material. In that workflow, the agent builds a corpus from free stock footage and open archives, retrieves actual motion clips, edits them into a timeline, and renders a finished piece. That is not the usual “animate a handful of stills and call it video” trick.
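
The retrieve-and-edit step can be pictured as matching each planned scene against a tagged clip corpus. The sketch below is a hypothetical illustration, not OpenMontage's retrieval code: the `Clip` type, the tag sets, Jaccard tag-overlap scoring, and the greedy one-clip-per-scene rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical corpus entry: a stock/archive clip with descriptive tags."""
    clip_id: str
    tags: frozenset[str]
    seconds: float


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Tag overlap in [0, 1]; 1.0 means identical tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_timeline(scenes: list[tuple[set[str], float]],
                   corpus: list[Clip]) -> list[tuple[str, float]]:
    """Greedily assign the best-matching unused clip to each scene.

    scenes: (wanted_tags, scene_seconds) pairs in playback order.
    Returns (clip_id, seconds_to_use) pairs; each clip is trimmed to its scene.
    """
    used: set[str] = set()
    timeline = []
    for wanted_tags, scene_seconds in scenes:
        wanted = frozenset(wanted_tags)
        # Only consider unused clips that are long enough to trim down.
        candidates = [c for c in corpus
                      if c.clip_id not in used and c.seconds >= scene_seconds]
        if not candidates:
            raise LookupError(f"no unused clip long enough for {sorted(wanted)}")
        best = max(candidates, key=lambda c: jaccard(c.tags, wanted))
        used.add(best.clip_id)
        timeline.append((best.clip_id, scene_seconds))
    return timeline


corpus = [
    Clip("rain_street", frozenset({"rain", "street", "night"}), 8.0),
    Clip("umbrellas", frozenset({"rain", "umbrella", "crowd"}), 6.0),
    Clip("sunny_park", frozenset({"sun", "park"}), 10.0),
]
print(build_timeline([({"rain", "street"}, 5.0), ({"rain", "crowd"}, 4.0)], corpus))
# [('rain_street', 5.0), ('umbrellas', 4.0)]
```

Greedy selection keeps the sketch short; a real system would likely weigh visual similarity, continuity between neighbouring shots, and licensing, none of which is modelled here.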

  • “SIGNAL FROM TOMORROW”: A cinematic sci-fi trailer produced end to end with OpenMontage, covering concept, script, scene plan, Veo-generated motion clips, soundtrack, and Remotion composition.
  • “THE LAST BANANA”: A 60-second Pixar-style animated short about a lonely banana who finds friendship with a kiwi. 6 Kling v3-generated motion clips (via fal.ai), Google Chirp3-HD narration, royalty-free piano music, TikTok-style word-level captions, and Remotion composition. Total cost: $1.33.
  • “VOID — Neural Interface”: A product ad produced with just one API key (OpenAI). 4 AI-generated images (gpt-image-1), TTS narration, auto-sourced royalty-free music, word-level subtitles via WhisperX, and Remotion data visualizations. Total cost: $0.69. Zero manual asset work.

Start From A Video You Already Love

Starting from a reference video is often faster than starting from a blank prompt. OpenMontage can take a YouTube video or Short, a Reel, a TikTok, or a local clip and turn it into a production plan grounded in that reference.

  • Paste a reference video.
  • The agent analyzes transcript, pacing, scenes, keyframes, and style.
  • Before full production, you get 2-3 differentiated concepts, an honest tool path (what can actually be done with the tools you have available), cost estimates, and a sample.
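
Pacing is one of the easier reference properties to quantify. Assuming scene-cut timestamps have already been detected (for example with the `scene` score in FFmpeg's `select` filter), a hypothetical helper like `pacing_profile` below turns them into shot-length statistics that a new edit could aim to match. It is an illustration of the idea, not OpenMontage's analyzer.

```python
from statistics import mean, median


def pacing_profile(cut_times: list[float], duration: float) -> dict[str, float]:
    """Summarize editing pace from scene-cut timestamps given in seconds."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Keep only cuts strictly inside the clip, in chronological order.
    cuts = sorted(t for t in cut_times if 0 < t < duration)
    # Shot boundaries: start of clip, every cut, end of clip.
    bounds = [0.0, *cuts, duration]
    shots = [end - start for start, end in zip(bounds, bounds[1:])]
    return {
        "shots": len(shots),
        "avg_shot_s": mean(shots),
        "median_shot_s": median(shots),
        "cuts_per_min": len(cuts) * 60 / duration,
    }


print(pacing_profile([1.5, 3.0, 4.2, 6.0], duration=8.0))
```

Average shot length and cuts per minute are standard, tool-independent measures of pace, which is why they make a reasonable first target when "keeping the pacing" of a reference.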

“Here’s a YouTube Short I love. Make me something like this, but about quantum computing.” What you get back is not “best guess prompt spaghetti.” You get:

  • What it keeps from the reference: pacing, hook style, structure, tone.
  • What it changes: topic, visual treatment, angle, narration approach.
  • What it will cost at your target duration, before asset generation starts.
  • What it will actually look like with your currently available tools.
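
An up-front cost estimate is simple arithmetic once per-unit prices are known. In the sketch below every price is a made-up placeholder (check your providers' current pricing), and the clip length and narration speaking rate are assumptions. It shows the shape of the calculation, not OpenMontage's estimator.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Hypothetical per-unit prices in USD (placeholders, not real quotes)."""
    video_clip: float         # price per generated motion clip
    clip_seconds: float       # length of one generated clip
    tts_per_1k_chars: float   # narration price per 1,000 characters
    music: float = 0.0        # flat cost; 0 for royalty-free tracks


def estimate_cost(target_seconds: float, rates: RateCard,
                  words_per_minute: int = 150, chars_per_word: int = 6) -> dict:
    """Rough pre-production estimate for a video of the target duration."""
    # Enough clips to cover the whole runtime; a partial clip still costs a full one.
    clips = math.ceil(target_seconds / rates.clip_seconds)
    # Assumed narration density: words per minute times average characters per word.
    narration_chars = target_seconds / 60 * words_per_minute * chars_per_word
    video = clips * rates.video_clip
    tts = narration_chars / 1000 * rates.tts_per_1k_chars
    total = video + tts + rates.music
    return {
        "clips": clips,
        "video_usd": round(video, 2),
        "tts_usd": round(tts, 4),
        "total_usd": round(total, 2),
    }


rates = RateCard(video_clip=0.20, clip_seconds=10, tts_per_1k_chars=0.016)
print(estimate_cost(60, rates))
```

Because the estimate depends only on duration and the rate card, it can be shown before any paid asset generation starts.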

Works with Claude Code, Cursor, Copilot, Windsurf, Codex, or any other AI coding assistant that can read files and run code.

Quick Start

Prerequisites:

  • Python 3.10+
  • FFmpeg (macOS: brew install ffmpeg; Debian/Ubuntu: sudo apt install ffmpeg)
  • Node.js 18+
  • An AI coding assistant (Claude Code, Cursor, Copilot, Windsurf, or Codex)
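
A short script can confirm the prerequisites listed above before installing. This is a hypothetical helper, not something the repository ships: it only checks the Python and Node.js versions stated here and that `ffmpeg` and `node` are on `PATH`.

```python
import re
import shutil
import subprocess
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Extract up to three numeric components, e.g. 'v18.19.0' -> (18, 19, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def check_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    node = shutil.which("node")
    if node is None:
        problems.append("Node.js not found on PATH")
    else:
        # `node --version` prints something like "v18.19.0".
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        if parse_version(out) < (18,):
            problems.append(f"Node.js 18+ required, found {out.strip() or 'unknown'}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found.")
```

Tuple comparison does the version check: `(18, 19, 0) < (18,)` is false, while `(16, 20, 2) < (18,)` is true.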

Install & Run:

git clone https://github.com/calesthio/OpenMontage.git
cd OpenMontage
make setup

Open the project in your AI coding assistant and tell it what you want: “Make a 60-second animated explainer about how neural networks learn.” Or if you want the real-footage path: “Make a 75-second documentary montage about city life in the rain. Use real footage only, no narration, elegiac tone, with music.”
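
Each of those prompts packs several hard constraints into one sentence. As a hypothetical illustration, the second prompt's constraints can be pinned down as structured fields. The `Brief` fields and the keyword rules in `parse_brief` are assumptions for this sketch and far cruder than how a coding agent actually reads a request.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Brief:
    """Hypothetical structured form of a plain-language video request."""
    duration_s: Optional[int]
    real_footage_only: bool
    narration: bool
    music: bool
    tone: Optional[str]


def parse_brief(prompt: str) -> Brief:
    """Naive keyword extraction, only to show which constraints matter."""
    text = prompt.lower()
    duration = re.search(r"(\d+)-second", text)
    tone = re.search(r"(\w+) tone", text)
    return Brief(
        duration_s=int(duration.group(1)) if duration else None,
        real_footage_only="real footage only" in text,
        narration="no narration" not in text,
        music="with music" in text,
        tone=tone.group(1) if tone else None,
    )


print(parse_brief(
    "Make a 75-second documentary montage about city life in the rain. "
    "Use real footage only, no narration, elegiac tone, with music."
))
```

Making constraints like "no narration" explicit is what lets later steps (and the self-review) check the result against the request.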

That’s it. The agent researches your topic with live web search, generates AI images, writes and narrates the script with voice direction, finds royalty-free background music automatically, burns in word-level subtitles, and renders the final video.

Before you see anything, the system runs a multi-point self-review: ffprobe validation, frame sampling, audio-level analysis, delivery-promise verification, and subtitle checks. Every provider selection is scored across 7 dimensions and recorded in an auditable decision log, and every creative decision requires your approval.
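
The ffprobe validation step can be approximated with ffprobe's JSON output. This is a hypothetical sketch, not OpenMontage's reviewer: it assumes the deliverable must contain both a video and an audio stream and must land within a chosen tolerance of the target duration.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Ask ffprobe for container duration and stream types as JSON."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration:stream=codec_type",
        "-of", "json", path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def validate_probe(info: dict, target_seconds: float, tolerance: float = 1.0) -> list[str]:
    """Return problems found in ffprobe output; an empty list means it passed."""
    problems = []
    kinds = {stream.get("codec_type") for stream in info.get("streams", [])}
    for needed in ("video", "audio"):
        if needed not in kinds:
            problems.append(f"missing {needed} stream")
    # ffprobe reports format.duration as a string such as "60.041000".
    duration = float(info.get("format", {}).get("duration", 0.0))
    if abs(duration - target_seconds) > tolerance:
        problems.append(
            f"duration {duration:.2f}s is more than {tolerance}s from the {target_seconds}s target"
        )
    return problems
```

Usage: `validate_probe(probe("final.mp4"), target_seconds=60)`. Keeping the parsing separate from the subprocess call means the checks can be tested without a rendered file.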
