browser-use / video-use

Introducing video-use: edit videos with Claude Code. 100% open source. Drop raw footage in a folder, chat with Claude Code, get final.mp4 back. Works for any content (talking heads, montages, tutorials, travel, interviews) without presets or menus.

What it does:

Cuts out filler words (umm, uh, false starts) and dead space between takes
Auto color grades every segment (warm cinematic, neutral punch, or any custom ffmpeg chain)
30ms audio fades at every cut so you never hear a pop
Burns subtitles in your style — 2-word UPPERCASE chunks by default, fully customizable
Generates animation overlays via HyperFrames, Remotion, Manim, or PIL — spawned in parallel sub-agents, one per animation
Self-evaluates the rendered output at every cut boundary before showing you anything
Persists session memory in project.md so next week’s session picks up where you left off
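
As an illustration of the default subtitle style described above, here is a minimal sketch that groups word-level timestamps into 2-word UPPERCASE cues. The function name `chunk_subtitles` and the `text`/`start`/`end` word fields are assumptions made for this example, not the repository's actual helper or data model.

```python
def chunk_subtitles(words, size=2):
    """Group consecutive words into uppercase cues of `size` words each.

    `words` is a list of dicts with assumed keys "text", "start", "end"
    (times in seconds). Returns (start, end, text) tuples.
    """
    cues = []
    for i in range(0, len(words), size):
        group = words[i:i + size]
        text = " ".join(w["text"] for w in group).upper()
        # A cue spans from the first word's start to the last word's end.
        cues.append((group[0]["start"], group[-1]["end"], text))
    return cues


# Hypothetical sample data, for illustration only.
words = [
    {"text": "drop", "start": 0.00, "end": 0.22},
    {"text": "your", "start": 0.22, "end": 0.40},
    {"text": "footage", "start": 0.40, "end": 0.85},
    {"text": "here", "start": 0.85, "end": 1.10},
    {"text": "now", "start": 1.30, "end": 1.55},
]

cues = chunk_subtitles(words)
for start, end, text in cues:
    print(f"{start:.2f}-{end:.2f}  {text}")
```

Changing `size` gives longer or shorter chunks. Styling such as font or position would be applied when the cues are burned in.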


Setup prompt: Paste into Claude Code, Codex, Hermes, Openclaw, or any agent with shell access:

Set up https://github.com/browser-use/video-use for me. Read install.md first to install this repo, wire up ffmpeg, register the skill with whichever agent you're running under, and set up the ElevenLabs API key. Ask me to paste it when you need it. Then read SKILL.md for daily usage, and always read helpers/ because that's where the editing scripts live. After install, don't transcribe anything on your own. Just tell me it's ready and wait for me to drop footage into a folder.

The agent handles the clone, dependencies, and skill registration, and prompts you once for your ElevenLabs API key (grab one at elevenlabs.io/app/settings/api-keys). Then point your agent at a folder of raw takes:

cd /path/to/your/videos
claude # or codex, hermes, etc.

For always-on editing from your own VPS or Telegram, run the agent through Browser Use Box. Watch the 15-second demo.

And in the session:

edit these into a launch video

It inventories the sources, proposes a strategy, waits for your OK, then produces edit/final.mp4 next to your sources. All outputs live in <videos_dir>/edit/, so the skill directory stays clean.

Manual install: If you'd rather do it by hand:

# 1. Clone and symlink into your agent's skills directory
git clone https://github.com/browser-use/video-use ~/Developer/video-use
ln -sfn ~/Developer/video-use ~/.claude/skills/video-use # Claude Code
# ln -sfn ~/Developer/video-use ~/.codex/skills/video-use # Codex

# 2. Install deps
cd ~/Developer/video-use
uv sync # or: pip install -e .
brew install ffmpeg # required
brew install yt-dlp # optional, for downloading online sources

# 3. Add your ElevenLabs API key
cp .env.example .env
$EDITOR .env # ELEVENLABS_API_KEY=...

How it works: The LLM never watches the video. It reads it, through two layers that together give it everything it needs to cut with word-boundary precision.

Layer 1: Audio transcript (always loaded). One ElevenLabs Scribe call per source gives word-level timestamps, speaker diarization, and audio events such as (laughter), (applause), and (sigh). All takes pack into a single ~12KB takes_packed.md, the LLM's primary reading view.
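
To make the packing step more concrete, the sketch below renders word-level timestamps as compact text, one line per phrase, splitting wherever the speaker pauses. The line layout, the `pack_take` name, the `max_gap` threshold, and the word fields are all assumptions for illustration. The real takes_packed.md format is defined by the repository's helpers and may differ.

```python
def _format_phrase(phrase):
    """Render one phrase as '[start-end] text' with times in seconds."""
    text = " ".join(w["text"] for w in phrase)
    return f"[{phrase[0]['start']:.2f}-{phrase[-1]['end']:.2f}] {text}"


def pack_take(name, words, max_gap=0.5):
    """Pack one take into compact text, starting a new line at each pause.

    `words` is a list of dicts with assumed keys "text", "start", "end".
    A pause longer than `max_gap` seconds (hypothetical default) ends a phrase.
    """
    lines = [f"## {name}"]
    phrase = []
    for w in words:
        if phrase and w["start"] - phrase[-1]["end"] > max_gap:
            lines.append(_format_phrase(phrase))
            phrase = []
        phrase.append(w)
    if phrase:
        lines.append(_format_phrase(phrase))
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
words = [
    {"text": "hello", "start": 0.00, "end": 0.30},
    {"text": "world", "start": 0.35, "end": 0.70},
    {"text": "again", "start": 1.50, "end": 1.90},
]

packed = pack_take("take_01", words)
print(packed)
```

Keeping timestamps at phrase edges is one way to stay small while still letting a reader point at exact cut times. A real packer would also need to carry speakers and audio events.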

Layer 2: Visual composite (on demand). timeline_view produces a filmstrip + waveform + word labels PNG for any time range. It is called only at decision points: ambiguous pauses, retake comparisons, cut-point sanity checks.

Naive approach: 30,000 frames × 1,500 tokens = 45M tokens of noise. Video Use: 12KB text + a handful of PNGs. This is the same idea as browser-use giving an LLM a structured DOM instead of a screenshot, but for video.
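
The comparison above can be checked with a few lines of arithmetic. The frame count and per-frame token cost come from the text. The figure of roughly 4 bytes per token is a rule-of-thumb assumption for English text, and the cost of the on-demand PNGs is left out because the text gives no number for it.

```python
# Figures stated in the text.
frames = 30_000
tokens_per_frame = 1_500
naive_tokens = frames * tokens_per_frame

# Assumption: ~4 bytes per token is a rough heuristic, not a measured value.
packed_bytes = 12 * 1024
bytes_per_token = 4
packed_tokens = packed_bytes // bytes_per_token

print(f"naive frame dump: {naive_tokens:,} tokens")
print(f"packed transcript: about {packed_tokens:,} tokens")
print(f"ratio: roughly {naive_tokens // packed_tokens:,}x")
```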

Pipeline:

Transcribe ──> Pack ──> LLM Reasons ──> EDL ──> Render ──> Self-Eval
                                                              │
                                                              └─ issue? fix + re-render (max 3)
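
The render and self-eval tail of the pipeline can be sketched as a bounded retry loop. This is a minimal sketch, not the project's implementation: `run_with_self_eval` and the three callbacks are hypothetical names, and "max 3" is read here as at most three re-renders after the first render, which is one possible interpretation.

```python
def run_with_self_eval(edl, render, evaluate, fix, max_rerenders=3):
    """Render an EDL, then fix and re-render while self-eval reports issues.

    `render`, `evaluate`, and `fix` are caller-supplied callables
    (hypothetical stand-ins for the real pipeline stages).
    Returns (output, number_of_rerenders).
    """
    output = render(edl)
    issues = evaluate(output)
    rerenders = 0
    while issues and rerenders < max_rerenders:
        edl = fix(edl, issues)
        output = render(edl)
        issues = evaluate(output)
        rerenders += 1
    if issues:
        raise RuntimeError(f"still failing after {max_rerenders} re-renders: {issues}")
    return output, rerenders


# Stub stages for illustration: a cut is "bad" if it is flagged with a pop.
def render(edl):
    return {"cuts": list(edl)}

def evaluate(output):
    return [cut for cut in output["cuts"] if cut["pop"]]

def fix(edl, issues):
    return [dict(cut, pop=False) if cut in issues else cut for cut in edl]


edl = [{"at": 1.0, "pop": False}, {"at": 4.2, "pop": True}]
output, rerenders = run_with_self_eval(edl, render, evaluate, fix)
print(rerenders, output)
```

Raising once the limit is hit keeps a failing render from being shown as if it had passed, which matches the rule that the preview appears only after self-eval passes.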

The self-eval loop runs timeline_view on the rendered output at every cut boundary and catches visual jumps, audio pops, and hidden subtitles. You see the preview only after it passes.

Design principles:

Text + on-demand visuals. No frame-dumping. The transcript is the surface.
Audio is primary, visuals follow. Cuts come from speech boundaries and silence gaps.
Ask → confirm → execute → self-eval → persist. Never touch the cut without strategy approval.
Zero assumptions about content type. Look, ask, then edit.
12 hard rules, artistic freedom elsewhere. Production-correctness is non-negotiable. Taste isn't.

See SKILL.md for the full production rules and editing craft.
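
The principle that cuts come from speech boundaries and silence gaps can be illustrated with word-level timestamps alone. The sketch below lists every pause between consecutive words that is long enough to be a cut candidate. The `silence_gaps` name, the `min_gap` threshold, and the word fields are assumptions for this example, not the skill's actual rules.

```python
def silence_gaps(words, min_gap=0.4):
    """Return (gap_start, gap_end) for each pause of at least `min_gap` seconds.

    `words` is a time-ordered list of dicts with assumed keys
    "text", "start", "end". Each gap runs from the end of one word
    to the start of the next, so both edges sit on word boundaries.
    """
    gaps = []
    for prev, nxt in zip(words, words[1:]):
        if nxt["start"] - prev["end"] >= min_gap:
            gaps.append((prev["end"], nxt["start"]))
    return gaps


# Hypothetical sample data, for illustration only.
words = [
    {"text": "so", "start": 0.0, "end": 0.2},
    {"text": "um", "start": 0.9, "end": 1.1},
    {"text": "welcome", "start": 1.15, "end": 1.6},
    {"text": "back", "start": 2.5, "end": 2.8},
]

gaps = silence_gaps(words)
print(gaps)
```

Because every gap edge is a word boundary, trimming inside a gap never clips a syllable. Which gaps to actually cut remains an editorial decision.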