browser-use / video-use
Introducing video-use: edit videos with Claude Code. 100% open source. Drop raw footage in a folder, chat with Claude Code, get final.mp4 back. Works for any content (talking heads, montages, tutorials, travel, interviews) without presets or menus.
What it does:
- Cuts out filler words (umm, uh, false starts) and dead space between takes
- Auto color grades every segment (warm cinematic, neutral punch, or any custom ffmpeg chain)
- 30ms audio fades at every cut so you never hear a pop
- Burns subtitles in your style — 2-word UPPERCASE chunks by default, fully customizable
- Generates animation overlays via HyperFrames, Remotion, Manim, or PIL — spawned in parallel sub-agents, one per animation
- Self-evaluates the rendered output at every cut boundary before showing you anything
- Persists session memory in project.md so next week’s session picks up where you left off
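As an illustration of the 30ms fade item above, here is a minimal sketch of how a per-segment fade could be expressed with ffmpeg's `afade` audio filter. The function name `fade_filter` and the idea of fading each segment in and out before joining are assumptions made for this example, not the repo's actual implementation.

```python
FADE_SECONDS = 0.030  # the 30ms fade length quoted in the feature list


def fade_filter(segment_duration: float, fade: float = FADE_SECONDS) -> str:
    """Build an ffmpeg audio filter string that fades one segment in and out.

    Hypothetical helper: the real skill may construct its filters differently.
    """
    if segment_duration <= 2 * fade:
        raise ValueError("segment is too short to hold both fades")
    fade_out_start = segment_duration - fade
    return (
        f"afade=t=in:st=0:d={fade:.3f},"
        f"afade=t=out:st={fade_out_start:.3f}:d={fade:.3f}"
    )


# A 4.25-second segment fades in over the first 30ms and out over the last 30ms.
print(fade_filter(4.25))
# afade=t=in:st=0:d=0.030,afade=t=out:st=4.220:d=0.030
```

The resulting string could be passed to ffmpeg's `-af` option for a single segment. Fading both edges of every segment means each cut meets at near silence, which is what prevents an audible pop.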
Setup prompt: Paste into Claude Code, Codex, Hermes, Openclaw, or any agent with shell access:
Set up https://github.com/browser-use/video-use for me. Read install.md first to install this repo, wire up ffmpeg, register the skill with whichever agent you're running under, and set up the ElevenLabs API key; ask me to paste it when you need it. Then read SKILL.md for daily usage, and always read helpers/ because that's where the editing scripts live. After install, don't transcribe anything on your own; just tell me it's ready and wait for me to drop footage into a folder.
The agent handles the clone, dependencies, and skill registration, and prompts you once for your ElevenLabs API key (grab one at elevenlabs.io/app/settings/api-keys). Then point your agent at a folder of raw takes:
cd /path/to/your/videos
claude # or codex, hermes, etc.
And in the session:
edit these into a launch video
It inventories the sources, proposes a strategy, waits for your OK, then produces edit/final.mp4 next to your sources. All outputs live in <videos_dir>/edit/, so the skill directory stays clean.
For always-on editing from your own VPS or Telegram, run the agent through Browser Use Box. Watch the 15-second demo.
Manual install: If you'd rather do it by hand:
# 1. Clone and symlink into your agent's skills directory
git clone https://github.com/browser-use/video-use ~/Developer/video-use
ln -sfn ~/Developer/video-use ~/.claude/skills/video-use # Claude Code
# ln -sfn ~/Developer/video-use ~/.codex/skills/video-use # Codex
# 2. Install deps
cd ~/Developer/video-use
uv sync # or: pip install -e .
brew install ffmpeg # required
brew install yt-dlp # optional, for downloading online sources
# 3. Add your ElevenLabs API key
cp .env.example .env
$EDITOR .env # ELEVENLABS_API_KEY=...
How it works: The LLM never watches the video. It reads it, through two layers that together give it everything it needs to cut with word-boundary precision.
Layer 1: Audio transcript (always loaded). One ElevenLabs Scribe call per source gives word-level timestamps, speaker diarization, and audio events such as (laughter), (applause), and (sigh). All takes pack into a single ~12KB takes_packed.md, the LLM's primary reading view.
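To make the packing step concrete, the sketch below groups word-level timestamps into phrase lines, splitting on silences. Everything here is an assumption made for illustration: the `Word` fields, the `max_gap` threshold, and the output layout are hypothetical and are not the actual takes_packed.md format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the take
    end: float    # seconds from the start of the take


def format_phrase(phrase: list[Word]) -> str:
    """Render one phrase as '[start-end] text' (hypothetical layout)."""
    text = " ".join(w.text for w in phrase)
    return f"[{phrase[0].start:.2f}-{phrase[-1].end:.2f}] {text}"


def pack_take(name: str, words: list[Word], max_gap: float = 0.5) -> str:
    """Group words into phrases, starting a new one after any silence > max_gap."""
    lines = [f"## {name}"]
    phrase: list[Word] = []
    for word in words:
        if phrase and word.start - phrase[-1].end > max_gap:
            lines.append(format_phrase(phrase))
            phrase = []
        phrase.append(word)
    if phrase:
        lines.append(format_phrase(phrase))
    return "\n".join(lines)


words = [
    Word("Hello", 0.10, 0.42),
    Word("everyone", 0.45, 0.90),
    Word("(laughter)", 1.80, 2.40),  # audio events sit in the same stream
    Word("welcome", 2.45, 2.90),
]
packed = pack_take("take_01", words)
print(packed)
# ## take_01
# [0.10-0.90] Hello everyone
# [1.80-2.90] (laughter) welcome
```

Keeping start and end times on every line is what lets a text-only reader propose cuts at word boundaries without ever seeing a frame.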
Layer 2: Visual composite (on demand). timeline_view produces a filmstrip + waveform + word labels PNG for any time range. It is called only at decision points: ambiguous pauses, retake comparisons, cut-point sanity checks. Naive approach: 30,000 frames × 1,500 tokens = 45M tokens of noise. Video Use: 12KB text + a handful of PNGs. Same idea as browser-use giving an LLM a structured DOM instead of a screenshot, but for video.
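The token comparison above can be checked with a few lines of arithmetic. The frame count and per-frame token cost come from the text; the bytes-per-token ratio is a rough rule of thumb assumed here, and the estimate ignores the tokens spent on the handful of PNGs.

```python
# Figures quoted in the text.
FRAMES = 30_000
TOKENS_PER_FRAME = 1_500
PACKED_BYTES = 12 * 1024  # the ~12KB takes_packed.md

# Assumption: roughly 4 bytes of English text per token (varies by tokenizer).
BYTES_PER_TOKEN = 4

naive_tokens = FRAMES * TOKENS_PER_FRAME
packed_tokens = PACKED_BYTES // BYTES_PER_TOKEN
ratio = naive_tokens // packed_tokens

print(f"{naive_tokens:,}")   # 45,000,000
print(f"{packed_tokens:,}")  # 3,072
print(f"{ratio:,}")          # 14,648
```

Under these assumptions the transcript view is about four orders of magnitude smaller than dumping every frame, before counting the on-demand PNGs.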
Pipeline:
Transcribe ──> Pack ──> LLM Reasons ──> EDL ──> Render ──> Self-Eval
                                                              │
                                                              └─ issue? fix + re-render (max 3)
The self-eval loop runs timeline_view on the rendered output at every cut boundary and catches visual jumps, audio pops, and hidden subtitles. You see the preview only after it passes.
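The render and self-eval stages can be sketched as a bounded retry loop. This is an assumption-laden outline rather than the skill's code: `render`, `evaluate`, and `fix` are hypothetical callables, and "max 3" is read here as at most three re-renders after the first render.

```python
from typing import Callable

MAX_RERENDERS = 3  # the "max 3" from the pipeline, read as re-render attempts


def render_with_self_eval(
    edl: dict,
    render: Callable[[dict], str],
    evaluate: Callable[[str], list[str]],
    fix: Callable[[dict, list[str]], dict],
    max_rerenders: int = MAX_RERENDERS,
) -> tuple[str, list[str]]:
    """Render, inspect the output, and re-render until clean or out of attempts.

    Returns the last output path and any issues that remain unresolved.
    """
    output = render(edl)
    issues = evaluate(output)
    attempts = 0
    while issues and attempts < max_rerenders:
        edl = fix(edl, issues)
        output = render(edl)
        issues = evaluate(output)
        attempts += 1
    return output, issues


# Stub stages standing in for the real render and inspection steps.
def fake_render(edl: dict) -> str:
    return f"final_v{edl['version']}.mp4"


def fake_evaluate(path: str) -> list[str]:
    return ["audio pop at cut 2"] if path == "final_v1.mp4" else []


def fake_fix(edl: dict, issues: list[str]) -> dict:
    return {**edl, "version": edl["version"] + 1}


output, issues = render_with_self_eval(
    {"version": 1}, fake_render, fake_evaluate, fake_fix
)
print(output, issues)  # final_v2.mp4 []
```

Returning the leftover issues along with the output lets the caller decide what to do when the attempt budget runs out, instead of silently presenting a flawed render.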
Design principles:
- Text + on-demand visuals. No frame-dumping. The transcript is the surface.
- Audio is primary, visuals follow. Cuts come from speech boundaries and silence gaps.
- Ask → confirm → execute → self-eval → persist. Never touch the cut without strategy approval.
- Zero assumptions about content type. Look, ask, then edit.
- 12 hard rules, artistic freedom elsewhere. Production-correctness is non-negotiable. Taste isn't.
See SKILL.md for the full production rules and editing craft.
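The principle that cuts come from speech boundaries and silence gaps can be illustrated with a small sketch that turns word timestamps into keep ranges. The filler list, the `max_gap` threshold, and the `(text, start, end)` tuple shape are assumptions chosen for this example; the skill's own rules live in SKILL.md and may differ.

```python
FILLERS = {"um", "umm", "uh"}  # hypothetical filler list for the example


def keep_ranges(
    words: list[tuple[str, float, float]], max_gap: float = 0.4
) -> list[tuple[float, float]]:
    """Return (start, end) ranges to keep, cutting at fillers and long silences.

    Every range starts and ends on a word boundary, never mid-word.
    """
    ranges: list[tuple[float, float]] = []
    prev_index = None  # index of the last word that was kept
    for i, (text, start, end) in enumerate(words):
        if text.lower().strip(".,!?") in FILLERS:
            continue  # dropping a filler forces a cut on both sides of it
        contiguous = prev_index == i - 1
        if ranges and contiguous and start - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], end)  # extend the current range
        else:
            ranges.append((start, end))  # start a new range after a cut
        prev_index = i
    return ranges


words = [
    ("So", 0.00, 0.20),
    ("um", 0.25, 0.60),     # filler: removed
    ("today", 0.70, 1.05),
    ("we", 1.08, 1.20),
    ("ship", 1.22, 1.60),
    ("it", 3.10, 3.30),     # follows a 1.5s silence: new range
]
print(keep_ranges(words))
# [(0.0, 0.2), (0.7, 1.6), (3.1, 3.3)]
```

Each returned range maps naturally onto one entry in an edit decision list, which is why the audio transcript alone is enough to draft a first cut.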