How I built a YouTube performance classifier that adjusts tomorrow's video script bias
I’ve been running an automated YouTube channel alongside three programmatic directory sites since April. The video side uses a two-host VTuber pipeline that generates daily scripts and renders them overnight. What I didn’t have until last week was any feedback mechanism — the script generator just produced content in a vacuum, with no idea which videos were actually landing. The fix is scripts/yt-analytics/run.py, a 330-line Python script that runs daily, reads the last 30 videos via the YouTube Data API v3, classifies them as high or low performers, and writes bias hints back to docs/yt-knowhow-bank-en.md — the same file the script generator reads before each session. This is a closed loop, not magic. But closing the loop is the entire point.
Fetching the Channel Without a Stable Channel ID
The first problem was channel resolution. Most YouTube Data API v3 endpoints take a channel ID, but I didn’t want to hardcode an ID that would break if the channel were ever recreated. The script tries four strategies in order. The first three are forHandle lookups, one per handle candidate taken from the YT_CHANNEL_HANDLE environment variable (claudeautomate, claude_automate, claude-automate). If all of those fail, the fourth is a search API call for “claude automate” with a loop over the returned channel IDs.
# Try each candidate handle in turn and return the first channel that resolves
for handle in handles:
    body = http_get(f"...channels?part=contentDetails,statistics,snippet&forHandle={handle}&key={api_key}")
    items = body.get("items") or []
    if items:
        return items[0]
The search fallback is slower and burns more quota but fires only when every direct handle attempt fails. In practice, claudeautomate matches on the first try. Once the channel resolves, relatedPlaylists.uploads gives the uploads playlist ID. From there, playlistItems returns up to 30 recent videos with their IDs, which feeds a second videos.list request for statistics.
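The whole lookup chain can be sketched roughly as follows. This is a minimal illustration, not the actual run.py: the function names, the `fetch` parameter (a stand-in for the post’s http_get, injected so the chain can be exercised without network access), and the way the search fallback picks the first channel that resolves are all assumptions. The endpoints and response fields are the standard Data API v3 ones.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

API = "https://www.googleapis.com/youtube/v3"
Fetch = Callable[[str], dict]  # hypothetical stand-in for the post's http_get


def api_url(endpoint: str, **params: str) -> str:
    """Build a Data API v3 URL with escaped query parameters."""
    return f"{API}/{endpoint}?{urlencode(params)}"


def resolve_channel(handles: list[str], query: str, key: str, fetch: Fetch) -> Optional[dict]:
    """Try each handle via forHandle, then fall back to a channel search."""
    part = "contentDetails,statistics,snippet"
    for handle in handles:
        items = fetch(api_url("channels", part=part, forHandle=handle, key=key)).get("items") or []
        if items:
            return items[0]
    # Fallback: search.list costs far more quota than channels.list.
    hits = fetch(api_url("search", part="snippet", type="channel", q=query, key=key)).get("items") or []
    for hit in hits:
        channel_id = hit.get("id", {}).get("channelId")
        if not channel_id:
            continue
        items = fetch(api_url("channels", part=part, id=channel_id, key=key)).get("items") or []
        if items:
            return items[0]  # assumption: accept the first search hit that resolves
    return None


def recent_videos(channel: dict, key: str, fetch: Fetch, limit: int = 30) -> list[dict]:
    """Walk uploads playlist -> playlistItems -> videos.list for snippet and statistics."""
    uploads = channel["contentDetails"]["relatedPlaylists"]["uploads"]
    page = fetch(api_url("playlistItems", part="contentDetails", playlistId=uploads,
                         maxResults=str(limit), key=key))
    ids = [item["contentDetails"]["videoId"] for item in page.get("items") or []]
    if not ids:
        return []
    body = fetch(api_url("videos", part="snippet,statistics", id=",".join(ids), key=key))
    return body.get("items") or []
```

Both playlistItems and videos.list cap a single request at 50 items, so a limit of 30 fits in one call each without pagination.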
Classifying Videos as High or Low
The classifier is deliberately simple: median-based thresholds, no machine learning.
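import statistics
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
high, low = [], []
views = [int(v["statistics"].get("viewCount", 0)) for v in videos]
median = statistics.median(views)
for v, view in zip(videos, views):
    published = datetime.fromisoformat(v["snippet"]["publishedAt"].replace("Z", "+00:00"))
    age_h = (now - published).total_seconds() / 3600
    if view >= median * 1.5:
        high.append(v)  # HIGH: at least 1.5x the median
    elif view <= median * 0.6 and age_h >= 72:
        low.append(v)  # LOW: at most 0.6x the median and past the 72-hour grace period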
Videos at or above 1.5× the median view count are HIGH. Videos at or below 0.6× the median are LOW, but only if they’re at least 72 hours old. The 72-hour grace period matters: a video posted yesterday with 40% of median views might just be young, and flagging it as a dud immediately would be noise. Everything between 0.6× and 1.5× is neither. It isn’t actionable signal, so I ignore it.
The choice of median over mean is deliberate. If one video goes viral, the mean view count shifts and distorts every other video’s classification, while the median barely moves. This is a lesson I learned from the three-tier content quality approach on the directory side: simple bucketing beats trying to optimize a single number.
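To make the median-versus-mean point concrete, here is a small worked example on made-up view counts. The `bucket` helper and the numbers are illustrative assumptions, and the 72-hour age check is left out to keep the comparison focused on the choice of center.

```python
import statistics


def bucket(views, center):
    """Split view counts into (high, low) using the 1.5x / 0.6x thresholds."""
    high = [v for v in views if v >= center * 1.5]
    low = [v for v in views if v <= center * 0.6]
    return high, low


# Four ordinary videos and one viral outlier (made-up numbers).
views = [100, 120, 130, 150, 5000]

by_median = bucket(views, statistics.median(views))  # median = 130
by_mean = bucket(views, statistics.mean(views))      # mean = 1100

print(by_median)  # ([5000], [])
print(by_mean)    # ([5000], [100, 120, 130, 150])
```

With the mean as the center, the single viral video drags the LOW threshold up to 660 views and every ordinary video gets flagged as a dud. With the median, the thresholds stay at 195 and 78, and only the outlier is classified.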
Matching Archetypes via Title Overlap
The script generator assigns each video it produces an archetype label (“tutorial”, “recap”, “comparison”, “technical”) and saves it in the uploaded queue. But YouTube’s API doesn’t expose those labels, so I need to reconnect each archetype to its performance stats. The reconnection happens via title overlap.
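def title_overlap(a: str, b: str) -> int:
    """Count distinct words longer than two characters that both titles share."""
    def words(title: str) -> set[str]:
        # Strip punctuation before the length check so "AI," is not counted as a 3-letter word
        cleaned = (w.lower().strip(",.!?:;\"'") for w in title.split())
        return {w for w in cleaned if len(w) > 2}
    return len(words(a) & words(b))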
For each video in the API response, I compare its title against every file in the uploaded queue and take the best match, but only if the word overlap is ≥4. Titles with fewer than 4 matching significant words get labeled “unknown.” This is imperfect, because titles drift during publishing. But a ≥4-word match is strict enough that false positives are rare. In testing on a 25-video set, 21 matched correctly and 4 came back as “unknown.” Not great, not unusable, and good enough for aggregate pattern analysis.
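The best-match step might look something like this. It is a sketch under assumptions: `match_archetype` is a hypothetical name, and the queue is modeled as a list of dicts with `title` and `archetype` keys, which may not be how the real uploaded queue files are laid out.

```python
def title_overlap(a: str, b: str) -> int:
    """Count distinct words longer than two characters that both titles share."""
    def words(title: str) -> set[str]:
        cleaned = (w.lower().strip(",.!?:;\"'") for w in title.split())
        return {w for w in cleaned if len(w) > 2}
    return len(words(a) & words(b))


def match_archetype(video_title: str, queue: list[dict], min_overlap: int = 4) -> str:
    """Return the archetype of the best-overlapping queue entry, or "unknown"."""
    best_label, best_score = "unknown", 0
    for entry in queue:  # entry layout is an assumption: {"title": ..., "archetype": ...}
        score = title_overlap(video_title, entry["title"])
        if score > best_score:
            best_label, best_score = entry["archetype"], score
    return best_label if best_score >= min_overlap else "unknown"


queue = [
    {"title": "Five Ways to Automate Your YouTube Channel with Python", "archetype": "tutorial"},
    {"title": "Weekly Recap: Directory Site Traffic", "archetype": "recap"},
]
print(match_archetype("Five Ways to Automate a YouTube Channel Using Python Scripts", queue))  # tutorial
print(match_archetype("Recap of Traffic", queue))  # unknown
```

The second title shares only two significant words with the recap entry, so it falls below the threshold and stays “unknown” rather than risking a wrong label.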
Inferring Hook Patterns from the First Word
Beyond archetype, I wanted to know whether certain opening patterns in video scripts correlated with performance. The hook pattern inference is a single-function lookup on the first word of the script’s opening line.
def hook_pattern(text: str) -> str:
    words = text.strip().lower().split()
    if not words:
        return "unknown"  # guard added here: an empty opener has no first word
    first_word = words[0]
    if first_word in {"why", "how", "what", "when", "who"}:
        return "question"
    if first_word in {"three", "four", "five"} or any(c.isdigit() for c in first_word):
        return "numeric"
    # ... (other patterns)
It’s a blunt heuristic. “How” and “Why” as first words don’t automatically make a video good. But across the 30 videos classified per run, the distribution across the HIGH and LOW buckets produces meaningful signal. If “question” hooks consistently cluster in LOW and “numeric” hooks cluster in HIGH, that’s worth feeding back into the script generator’s prompt context.
This is also the part I’d replace first if I were scaling this beyond 50 videos. First-word classification misses everything after the opener. I’ll eventually pass the full opening sentence through a small LLM call for categorization.
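Tallying hook patterns per bucket could be done along these lines. This is only a sketch: `hook_distribution` is a hypothetical helper, the sample openers are made up, and the `"other"` catch-all stands in for the remaining patterns, which the post does not show.

```python
from collections import Counter


def hook_pattern(text: str) -> str:
    words = text.strip().lower().split()
    if not words:
        return "unknown"
    first_word = words[0]
    if first_word in {"why", "how", "what", "when", "who"}:
        return "question"
    if first_word in {"three", "four", "five"} or any(c.isdigit() for c in first_word):
        return "numeric"
    return "other"  # hypothetical catch-all; the post's other patterns are elided


def hook_distribution(high_openers: list[str], low_openers: list[str]) -> dict[str, Counter]:
    """Count hook patterns separately for the HIGH and LOW buckets."""
    return {
        "HIGH": Counter(hook_pattern(t) for t in high_openers),
        "LOW": Counter(hook_pattern(t) for t in low_openers),
    }


dist = hook_distribution(
    high_openers=["5 tools that replaced my editor", "Three mistakes I made"],
    low_openers=["Why does nobody watch this?", "How I failed", "Today we look at queues"],
)
print(dist["HIGH"]["numeric"], dist["LOW"]["question"])  # 2 2
```

Keeping the two counters side by side makes it easy to spot a pattern that dominates one bucket and is absent from the other, which is the kind of skew worth turning into a bias hint.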
Writing Bias Hints Back to the Knowledge Bank
The output of the classifier isn’t a dashboard — it’s a section in docs/yt-knowhow-bank-en.md that the script generator reads at the start of each session. The update_kb function finds the ## Routine Auto-Tuner Notes header and replaces everything up to the next ##.
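A section replacement of this kind can be sketched with a single regular expression. The version below is an assumption about how update_kb might work, not the code from run.py: it operates on a string rather than the file, treats only level-two `## ` headings as section boundaries, and appends the section if the header is missing.

```python
import re

HEADER = "## Routine Auto-Tuner Notes"


def update_kb(text: str, notes: str) -> str:
    """Replace the auto-tuner section (header up to the next '## ') with fresh notes."""
    section = f"{HEADER}\n\n{notes.strip()}\n\n"
    pattern = re.compile(
        rf"^{re.escape(HEADER)}[ \t]*\n.*?(?=^## |\Z)",
        re.MULTILINE | re.DOTALL,
    )
    if pattern.search(text):
        # A function replacement avoids backslash-escape surprises in the notes text.
        return pattern.sub(lambda _match: section, text, count=1)
    # Assumption: when the header is absent, append the section at the end.
    return text.rstrip("\n") + "\n\n" + section


kb = "# KB\n\n## Style\nkeep\n\n## Routine Auto-Tuner Notes\nold line\n\n## Other\nalso keep\n"
updated = update_kb(kb, "- favor numeric hooks")
print(updated)
```

Because the function is pure, the daily job would only need to read docs/yt-knowhow-bank-en.md, pass the contents through it, and write the result back. Running it twice with the same notes leaves a single copy of the section.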