CodeDNA: AI Codebase Archaeologist Built with Gemma 4 Thinking Mode

Gemma 4 Challenge: Build With Gemma 4 Submission

You inherited this codebase 6 months ago. You can feel something went wrong around 2021. Bug reports spiked. Velocity dropped. The original authors left. The commit history has 3,000 entries — and every answer is in there. Nobody has time to read 3,000 commits. CodeDNA does.

What I Built

CodeDNA is an AI Codebase Archaeologist. You paste your git log, and Gemma 4, using Thinking Mode, reconstructs the story of your codebase: bug storms, architectural pivots, refactor eras, feature bursts, and an overall health score with a transparent breakdown. The output is meant to be verifiable: you can check every milestone against your actual commit history. No hallucinated CVEs, no unverifiable financial claims, just facts extracted from structured text you already own.
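Checking milestones against the commit history can be automated. The sketch below is a minimal illustration, not CodeDNA's actual code: the milestone shape (`title`, `commit_hashes`) and the function name `unverified_commits` are hypothetical, since the real output schema is not shown here.

```python
def unverified_commits(milestones: list[dict], git_log_hashes: set[str]) -> dict[str, list[str]]:
    """Map each milestone title to the cited hashes that are absent from the log.

    `milestones` uses a hypothetical shape: {"title": str, "commit_hashes": [str, ...]}.
    An empty result means every cited commit was found in the log.
    """
    missing: dict[str, list[str]] = {}
    for milestone in milestones:
        absent = [h for h in milestone["commit_hashes"] if h not in git_log_hashes]
        if absent:
            missing[milestone["title"]] = absent
    return missing
```

Any milestone that appears in the result cites a commit the log does not contain, which is the signal to distrust that part of the report.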

The Problem It Solves

You inherit a codebase. Something went wrong around late 2021, and you can feel it. Bug reports spiked, velocity dropped, the original authors left. The commit history has everything, but nobody has time to read 3,000 commits manually. Traditional tools give you graphs of commit frequency. That tells you how much happened, not what happened or why one period was chaotic and another stable. CodeDNA uses Gemma 4’s Thinking Mode to reason across a preprocessed view of your commit history and surface the narrative that was always there.

Core Features

Animated timeline: Color-coded milestones — red = bug storm, yellow = refactor, green = pivot, blue = feature burst.
Health score + breakdown: 0–100 score with transparent factor table (not a black-box number).
Live Thinking Mode stream: Watch Gemma 4 reason step-by-step as it analyzes your history.
Smart preprocessing: Caps the input at 180 commits and extracts monthly histograms and file hotspots before inference.
Analysis caching: Same git log = instant results on repeat runs.
Markdown export: Download a complete archaeological report.
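The preprocessing and caching steps in the list above could look roughly like the sketch below. Only the 180-commit cap, the monthly histogram, and the file hotspots come from the feature list. Everything else is an assumption made for illustration: the input format (as produced by `git log --date=short --pretty=format:'%h|%ad|%s' --name-only`), keeping the first 180 commits as listed, the function names, and the SHA-256 cache key.

```python
import hashlib
import re
from collections import Counter

MAX_COMMITS = 180  # cap stated in the feature list

# Assumed header line shape: "<hash>|<YYYY-MM-DD>|<subject>"
HEADER = re.compile(r"^([0-9a-f]{7,40})\|(\d{4}-\d{2})-\d{2}\|(.*)$")


def preprocess(git_log: str, max_commits: int = MAX_COMMITS) -> dict:
    """Parse the log, keep at most `max_commits`, and summarize them."""
    commits = []
    for line in git_log.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            commits.append({
                "hash": match.group(1),
                "month": match.group(2),
                "subject": match.group(3),
                "files": [],
            })
        elif commits:
            # Any other non-empty line is treated as a changed file path.
            commits[-1]["files"].append(line)

    # Assumption: keep the first commits as listed (git log prints newest first).
    commits = commits[:max_commits]
    monthly = Counter(c["month"] for c in commits)
    hotspots = Counter(path for c in commits for path in c["files"])
    return {
        "commits": commits,
        "monthly_histogram": dict(sorted(monthly.items())),
        "file_hotspots": hotspots.most_common(10),
    }


def cache_key(git_log: str) -> str:
    """Identical log text yields the same key, so a repeat run can reuse a stored result."""
    return hashlib.sha256(git_log.encode("utf-8")).hexdigest()
```

Hashing the raw log text keeps the cache lookup independent of the model: if the key is already stored, the saved analysis can be returned without running inference again.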

Why Gemma 4 — Not “Just Any LLM”

Thinking Mode for causal chain reasoning: Counting keywords tells you how often “fix” appears, not why. Gemma 4’s Thinking Mode is used to reason about why a pattern emerged. When it sees 14 “fix” commits targeting ReactFiberHooks.js in a 3-week window, it can link them into one causal story instead of treating them as 14 isolated events.
128K context: 180 commits × ~200 tokens each = ~36K tokens of compressed history in one request. No chunking, no context loss.
Structured output: The JSON schema is strict and validated with Pydantic v2, so any response from Gemma that passes validation renders as a timeline.
Privacy-first: Git history can reveal proprietary details, so the design keeps processing local and handles the log securely.
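The context-window arithmetic from the list above can be checked with a few lines. This is a back-of-the-envelope sketch: it treats “128K” as 128,000 tokens and uses the rough 200-tokens-per-commit estimate from the text, and the function name `history_budget` is made up for illustration.

```python
CONTEXT_WINDOW = 128_000   # "128K context", taken here as 128,000 tokens (assumption)
MAX_COMMITS = 180          # preprocessing cap
TOKENS_PER_COMMIT = 200    # rough per-commit estimate from the text


def history_budget(commits: int = MAX_COMMITS,
                   tokens_per_commit: int = TOKENS_PER_COMMIT,
                   window: int = CONTEXT_WINDOW) -> dict:
    """Estimate how much of the context window the compressed history uses."""
    used = commits * tokens_per_commit
    return {
        "history_tokens": used,
        "remaining_tokens": window - used,
        "fits": used <= window,
    }


print(history_budget())
# {'history_tokens': 36000, 'remaining_tokens': 92000, 'fits': True}
```

Under these assumptions the history takes a bit over a quarter of the window, which leaves room for the prompt, the histograms, and the model's reasoning in the same request.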
