Show HN: Kage – Shadow any website to a single binary for offline viewing
Kage (影, “shadow”) clones a website into a folder you can browse offline, with every script stripped out. It opens each page in real headless Chrome, waits for the page to settle, snapshots the DOM a human would have seen, then deletes all the JavaScript and pulls the CSS, images, and fonts down to local paths. What lands on disk looks like the live site and runs no code.
Install • Quick start • Commands • Clone • Pack • Native window • How it works
You already know the problem. You hit “Save As” on a page you want to keep, and six months later you open it to find a blank screen, a spinner that never stops, or a copy that still tries to phone home to an analytics server that no longer exists. The page was never really yours. It was a thin client for someone else’s JavaScript.
kage takes the other road. It drives a real browser, lets the page finish doing whatever it does, grabs the finished result, and then rips every script out of it. No tracking, no network calls, no surprises. Just .html files you can open straight off disk, hand to a friend, or pack into a single file and forget about for a decade.
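To make the "rips every script out" step concrete, here is a minimal sketch that removes script elements from an HTML snapshot. It is not kage's code: the `stripScripts` name is hypothetical, and it uses a regular expression only to stay inside the Go standard library. A regex is not a robust HTML parser, so a real tool would more likely walk a parsed DOM, and this sketch ignores inline `on*` handlers and `javascript:` URLs.

```go
package main

import (
	"fmt"
	"regexp"
)

// scriptElem matches a whole <script>...</script> element.
// (?i) makes it case-insensitive, (?s) lets "." span newlines,
// and ".*?" stops at the first closing tag.
var scriptElem = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script\s*>`)

// stripScripts returns the snapshot with every script element removed.
// Hypothetical helper for illustration only.
func stripScripts(html string) string {
	return scriptElem.ReplaceAllString(html, "")
}

func main() {
	snapshot := `<html><head><script src="/a.js"></script></head>` +
		`<body><h1>Hi</h1><script>track()</script></body></html>`
	fmt.Println(stripScripts(snapshot))
}
```

The markup the browser already rendered stays in place; only the code that produced it is dropped.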
Full docs and guides live at kage.tamnd.com.
Install
go install github.com/tamnd/kage/cmd/kage@latest
Prefer a prebuilt binary? Grab an archive, a .deb/.rpm/.apk, or a checksum from releases. Or skip installing Chrome yourself and use the container image, which bundles Chromium:
docker run --rm -v "$PWD/out:/out" ghcr.io/tamnd/kage clone paulgraham.com
kage drives a real browser, so it needs Chrome or Chromium on the host. It finds a system install on its own; point it somewhere specific with --chrome or the KAGE_CHROME environment variable. The container needs nothing extra. Shell completion ships in the box: kage completion bash|zsh|fish|powershell.
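The browser lookup described above might be sketched like this. Treat it as an illustration, not kage's implementation: the `resolveChrome` name, the candidate binary names, and the precedence order (flag, then KAGE_CHROME, then PATH) are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// candidates are binary names commonly used by Chrome and Chromium packages.
// The list is illustrative, not kage's actual search list.
var candidates = []string{
	"google-chrome", "google-chrome-stable", "chromium", "chromium-browser", "chrome",
}

// resolveChrome picks a browser path: an explicit flag value first, then the
// KAGE_CHROME environment variable, then the first candidate found on PATH.
// The precedence order is an assumption.
func resolveChrome(flagValue string) (string, error) {
	if flagValue != "" {
		return flagValue, nil
	}
	if env := os.Getenv("KAGE_CHROME"); env != "" {
		return env, nil
	}
	for _, name := range candidates {
		if p, err := exec.LookPath(name); err == nil {
			return p, nil
		}
	}
	return "", errors.New("no Chrome or Chromium found; set --chrome or KAGE_CHROME")
}

func main() {
	path, err := resolveChrome("") // empty string: no --chrome flag given
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using browser at", path)
}
```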
Quick start
Let’s mirror Paul Graham’s essays so you can read them on a plane, on a laptop with no wifi, or in the year 2050 after the site has finally changed its design:
# 1. Clone the site into $HOME/data/kage/paulgraham.com/
kage clone paulgraham.com
# 2. Read it back offline in your browser
kage serve $HOME/data/kage/paulgraham.com
# open http://127.0.0.1:8800
That’s the whole loop. Every essay, every image, every stylesheet, frozen on your disk and runnable with zero network. The next two steps are optional but nice: collapse the whole thing into one file, and pop it open in its own window.
# 3. Squeeze the mirror into a single shareable file
kage pack paulgraham.com
# -> paulgraham.com.zim
kage open paulgraham.com.zim
# 4. Or into one executable that *is* the site
kage pack paulgraham.com --format binary -o paulgraham
./paulgraham # serves itself, needs nothing installed
Commands
| Command | What it does |
|---|---|
| `kage clone <url>` | render a site in headless Chrome and write a browsable, script-free mirror |
| `kage serve [dir]` | preview a cloned folder over a local HTTP server |
| `kage pack <mirror-dir>` | collapse a mirror into one ZIM archive, or a self-contained viewer binary |
| `kage open <file.zim>` | serve a packed ZIM back for offline reading |
Clone
# The whole site, into $HOME/data/kage/<host>/
kage clone https://paulgraham.com
# Just the first 50 pages, two links deep, for a quick taste
kage clone paulgraham.com --max-pages 50 --max-depth 2
# Only one section of a bigger site
kage clone go.dev --scope-prefix /doc
# Pull in subdomains too, and scroll each page to trip lazy-loaded images
kage clone example.com --subdomains --scroll
# Come back next month and re-render in place to catch new essays
kage clone paulgraham.com --refresh
A clone is a polite, breadth-first crawl. It reads robots.txt, seeds itself from sitemap.xml, and stays on the seed host unless you tell it otherwise. It is also stubbornly idempotent: each page is keyed by the file it writes, so the same essay reached over http and https, with or without a trailing slash, gets fetched exactly once. Hit Ctrl-C and it saves its place on the way out; run it again and it picks up where it stopped. --refresh re-renders in place, --force wipes the host and starts clean.
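Keying each page by its output file could look roughly like the sketch below. The `pageKey` helper and the `host/path/index.html` layout are assumptions made for illustration; kage's real on-disk layout may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// pageKey maps a URL to the relative file path a mirror might write for it.
// URLs that differ only by scheme, host case, or trailing slash share a key.
// The layout is an assumption, not kage's documented format.
func pageKey(raw string) (string, error) {
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // accept bare hosts such as "paulgraham.com"
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	host := strings.ToLower(u.Hostname())
	p := path.Clean("/" + u.Path) // collapses "//" and drops a trailing "/"
	if path.Ext(p) == "" {
		p = path.Join(p, "index.html") // directory-style URL
	}
	return host + p, nil
}

func main() {
	seen := map[string]bool{} // keys already written in this crawl
	urls := []string{
		"http://paulgraham.com/articles/",
		"https://paulgraham.com/articles",
		"https://PaulGraham.com/articles/",
	}
	for _, raw := range urls {
		key, err := pageKey(raw)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		if seen[key] {
			fmt.Println("already fetched:", raw)
			continue
		}
		seen[key] = true
		fmt.Println("fetch:", raw, "->", key)
	}
}
```

Because the scheme never enters the key, all three URLs collapse to one file and only the first is fetched. Persisting the `seen` set to disk is one plausible way a crawler could resume after Ctrl-C.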
Serve
kage serve runs a tiny static file server over a cloned folder so links and assets resolve the way they would on a real host:
kage serve $HOME/data/kage/paulgraham.com
# open http://127.0.0.1:8800
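For a sense of how little a static preview server needs, here is a minimal sketch built on Go's standard `http.FileServer`. It is not kage's serve command: `mirrorHandler` and `demo` are hypothetical names, and the demo requests a throwaway one-page mirror in-process so the program exits instead of listening forever.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// mirrorHandler serves the files under dir as a static site.
// http.FileServer answers "/" with dir/index.html, so relative links
// and assets resolve the way they would on a real host.
func mirrorHandler(dir string) http.Handler {
	return http.FileServer(http.Dir(dir))
}

// demo builds a one-page mirror in a temp dir and requests "/" from the
// handler directly, without opening a socket.
func demo() (int, string, error) {
	dir, err := os.MkdirTemp("", "mirror")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(dir)

	page := "<h1>offline</h1>"
	if err := os.WriteFile(filepath.Join(dir, "index.html"), []byte(page), 0o644); err != nil {
		return 0, "", err
	}

	rec := httptest.NewRecorder()
	mirrorHandler(dir).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code, rec.Body.String(), nil
}

func main() {
	code, body, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(code, body)
	// A long-running preview would instead call something like:
	//   http.ListenAndServe("127.0.0.1:8800", mirrorHandler(dir))
}
```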
Pack it into one file
A mirror is a folder, which is great for browsing and lousy for moving around. Copying thousands of little files is slow, and “here, have this directory” is a clumsy thing to hand someone. kage pack collapses the whole mirror into one artifact, and you choose the shape: an open ZIM archive, or a single executable that is the site.
A single ZIM file
kage pack paulgraham.com # -> paulgraham.com.zim
kage open paulgraham.com.zim
ZIM is an open file format built for exactly this: a whole website (or a whole Wikipedia) squeezed into one compressed, indexed, read-only file. kage writes the entire mirror into it, text zstd-compressed and media stored as-is. It is the format behind Kiwix, the offline-content project people use to carry Wikipedia, Stack Overflow, and Project Gutenberg onto boats, into classrooms with no internet, and onto a phone for a long flight. Because the format is a documented standard and not a kage invention, a paulgraham.com.zim you make today will st