How to Use Transformers.js in a Chrome Extension

We recently released a Transformers.js demo browser extension, powered by Gemma 4 E2B, that helps users navigate the web. While building it, we made several practical observations about Manifest V3 runtimes, model loading, and messaging that are worth sharing.

Who this is for

This guide is for developers who want to run local AI features in a Chrome extension with Transformers.js under Manifest V3 constraints. By the end, you will have the same architecture used in this project: a background service worker that hosts models, a side panel chat UI, and a content script for page-level actions.

What we will build

In this guide, we will recreate the core architecture of Transformers.js Gemma 4 Browser Assistant, using the published extension as a reference and the open-source codebase as the implementation map.

Live extension: Chrome Web Store
Source code: github.com/nico-martin/gemma4-browser-extension
End result: a background-hosted Transformers.js engine, a side panel chat UI, and a content script for page extraction and highlighting.

1) Chrome extension architecture (MV3)

Before diving in, a quick scope note: we will not go deep on the React UI layer or the Vite build configuration. The focus here is the high-level architecture: what runs in each Chrome runtime and how those pieces are orchestrated. If Manifest V3 is new to you, read the short overview "What is Manifest V3?" first.

1.1 Runtime contexts and entry points

In MV3, your architecture starts in public/manifest.json. This project defines three entry points:

background.service_worker = background.js, built from src/background/background.ts.
side_panel.default_path = sidebar.html, built from src/sidebar/index.html.
content_scripts[].js = content.js with matches: http(s)://*/* and run_at: document_idle, built from src/content/content.ts.
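
As a rough sketch, the three entry points above could be declared as follows. The object is written in TypeScript so it can be type-checked and then serialized into public/manifest.json. The name, version, and empty action entry are placeholders rather than values taken from the project, and http(s)://*/* is expanded into the two match patterns it stands for.

```typescript
// Sketch of the MV3 entry points described above.
// Assumptions (not from the project): name, version, and the empty `action` entry.
const manifest = {
  manifest_version: 3,
  name: "my-local-ai-extension", // placeholder
  version: "0.0.1", // placeholder
  // The `action` key must be present for chrome.action.onClicked to be usable.
  action: {},
  background: { service_worker: "background.js" },
  side_panel: { default_path: "sidebar.html" },
  content_scripts: [
    {
      js: ["content.js"],
      matches: ["http://*/*", "https://*/*"],
      run_at: "document_idle",
    },
  ],
};

// Serialize to the JSON that would live in public/manifest.json.
const manifestJson: string = JSON.stringify(manifest, null, 2);
```

Keeping the manifest as a typed object is only one option; a hand-written JSON file with the same keys works just as well.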

The background service worker also listens for chrome.action.onClicked and opens the side panel for the active tab. A related entry point worth knowing: a popup can be defined with action.default_popup, which works well for quick actions. This project uses a side panel for persistent chat, but the orchestration pattern is the same.
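
A minimal sketch of that click handler is shown below. It is not the project's code: the small interfaces only mirror the parts of chrome.action and chrome.sidePanel that the sketch needs, so it type-checks without @types/chrome, and the function name is hypothetical.

```typescript
// Minimal structural types standing in for chrome.action and chrome.sidePanel.
interface TabLike {
  id?: number;
}
interface ActionLike {
  onClicked: { addListener(listener: (tab: TabLike) => void): void };
}
interface SidePanelLike {
  open(options: { tabId: number }): Promise<void>;
}

// Registers a toolbar-click handler that opens the side panel for the clicked tab.
function openSidePanelOnActionClick(action: ActionLike, sidePanel: SidePanelLike): void {
  action.onClicked.addListener((tab) => {
    if (tab.id === undefined) return; // not every tab has an id
    // sidePanel.open() has to be called in response to a user action,
    // so it is invoked directly inside the click listener.
    sidePanel.open({ tabId: tab.id }).catch((error: unknown) => {
      console.error("Could not open side panel", error);
    });
  });
}

// Assumed wiring in the background script:
// openSidePanelOnActionClick(chrome.action, chrome.sidePanel);
```

Note that chrome.action.onClicked does not fire when the action has a popup, so this pattern applies only when action.default_popup is not set.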

1.2 What runs where

The key design decision is to keep heavy orchestration in the background and keep UI/page logic thin.

Background (src/background/background.ts) is the control plane: agent lifecycle, model initialization, tool execution, and shared services like feature extraction.
Side panel (src/sidebar/*) is the interaction layer: chat input/output, streaming updates, and setup controls.
Content script (src/content/content.ts) is the page bridge: DOM extraction and highlight actions.

One practical consequence of this division is that the conversation history also lives in the background (Agent.chatMessages): the UI sends events like AGENT_GENERATE_TEXT, the background appends the message, runs inference, and then emits MESSAGES_UPDATE back to the side panel. This split avoids duplicate model loads, keeps the UI responsive, and respects Chrome’s security boundaries around DOM access.

1.3 Messaging contract

Once runtimes are separated, messaging becomes the backbone. In this project, all messages are typed through enums in src/shared/types.ts.

Side panel -> background (BackgroundTasks): CHECK_MODELS, INITIALIZE_MODELS, AGENT_INITIALIZE, AGENT_GENERATE_TEXT, AGENT_GET_MESSAGES, AGENT_CLEAR, EXTRACT_FEATURES
Background -> side panel (BackgroundMessages): DOWNLOAD_PROGRESS, MESSAGES_UPDATE
Background -> content (ContentTasks): EXTRACT_PAGE_DATA, HIGHLIGHT_ELEMENTS, CLEAR_HIGHLIGHTS
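
To make the contract concrete, here is one way the enums could look. The enum and member names follow the lists above; the string values, the payload fields (prompt, texts), and the guard function are assumptions made for this sketch and may differ from src/shared/types.ts.

```typescript
// Enum and member names follow the lists above; string values are assumed.
enum BackgroundTasks {
  CHECK_MODELS = "CHECK_MODELS",
  INITIALIZE_MODELS = "INITIALIZE_MODELS",
  AGENT_INITIALIZE = "AGENT_INITIALIZE",
  AGENT_GENERATE_TEXT = "AGENT_GENERATE_TEXT",
  AGENT_GET_MESSAGES = "AGENT_GET_MESSAGES",
  AGENT_CLEAR = "AGENT_CLEAR",
  EXTRACT_FEATURES = "EXTRACT_FEATURES",
}

enum BackgroundMessages {
  DOWNLOAD_PROGRESS = "DOWNLOAD_PROGRESS",
  MESSAGES_UPDATE = "MESSAGES_UPDATE",
}

enum ContentTasks {
  EXTRACT_PAGE_DATA = "EXTRACT_PAGE_DATA",
  HIGHLIGHT_ELEMENTS = "HIGHLIGHT_ELEMENTS",
  CLEAR_HIGHLIGHTS = "CLEAR_HIGHLIGHTS",
}

// Hypothetical request union: only two tasks carry a payload in this sketch.
type TasksWithPayload = BackgroundTasks.AGENT_GENERATE_TEXT | BackgroundTasks.EXTRACT_FEATURES;
type BackgroundRequest =
  | { type: BackgroundTasks.AGENT_GENERATE_TEXT; prompt: string }
  | { type: BackgroundTasks.EXTRACT_FEATURES; texts: string[] }
  | { type: Exclude<BackgroundTasks, TasksWithPayload> };

// Runtime guard for values arriving through the messaging channel,
// where the compiler can no longer vouch for the type.
function isBackgroundTask(value: unknown): value is BackgroundTasks {
  return (
    typeof value === "string" &&
    (Object.values(BackgroundTasks) as string[]).includes(value)
  );
}
```

A discriminated union like BackgroundRequest lets a switch on request.type narrow the payload, so a handler for AGENT_GENERATE_TEXT can read request.prompt without casts.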

The orchestration rule is simple: the background is the single coordinator. The side panel requests actions and renders results, and the content script carries out page-level tasks on request.

Typical request flow:

Side panel sends AGENT_GENERATE_TEXT.
Background appends to Agent.chatMessages and runs model/tool steps.
Background emits MESSAGES_UPDATE.
Side panel re-renders from the updated message list.
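
The request flow above can be sketched as a small class. This is an illustration under stated assumptions, not the project's Agent: the generate and emit parameters are stand-ins for the real model call and for the message sent back to the side panel, and the method name handleGenerateText is hypothetical.

```typescript
type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Stand-ins (assumptions): `Generate` replaces the model/tool loop and
// `Emit` replaces the call that sends a message to the side panel.
type Generate = (history: readonly ChatMessage[]) => Promise<string>;
type Emit = (message: { type: "MESSAGES_UPDATE"; messages: ChatMessage[] }) => void;

class Agent {
  // The conversation history lives in the background, not in the UI.
  chatMessages: ChatMessage[] = [];
  private readonly generate: Generate;
  private readonly emit: Emit;

  constructor(generate: Generate, emit: Emit) {
    this.generate = generate;
    this.emit = emit;
  }

  // Handles one AGENT_GENERATE_TEXT request from the side panel.
  async handleGenerateText(prompt: string): Promise<void> {
    // Append the user message before inference starts.
    this.chatMessages.push({ role: "user", content: prompt });
    // Run inference on the full history.
    const reply = await this.generate(this.chatMessages);
    this.chatMessages.push({ role: "assistant", content: reply });
    // Send a copy of the updated list so the UI can re-render from it.
    this.emit({ type: "MESSAGES_UPDATE", messages: [...this.chatMessages] });
  }
}
```

Because the side panel always re-renders from the emitted list, it never needs its own copy of the history, and closing and reopening the panel does not lose the conversation as long as the background state is alive.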

2) Transformers.js integration details

2.1 Models and responsibilities

In src/shared/constants.ts, this extension uses two model roles:

TextGeneration / LLM: onnx-community/gemma-4-E2B-it-ONNX (text-generation, q4f16)
VectorEmbeddings: onnx-community/all-MiniLM-L6-v2-ONNX (feature-extraction, fp32)
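
One possible shape for these constants is sketched below. The model ids, tasks, and dtypes are the ones listed above; the interface, the key names, and the MODELS constant itself are hypothetical and may not match src/shared/constants.ts.

```typescript
type Task = "text-generation" | "feature-extraction";
type Dtype = "q4f16" | "fp32";

interface ModelConfig {
  modelId: string;
  task: Task;
  dtype: Dtype;
}

// Key names are hypothetical; ids, tasks, and dtypes come from the list above.
const MODELS: Record<"textGeneration" | "vectorEmbeddings", ModelConfig> = {
  textGeneration: {
    modelId: "onnx-community/gemma-4-E2B-it-ONNX",
    task: "text-generation",
    dtype: "q4f16",
  },
  vectorEmbeddings: {
    modelId: "onnx-community/all-MiniLM-L6-v2-ONNX",
    task: "feature-extraction",
    dtype: "fp32",
  },
};

// With Transformers.js, a config like this would typically be passed on as
// pipeline(config.task, config.modelId, { dtype: config.dtype }).
```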

The split is intentional: Gemma 4 handles reasoning/tool decisions, while MiniLM generates vector embeddings for the semantic similarity search in ask_website and find_history.

2.2 Where inference runs

All inference runs in the background (src/background/background.ts):

text generation via pipeline("text-generation", ...), with consistent KV caching enabled by our new DynamicCache class.
embeddings via pipeline("feature-extraction", ...) plus vector normalization.
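
To illustrate why normalization matters for the similarity search, here is a small self-contained sketch. Once vectors are L2-normalized, a plain dot product equals cosine similarity, which keeps ranking cheap. The function names are hypothetical, and whether the project normalizes manually or through the pipeline's own normalize option is not shown in this guide.

```typescript
// L2-normalizes a vector; a zero vector is returned unchanged as zeros.
function normalize(vector: readonly number[]): number[] {
  const norm = Math.hypot(...vector);
  return norm === 0 ? vector.map(() => 0) : vector.map((x) => x / norm);
}

// Dot product of two equally sized vectors.
function dot(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

interface Candidate<T> {
  item: T;
  embedding: readonly number[];
}

interface Scored<T> {
  item: T;
  score: number;
}

// Ranks candidates by cosine similarity to the query and keeps the best k.
function topK<T>(
  query: readonly number[],
  candidates: ReadonlyArray<Candidate<T>>,
  k: number,
): Scored<T>[] {
  const normalizedQuery = normalize(query);
  return candidates
    .map(({ item, embedding }) => ({
      item,
      score: dot(normalizedQuery, normalize(embedding)),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

In practice the candidate embeddings would be normalized once when they are created rather than on every query; the sketch normalizes inside topK only to stay short.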

This gives a single model host for all tabs/sessions, avoids duplicate memory usage, and keeps the side panel UI responsive. Because models are loaded from the background service worker, artifacts are cached under the extension origin (chrome-extension://<extension-id>) rather than per-website origins, which gives one shared cache for the whole extension install.

MV3 lifecycle note: service workers can be suspended and restarted, so model runtime state should be treated as recoverable and re-initialized when needed.

2.3 Download and cache lifecycle

The model lifecycle is split into two explicit steps:

CHECK_MODELS inspects what is already cached and estimates remaining download size.
INITIALIZE_MODELS downloads/initializes models and emits DOWNLOAD_PROGRESS to the UI.
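
As an illustration of the progress side, the sketch below aggregates per-file download events into one overall percentage that could be sent along with DOWNLOAD_PROGRESS. The event shape (file, loaded, total) is modelled on the objects Transformers.js passes to progress_callback, but treat the field names and the DownloadTracker class as assumptions rather than the project's implementation.

```typescript
// Assumed shape of one per-file progress event.
interface FileProgress {
  file: string;
  loaded: number; // bytes received so far
  total: number; // expected bytes for this file
}

class DownloadTracker {
  private readonly files = new Map<string, { loaded: number; total: number }>();

  // Records the latest state of one file and returns the overall percentage.
  update(event: FileProgress): number {
    this.files.set(event.file, { loaded: event.loaded, total: event.total });
    return this.percent();
  }

  // Overall progress across all files seen so far, rounded to a whole percent.
  percent(): number {
    let loaded = 0;
    let total = 0;
    this.files.forEach((file) => {
      loaded += file.loaded;
      total += file.total;
    });
    return total === 0 ? 0 : Math.round((loaded / total) * 100);
  }
}
```

One caveat of this simple approach: the total only includes files that have already reported progress, so the percentage can drop when a new file starts downloading. Seeding the tracker with the expected file sizes up front avoids that.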

Long-lived instances are reused after setup:

generation pipeline in src/background/agent/Agent.ts
embedding pipeline in src/background/utils/FeatureExtractor.ts
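
A common way to get this kind of reuse is to memoize the loading promise. The helper below is a generic sketch with a hypothetical name, not code from Agent.ts or FeatureExtractor.ts. It also fits the MV3 lifecycle: after a service worker restart the module state is gone, so the first call simply loads again.

```typescript
// Wraps an async loader so that all callers share a single instance.
function lazySingleton<T>(load: () => Promise<T>): () => Promise<T> {
  let instance: Promise<T> | undefined;
  return () => {
    if (instance === undefined) {
      // Cache the promise itself so concurrent callers share one load.
      instance = load().catch((error: unknown) => {
        instance = undefined; // allow a retry after a failed load
        throw error;
      });
    }
    return instance;
  };
}

// Assumed usage with Transformers.js (not executed here):
// const getExtractor = lazySingleton(() =>
//   pipeline("feature-extraction", "onnx-community/all-MiniLM-L6-v2-ONNX", { dtype: "fp32" }),
// );
// const extractor = await getExtractor();
```

Caching the promise instead of the resolved value matters: if two messages arrive while the model is still loading, both wait on the same load rather than starting a second one.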

Permissions and privacy are part of the architecture, not a checkbox at the end. In this project, public/manifest.json asks for sidePanel, storage, scripting, and tabs, plus host_permissions for http(s)://*/*.
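
For reference, the permission set described above corresponds to a manifest fragment like the one below, written here as a TypeScript object. The comments describe what each permission unlocks in general, not how this project uses it, and http(s)://*/* is expanded into its two match patterns.

```typescript
// Manifest fragment for the permissions listed above.
const permissionsFragment = {
  permissions: [
    "sidePanel", // access to the chrome.sidePanel API
    "storage", // access to the chrome.storage API
    "scripting", // access to the chrome.scripting API
    "tabs", // access to sensitive tab fields such as url and title
  ],
  // Host access for all http and https pages.
  host_permissions: ["http://*/*", "https://*/*"],
};
```

Broad host permissions like these are shown to users at install time, so it is worth checking whether a narrower pattern would be enough for your own extension.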
