Skim: Speculative Execution for Fast and Efficient Web Agents

Abstract: Skim is a speculative execution framework for web agents that exploits the predictable structure of purpose-built websites. The high cost of today’s web agents is not intrinsic to their tasks; it follows from how agents are composed: frontier-model inference, browser rendering, and ReAct-style planning are applied to every step of every task, regardless of complexity.

Skim’s key observation is that websites enforce stable URL patterns, answer formats, and task-to-trajectory mappings across queries of the same type, so most queries can bypass these heavyweight components entirely. An offline profiler captures these patterns once per site. At runtime, Skim matches each query to a template, synthesizes the destination URL, and extracts the answer with a small model.
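
The runtime fast path described above (match the query to a template, then synthesize the destination URL) might be sketched as follows. This is a minimal illustration, not Skim's actual implementation: the `Template` class, the regex-based matching, the `TEMPLATES` table, and the `shop.example.com` URL are all hypothetical stand-ins for whatever the offline profiler produces, and the small-model answer extraction step is omitted.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Template:
    """Hypothetical profiler output: a query pattern plus a URL format."""
    pattern: re.Pattern
    url_format: str


# Assumption: one hand-written template standing in for a per-site profile.
TEMPLATES = [
    Template(
        pattern=re.compile(r"price of (?P<item>.+) on example shop", re.IGNORECASE),
        url_format="https://shop.example.com/search?q={item}",
    ),
]


def synthesize_url(query: str) -> Optional[str]:
    """Return the destination URL for the first matching template, else None."""
    for template in TEMPLATES:
        match = template.pattern.search(query)
        if match:
            # URL-encode every captured slot before filling the URL format.
            slots = {k: quote_plus(v.strip()) for k, v in match.groupdict().items()}
            return template.url_format.format(**slots)
    return None  # No template matched: the query is not eligible for the fast path.
```

In this sketch a query that matches no template returns `None`, which a caller would treat as a signal to skip speculation and run the full agent.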

A lightweight verifier gates each fast-path output against the query and schema; rare misspeculations cascade to the full agent, warm-started by the fast path’s final URL to preserve upstream trajectory progress. Across standard web-agent benchmarks paired with three backbone agents (WebVoyager, AgentOccam, BrowserUse), Skim reduces median per-task cost by 1.9x and latency by 33.4% with no accuracy loss.
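
The verify-then-cascade control flow can be sketched in a few lines. The names here (`FastPathResult`, `verify`, `answer_query`) are hypothetical, and the verifier is simplified to a regex check on the answer format only; the verifier described above also checks the output against the query, which this sketch does not attempt.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FastPathResult:
    answer: Optional[str]  # Answer extracted by the small model, if any.
    final_url: str         # Last URL the fast path reached.


def verify(answer: Optional[str], schema: re.Pattern) -> bool:
    """Simplified gate: accept only answers that fully match the expected format."""
    return answer is not None and schema.fullmatch(answer.strip()) is not None


def answer_query(
    query: str,
    fast_path: Callable[[str], FastPathResult],
    full_agent: Callable[[str, str], str],
    schema: re.Pattern,
) -> str:
    """Try the speculative fast path first; fall back to the full agent on rejection."""
    result = fast_path(query)
    if verify(result.answer, schema):
        return result.answer.strip()
    # Misspeculation: warm-start the full agent from the fast path's final URL.
    return full_agent(query, result.final_url)
```

Passing `final_url` to the fallback is what lets the full agent resume from the page the fast path reached instead of starting from the site's entry point.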
