SilverTorch: Index as Model — A New Retrieval Paradigm for Recommendation Systems
By Lei Chen, Yiyi Pan, Ivy Sun, Sha Meng, Cornelia Carapcea, Shilin Ding, Ram Ramanathan, Nipun Mathur, Hong Yan, Lars Backstrom
We’re introducing SilverTorch, a reimagining of recommendation systems that brings all retrieval components for user-generated content under a single architecture. SilverTorch delivers up to 23.7x higher throughput than state-of-the-art approaches and 20.9x better compute cost efficiency than a CPU-based solution, while also improving accuracy. Our research paper, “SilverTorch: A Unified Model-based System to Democratize Large-Scale Recommendation on GPUs,” accepted to the full paper track at SIGIR 2026, contains the full technical details.
Retrieval systems within industry recommendation systems have traditionally consisted of microservices stitched together, with neural networks integrated inconsistently. Our recommendation systems scale to serve people across multiple platforms. Retrieval is responsible for narrowing millions of pieces of content (e.g., reels and photos) down to thousands before passing them to ranking systems, all in less than 100 milliseconds. However, the microservice-based design imposed hard constraints on model complexity and on the number of candidates evaluated, ultimately creating a ceiling on the quality of recommendations that people on our platforms see. To break through this ceiling, we’ve fully reimagined our retrieval ecosystem as a unified model-based system: SilverTorch.
SilverTorch operates under a new paradigm we call Index as Model. We’ve built our retrieval system as a single neural network, and what used to be separate microservices are now model modules within that network. Under Index as Model, the microservice-based item indices previously used for retrieval become a tensor inside the model. When a user opens their app, one request flows through a SilverTorch model, completes all critical retrieval functions (searching for items similar to the user’s interests, filtering for eligibility, reranking, and scoring engagement likelihood against multiple user engagement actions), and returns a list of high-quality content candidates to ranking. This design allows us to increase modeling complexity and the number of candidates evaluated while staying within the sub-100-millisecond budget.
SilverTorch makes retrieval significantly more efficient, runs at scale, and enables better recommendations:
Higher throughput, lower total cost of ownership (TCO). In an 80M-item end-to-end evaluation, SilverTorch served 23.7× more requests per second than a strong traditional multi-service baseline built on the same model architecture, while improving estimated TCO efficiency by 20.9×.
Proven at scale. Results show SilverTorch can scale across a family of apps as the major retrieval system behind the feed and video content people see.
Better recommendations. By making neural reranking and multi-task scoring practical within tight latency budgets, SilverTorch has consistently enabled retrieval quality improvements that would have been impractical under a microservices architecture.
Moving From Microservice Mesh to One Integrated Neural Network
The Microservice Paradigm We Replaced
Traditional recommendation retrieval is built as a mesh of microservices. When a user opens a social media platform, the request hits an orchestrator, which fans out to three services: a user-tower model service, which computes a vector representation of the user’s interests, called a “user embedding”; a combined retrieval service, which finds and filters candidate items based on similarity to the user vector and on eligibility rules like language and geography; and a scoring service, which ranks the survivors. The orchestrator merges the results and hands them downstream. Each service has its own codebase, often in a different programming language, with its own deployment lifecycle. This worked well in the CPU era. But as retrieval systems grew in scale and sophistication, three problems compounded into structural limits that no component-level optimization can fix:
Latency lost to data movement. Every hop between services costs network round-trip time and serialization overhead, eating into the sub-100-millisecond retrieval budget that should fund actual computation. And because filtering, search, and scoring are designed independently, they cannot be jointly optimized.
Version inconsistency. The user-tower model, the item index, and the filtering rules each update on their own cadence. When the user model ships v2 but the item index is still on v1, the system queries v1 item embeddings with v2 user representations, creating quality gaps that no downstream ranking can recover.
Siloed development environments. Machine learning (ML) engineers write PyTorch. Infrastructure engineers write C++. The two groups have different release cycles, different testing setups, and different mental models. Every retrieval improvement requires translating an idea between the two environments, which takes weeks or months per cycle.
Component-level optimizations like Faiss-GPU help by making a specific microservice faster, but they don’t resolve the underlying structural limits. The architecture is still a system of services with artifacts handed between them.
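To make the version inconsistency problem above concrete, here is a minimal toy sketch. It assumes two hypothetical model versions that encode the same user preference on different embedding axes; every name and vector in it is invented for illustration and is not taken from SilverTorch.

```python
# Illustrative sketch only: all names and the toy embeddings are hypothetical.

def top_item(user_vec, item_index):
    # Return the item id whose embedding has the largest dot product with the user.
    return max(
        item_index,
        key=lambda item: sum(u * e for u, e in zip(user_vec, item_index[item])),
    )

# Two model versions that learned the same preference on different axes.
USER_TOWER = {
    "v1": lambda likes_cooking: [1.0, 0.0] if likes_cooking else [0.0, 1.0],
    "v2": lambda likes_cooking: [0.0, 1.0] if likes_cooking else [1.0, 0.0],
}
ITEM_INDEX = {
    "v1": {"cooking_reel": [1.0, 0.0], "sports_reel": [0.0, 1.0]},
    "v2": {"cooking_reel": [0.0, 1.0], "sports_reel": [1.0, 0.0]},
}

# Independent services: the user tower has shipped v2, the index is still on v1.
skewed = top_item(USER_TOWER["v2"](True), ITEM_INDEX["v1"])

def retrieve(version, likes_cooking):
    # Single artifact: the tower and the index are always read from one version.
    return top_item(USER_TOWER[version](likes_cooking), ITEM_INDEX[version])

consistent = retrieve("v2", True)
print(skewed, consistent)
```

In this toy setup the skewed query returns the sports reel for a user who likes cooking, while the single-version query returns the cooking reel. Real embedding drift between versions is subtler than an axis swap, but the failure mode has the same shape.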
The Shift: All Components Are Model Modules
SilverTorch rethinks the paradigm from the ground up. Instead of designing a microservices system and inserting neural networks into it, we start with the neural network and design outward. We call this Index as Model: every retrieval component (the item index, eligibility filter, scoring layer, and user tower) becomes a tensor or operator inside a single PyTorch model. That means one artifact to deploy, one forward pass to run, and one source of truth for what’s in the system.
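The idea of one object holding every retrieval component can be sketched in a few lines. This is not the SilverTorch implementation: plain Python lists stand in for tensors so the sketch runs without dependencies, the search is exact brute force rather than approximate nearest neighbor, and all names (`IndexAsModelSketch`, `user_tower`, `item_languages`) are hypothetical.

```python
# Illustrative sketch only: plain lists stand in for tensors, and every
# name here is hypothetical rather than part of SilverTorch.
from dataclasses import dataclass


@dataclass
class IndexAsModelSketch:
    tower_weights: list[list[float]]    # user tower: one linear layer
    item_embeddings: list[list[float]]  # the item index, stored as a matrix
    item_languages: list[str]           # per-item eligibility metadata

    def user_tower(self, features: list[float]) -> list[float]:
        # Map raw user features to a user embedding.
        return [sum(w * x for w, x in zip(row, features))
                for row in self.tower_weights]

    def forward(self, features: list[float], language: str, k: int) -> list[int]:
        user = self.user_tower(features)
        # Similarity search: dot product against every row of the index.
        scores = [sum(u * e for u, e in zip(user, item))
                  for item in self.item_embeddings]
        # Eligibility filter: keep only items in the requested language.
        eligible = [i for i, lang in enumerate(self.item_languages)
                    if lang == language]
        # Scoring: return the ids of the k best eligible items.
        return sorted(eligible, key=lambda i: scores[i], reverse=True)[:k]


model = IndexAsModelSketch(
    tower_weights=[[1.0, 0.0], [0.0, 1.0]],
    item_embeddings=[[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]],
    item_languages=["en", "en", "fr", "en"],
)
candidates = model.forward(features=[1.0, 0.0], language="en", k=2)
print(candidates)  # [0, 1]: item 2 scores well but is filtered out by language
```

Because the tower weights, the index, and the eligibility metadata live in one object, a single call to `forward` covers what the microservice design split across three services, and the components cannot drift apart between deployments.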
Inside the Model
[Figure: A diagram of the SilverTorch Index as Model architecture.]
Inside this single neural network, different regions of the network handle different jobs. Approximate nearest neighbor (ANN) search regions find items most similar to the user…