Grounding LLMs with Fresh Web Data to Reduce Hallucinations
Why production LLM systems need live web search to overcome knowledge cutoffs and stale training data
There’s a growing assumption that if you connect a large language model (LLM) to your production system or application, it will simply “know” how to answer your questions. Unfortunately, that isn’t how it works. As impressive as LLMs may be, they need access to data just like any other model. Most LLMs have an inherent knowledge cutoff, the point in time where their training data ends. When users ask about information from after that date, the model may still produce answers, just not correct ones. We call these poor answers LLM hallucinations, but they’re really an expected outcome of an information mismatch.
LLMs train on static snapshots of the internet, but customers interacting with support bots, managers leveraging internal AI assistants, and sales teams depending on product copilots expect real-time knowledge and up-to-date data. Your LLM doesn’t natively know about breaking news, policy updates, shifting competitor pricing, or changes to API documentation. You need to ground it with fresh external data to make sure its answers (delivered with unwavering confidence) are actually right.
What is LLM Grounding?
LLM grounding means adding external, up-to-date information at the time of generation. Ungrounded out-of-the-box LLMs primarily rely on their training data and the user prompt. That works for many scenarios, but not when the question requires fresh information such as the latest tax regulations or financial reporting requirements. Grounded production LLM systems have access to current knowledge sources. They hallucinate less and produce more reliable outputs. Think of it as having a reasoning engine with no internet access (an ungrounded LLM) versus one that can search for real-time information (a grounded LLM).
To achieve this, a grounded LLM may use external dynamic data sources, retrieval systems, or even live web data. The most common way to implement this today is through retrieval augmented generation (RAG), but as you’ll soon see, even RAG has its limitations.
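To make "adding external information at the time of generation" concrete, here is a minimal sketch of the prompt-assembly step. The `Source` shape, the `build_grounded_prompt` name, and the instruction wording are all assumptions made for illustration; they are not taken from any particular framework, and real systems will phrase and structure this differently.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One piece of external context (hypothetical shape for this sketch)."""
    title: str
    url: str
    text: str


def build_grounded_prompt(question: str, sources: list[Source]) -> str:
    """Assemble a prompt that asks the model to answer from the given sources."""
    if not sources:
        # Nothing was retrieved: say so explicitly instead of leaving a blank.
        context = "(no external sources were retrieved)"
    else:
        # Number each source so the model can cite it in the answer.
        context = "\n\n".join(
            f"[{i}] {s.title} ({s.url})\n{s.text}"
            for i, s in enumerate(sources, start=1)
        )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, and say so if they do not contain the answer.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    demo = [Source("Example policy page", "https://example.com/policy", "Placeholder text.")]
    print(build_grounded_prompt("What changed in the policy?", demo))
```

The resulting string would be sent to whichever model you use. Where the sources come from (a vector store, a search API, or both) is a separate decision from how they are placed in the prompt.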
Why RAG Falls Short in Production
Retrieval augmented generation, or RAG, typically works by selecting relevant context from pre-computed vector stores (often implemented as vector databases) and supplying it to the LLM at query time. This improves the LLM’s response by grounding it with external knowledge sources such as a company’s internal documents or product specifications. While highly effective for stable knowledge bases, RAG systems are only as fresh as the data they retrieve. You’ll need to consistently update your vector stores to make sure RAG has access to up-to-date data. Any lag in ingestion leads once again to hallucinations in the form of outdated answers.
Live web data changes the game entirely. With RAG vector stores, your LLM gets a snapshot in time; with live web information, your LLM receives a continuously updated view of reality. Real-time data from the web helps solve the issue of freshness, but it also provides your LLM with additional coverage for long-tail or unindexed information. Your vector store may not contain a chunk that matches the exact phrasing you need, but if you give your LLM access to real-time search results, it has a much better chance of providing an accurate response.
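The snapshot problem can be illustrated with a toy vector store. This is a sketch under stated assumptions: the `Chunk` fields, the `retrieve` and `find_stale` helpers, and the hand-written two-dimensional embeddings are invented for this example, and a real system would use an embedding model and a vector database. The point is only that similarity search returns the closest chunk regardless of how long ago it was ingested, so freshness has to be tracked separately.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Chunk:
    text: str
    embedding: list[float]   # pre-computed at ingestion time
    ingested_at: datetime    # when this chunk was last refreshed


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], store: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the query; age plays no role here."""
    ranked = sorted(store, key=lambda c: cosine(query_vec, c.embedding), reverse=True)
    return ranked[:k]


def find_stale(chunks: list[Chunk], now: datetime, max_age: timedelta) -> list[Chunk]:
    """Return the chunks whose last ingestion is older than max_age."""
    return [c for c in chunks if now - c.ingested_at > max_age]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    store = [
        Chunk("Pricing page text", [0.9, 0.1], now - timedelta(days=30)),
        Chunk("API docs text", [0.1, 0.9], now - timedelta(hours=1)),
    ]
    hits = retrieve([1.0, 0.0], store, k=1)
    for chunk in find_stale(hits, now, max_age=timedelta(days=7)):
        print(f"Stale context retrieved: {chunk.text!r}")
```

A check like `find_stale` does not fix stale data, but it gives the pipeline a signal for when to re-ingest or fall back to a live search.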
What Managed Search Infrastructure for LLMs Looks Like
Managed search infrastructure provides a way to fetch live search results without the hassle of building your own scrapers. These services abstract away search data retrieval, allowing you to focus on your production LLM systems. In practice, they make it much easier to ground your LLM with real-time data from the web, whether on its own or alongside a RAG system.
Most managed search tools fall into one of several categories: traditional search APIs, search engine results page (SERP) APIs, LLM-native search platforms, and built-in LLM web search tools. Traditional search APIs offer a straightforward way to obtain a curated subset of search results. SERP APIs provide more complete, structured access to SERPs. For example, SerpApi is a web search API developers can use to easily combine live search results from over a hundred APIs with any application. Newer LLM-native tools like Tavily and Exa focus on simplifying LLM integration by returning re-ranked or summarized results. Search tools contained within LLMs allow for seamless integration but typically give you condensed results with limited control over data sources.
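Whichever category you choose, the structured response usually has to be trimmed down before it reaches the model. The sketch below assumes a JSON payload with an `organic_results` list whose items carry `title`, `link`, and `snippet` fields. Treat those field names as an assumption and check them against your provider's documentation; the `extract_snippets` helper itself is hypothetical and makes no network call.

```python
from typing import Any


def extract_snippets(payload: dict[str, Any], limit: int = 5) -> list[dict[str, str]]:
    """Reduce a SERP-style JSON payload to the fields an LLM prompt needs.

    Assumes results live under "organic_results" with "title", "link",
    and "snippet" keys; adjust to your provider's actual schema.
    """
    snippets: list[dict[str, str]] = []
    for item in payload.get("organic_results", []):
        if len(snippets) >= limit:
            break
        link = item.get("link")
        text = item.get("snippet")
        if not link or not text:
            continue  # skip entries that lack a URL or any text to quote
        snippets.append({"title": item.get("title", ""), "url": link, "text": text})
    return snippets


if __name__ == "__main__":
    sample = {
        "organic_results": [
            {"title": "Example result", "link": "https://example.com", "snippet": "Placeholder."},
            {"title": "No snippet here", "link": "https://example.org"},
        ]
    }
    print(extract_snippets(sample))
```

Keeping the URL alongside each snippet lets the model cite its sources and lets you audit which pages an answer was based on.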
Patterns for Integrating Live Web Search into LLM Pipelines
When adding live search data to your LLM pipeline, you’ll want to consider how much control you give the LLM, how much latency you can tolerate, and how much complexity you’re comfortable managing. There are three main architecture patterns for incorporating live external data into production LLM systems, each with different tradeoffs across those dimensions.
Search-First Pipelines
Search-first pipelines do exactly what they sound like: they search first. When a user submits a query, the system immediately calls a search API and injects the…
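A search-first flow can be sketched in a few lines. The description above is cut off before it says exactly what gets injected and where, so this sketch assumes the search results are placed into the prompt ahead of the user's question. The `search_first_answer` function and the `SearchFn` and `LlmFn` types are hypothetical; the search client and the model client are passed in as plain callables, so no specific provider API is implied.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]  # query -> text snippets
LlmFn = Callable[[str], str]           # prompt -> completion


def search_first_answer(
    query: str,
    search: SearchFn,
    llm: LlmFn,
    max_snippets: int = 3,
) -> str:
    """Always search, then make a single model call with the results attached."""
    # Step 1: search unconditionally; the model is not asked whether to search.
    snippets = search(query)[:max_snippets]
    context = "\n".join(f"- {s}" for s in snippets) or "- (no results)"

    # Step 2: put the results ahead of the question and call the model once.
    prompt = f"Context from live search:\n{context}\n\nUser question: {query}"
    return llm(prompt)


if __name__ == "__main__":
    # Stand-ins so the sketch runs without network access or API keys.
    def fake_search(q: str) -> list[str]:
        return [f"Placeholder result for {q!r}"]

    def fake_llm(prompt: str) -> str:
        return f"(model output for a prompt of {len(prompt)} characters)"

    print(search_first_answer("latest API changes", fake_search, fake_llm))
```

Because the search happens on every request, this design pays the search latency even when the model could have answered from its training data. In exchange, the control flow stays simple and predictable.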