Two AI-based science assistants succeed with drug-retargeting tasks

On Tuesday, Nature released two papers describing AI systems intended to help scientists develop and test hypotheses. One, Google’s Co-Scientist, is designed around what the company terms a “scientist in the loop” approach, meaning researchers regularly apply their judgment to direct the system. The second, from a nonprofit called FutureHouse, goes a step further: it has trained a system that can evaluate biological data coming from some specific classes of experiments.

While Google says its system will also work for physics, both groups exclusively present biological data, and largely straightforward hypotheses—this drug will work for that. So, this is not an attempt to replace either scientists or the scientific process. Instead, it’s meant to help with what current AIs are best at: chewing through massive amounts of information that humans would struggle to come to grips with.

What’s this good for?

There are some distinctions between the two systems, but both are what is termed agentic; they operate in the background by calling out to separate tools. (Microsoft has taken a similar approach with its science assistant as well; OpenAI seems to be an exception in that it simply tuned an LLM for biology.) And, while there are differences between them that we’ll highlight, they are both focused on the same general issue: the utter profusion of scientific information.

With the ease of online publishing, the number of journals has exploded, and with them the number of papers. It has gotten tough for any researcher to stay on top of their field. Finding potentially relevant material in other fields is a real challenge. If you’re focused on eye development, for example, one of the signaling systems used may also be involved in the kidney, and it can be easy to miss what people are discovering about it.

As the people at FutureHouse put this issue, “By focusing on ‘combinatorial synthesis’ (identifying non-obvious connections between disparate fields), Robin effectively targets ‘low-hanging fruit’ that human experts may overlook due to the compartmentalization of scientific knowledge.” This is a task that’s well-suited to AI, which can chew through the peer-reviewed literature in the background while researchers do other things.

This isn’t really a question of whether an AI could do something better or worse than a human; it’s more of an issue of whether any human would end up doing these sorts of searches at all. By finding enough connections among disparate research, these tools can make suggestions—hypotheses, really—about the biology. This can include things like what processes underlie biological behaviors and what pathways and networks regulate those processes.

And, in the cases explored here, it included suggesting known drugs that might target some of these pathways in diseased cells: acute myeloid leukemia in Google’s case, and a form of macular degeneration for FutureHouse.

Co-Scientist

As you might imagine, Google’s system is based on the company’s Gemini large language model. That helps the system interpret a statement of research goals provided by human scientists and start a literature search to find relevant information and form hypotheses. The hypotheses are then evaluated relative to each other in a “tournament,” the results of which are evaluated by a Reflection agent. An Evolution agent can then make improvements to any surviving ideas, which can be sent back through the process.
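
To make the tournament idea concrete, here is a minimal Python sketch of ranking hypotheses by pairwise comparison. It is an illustration under stated assumptions, not Google's implementation: the Elo-style rating update, the starting score of 1000, the `k` factor, and every name in the code (`Hypothesis`, `run_tournament`, `toy_judge`) are hypothetical, and the judge is a placeholder standing in for whatever model-based comparison the real system performs.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    rating: float = 1000.0  # hypothetical starting score, not from the paper


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_tournament(hypotheses, judge, k: float = 32.0):
    """Round-robin: each pair meets once; judge returns True if the first wins."""
    for first, second in itertools.combinations(hypotheses, 2):
        outcome = 1.0 if judge(first, second) else 0.0
        delta = k * (outcome - expected_score(first.rating, second.rating))
        first.rating += delta   # winner gains what the loser gives up
        second.rating -= delta
    return sorted(hypotheses, key=lambda h: h.rating, reverse=True)


def toy_judge(first: Hypothesis, second: Hypothesis) -> bool:
    # Placeholder only: prefers the longer statement. A real system would
    # compare hypotheses on criteria such as plausibility and novelty.
    return len(first.text) >= len(second.text)


pool = [
    Hypothesis("drug A inhibits pathway X"),
    Hypothesis("drug B"),
    Hypothesis("drug C blocks Y in cell type Z via W"),
]
ranked = run_tournament(pool, toy_judge)
survivors = ranked[:2]  # ideas that would be passed on for refinement
```

In this sketch, the `survivors` list plays the role of the ideas that an improvement step would revise before sending them through another round.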

Key criteria considered throughout this process include plausibility, novelty, testability, and safety. And the Reflection tool has access to external search tools, as access to the scientific literature “prevented the hallucination of seemingly novel but implausible hypotheses,” the company wrote. As the paper puts it, scientists were kept in the loop at all times.

In the search for potential drugs targeting leukemia, the suggestions made by the system were prioritized based on a review by a panel of experts, who had access to the literature Co-Scientist used to formulate its suggestions. The results are what you would expect from cancer therapies. Some of the drugs identified were effective, but only against subsets of a panel of myeloid leukemia cells. That’s not unusual, given that there are multiple routes to unchecked growth, so drugs that block the route followed by one cell type may not be effective in cells that took a different route.

Google also mentioned that the system could do more general hypothesizing that doesn’t involve drugs, using an example of the spread of virulence genes in bacteria. But the details of that work were fairly sparse. The system is also set up so that it’s model agnostic, allowing it to be switched over to better-performing models as AI systems evolve. But the researchers also warn that “Co-Scientist also inherits the intrinsic limitations of its underlying models, including imperfect factuality and the potential for hallucinations.”

And Robin

FutureHouse’s system has some similarities but a couple of critical differences that go beyond naming all the agentic tools after birds. The main system, Robin, has access to specialized literature search tools. One, Crow, produces concise summaries of papers, while another, Falcon, gives a deep overview of the information each paper contains. The paper describing the system provides a clear sense of the advantages here: “Robin analyses 551 papers in 30 minutes compared to an estimated time of 540 hours for a human.”
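
For a sense of scale, the figures in that quote can be worked through with a few lines of arithmetic. The only inputs are the three numbers quoted from the paper; the variable names are just for illustration.

```python
papers = 551          # papers analysed, as quoted
robin_minutes = 30    # Robin's reported time
human_hours = 540     # the paper's estimate for a human

human_minutes = human_hours * 60
speedup = human_minutes / robin_minutes               # 1080.0
human_min_per_paper = human_minutes / papers          # about 58.8 minutes
robin_sec_per_paper = robin_minutes * 60 / papers     # about 3.27 seconds

print(f"Speedup: {speedup:.0f}x")
print(f"Human: {human_min_per_paper:.1f} min per paper")
print(f"Robin: {robin_sec_per_paper:.2f} s per paper")
```

Put another way, the human estimate works out to roughly an hour per paper, while Robin's reported pace is a few seconds per paper, a ratio of about 1,080 to 1.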

Taking those summaries, Robin then formed a series of hypotheses about disease mechanisms for macular degeneration and used these tools to provide a detailed report on the evidence for each mechanism. An LLM judge then made pairwise comparisons among the hypotheses, which resulted in relative rankings—a bit like Google’s tournament system. In a similar manner, the system was redeployed to suggest cell lines and cult…
