Researchers try to cut the genetic code from 20 to 19 amino acids

The genetic code is central to life. With minor variations, every organism uses the same sets of three DNA bases to encode the same 20 amino acids. We have discovered no major exceptions to this, leading researchers to conclude that the code probably dates back to the last common ancestor of all life on Earth.

But there has been a lot of informed speculation about how that genetic code initially evolved. Most hypotheses suggest that earlier forms of life had partial genetic codes and used fewer than 20 amino acids. To test these hypotheses, a team from Columbia and Harvard decided to see if they could get rid of one of the 20 currently in use. And, as a first attempt, they engineered a portion of the ribosome that worked without using an otherwise essential amino acid: isoleucine.

Changing the code

First off, why would you do this? Most work in the field has focused on altering the genetic code in ways that are useful, such as using more than 20 amino acids to enable interesting chemistry. The reasoning here seems to be that, prior to the last common ancestor of life on Earth, organisms experimented with various genetic codes and probably used a mix of proteins and catalytic RNAs to run their metabolisms.

While we’ve done a lot of studies on catalytic RNAs, we have far less of an idea of what sort of chemistry is possible with a reduced genetic code. And the researchers suggest that AI-based tools have matured enough that redesigning proteins to use fewer amino acids is far more realistic than it was just a few years ago.

Isoleucine is one of three highly similar amino acids, along with leucine and valine. In the side chain, the portion of the structure that distinguishes one amino acid from another, all three have a branched structure that’s composed entirely of carbon and hydrogen. That makes them all hydrophobic, and they are often located in the interior of proteins, which keeps them away from the watery environment of the cell. So, purely by reasoning it out, one of those three would seem to be a good candidate to get rid of.

The researchers involved backed that reasoning up with evidence. They ran an analysis of the E. coli genome, checking which amino acids were substituted by other ones in related proteins from other species. Isoleucine was the amino acid that was most frequently swapped out for a different one. So, the researchers decided to start answering the question of whether we really need it at all.
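
To make that kind of analysis concrete, here is a minimal sketch of how substitution frequencies could be tallied from pairs of already-aligned protein sequences. It is an illustration, not the researchers' pipeline: the function name `substitution_rates`, the input format, and the toy sequences are all assumptions made up for this example.

```python
from collections import Counter

def substitution_rates(aligned_pairs):
    """For each amino acid in the reference sequences, return the fraction of
    aligned positions where the related protein carries a different residue."""
    seen = Counter()     # how often each reference residue appears at an aligned position
    changed = Counter()  # how often it is paired with a different residue
    for reference, ortholog in aligned_pairs:
        if len(reference) != len(ortholog):
            raise ValueError("aligned sequences must have equal length")
        for ref_aa, other_aa in zip(reference, ortholog):
            if ref_aa == "-" or other_aa == "-":
                continue  # skip alignment gaps
            seen[ref_aa] += 1
            if ref_aa != other_aa:
                changed[ref_aa] += 1
    return {aa: changed[aa] / seen[aa] for aa in seen}

# Toy, made-up alignments (hypothetical; not real E. coli data).
toy_pairs = [
    ("MIKLVI", "MVKLVL"),
    ("AIGL-K", "AVGLSK"),
]
rates = substitution_rates(toy_pairs)
most_swapped = max(rates, key=rates.get)
print(most_swapped, rates[most_swapped])
```

In this toy input, every isoleucine (`I`) is paired with a different residue, so it comes out as the most frequently swapped amino acid. A real analysis would need proper alignments across many species.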

Editing all 4,500 or so genes in E. coli would be a monumental task, and that many changes at once would almost certainly end up killing it, so the researchers started out with much smaller tests. To begin with, they took a set of 39 essential genes and replaced every isoleucine in them with valine, a similar amino acid, and then put the altered gene back into the genome. For 22 of the genes, doing so killed the cells. But that does indicate that 17 of them got by OK without isoleucine, including one where it was swapped out in 45 different positions along the amino acid chain. Notably, even in cases where cells tolerated the change, their growth often slowed compared to the unedited cells. That will become a recurring theme.
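
At the DNA level, an isoleucine-to-valine swap amounts to rewriting codons. In the standard genetic code, isoleucine is encoded by ATT, ATC, and ATA, and changing the first base to G turns each into a valine codon. The sketch below applies that rule to a coding sequence read in frame. The article does not say which valine codons the team chose, so the mapping, the function name `recode_ile_to_val`, and the toy gene are assumptions for illustration.

```python
# Standard-code isoleucine codons mapped to valine codons by changing the
# first base from A to G. This particular mapping is an assumption; valine
# is also encoded by GTG.
ILE_TO_VAL = {"ATT": "GTT", "ATC": "GTC", "ATA": "GTA"}

def recode_ile_to_val(coding_dna):
    """Replace every in-frame isoleucine codon with a valine codon.

    Returns the recoded sequence and the number of codons that were swapped.
    """
    coding_dna = coding_dna.upper()
    if len(coding_dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3)]
    swapped = sum(codon in ILE_TO_VAL for codon in codons)
    recoded = "".join(ILE_TO_VAL.get(codon, codon) for codon in codons)
    return recoded, swapped

# Hypothetical toy gene: Met-Ile-Lys-Ile-Leu-Ile-stop.
toy_gene = "ATGATTAAAATCCTGATATAA"
recoded_gene, swap_count = recode_ile_to_val(toy_gene)
print(recoded_gene, swap_count)
```

Splitting the sequence into in-frame codons first matters: a naive text search for "ATT" would also match triplets that straddle two codons.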

Redesigning the ribosome

To give their project a focus, the researchers decided to start engineering an isoleucine-free ribosome. The ribosome is a large complex of proteins and RNAs that translates messenger RNAs into proteins; you can think of it as a bit like one of the hardware components that’s needed to boot a living cell from a genome. Obviously, many of the proteins in the ribosome have critical enzymatic activities. But bringing that complex together requires that these proteins interact with each other and with RNAs. So, the ribosome provides a very stringent test of whether engineering out an amino acid can be tolerated by cells.

As a preliminary test, the team did an isoleucine-to-valine swap for 50 different individual genes that contribute proteins to the ribosome. Eighteen of those worked with no obvious problems, another 19 grew more slowly, and the changes were lethal for the remaining 13 genes. The team then focused on the 32 genes with reduced fitness and adapted deep-learning protein-design software to suggest alternative sequences that did not include isoleucine.

Iterative testing using four different software packages produced alternative protein sequences for 27 of these 32 proteins that eliminated the fitness issues. For the remaining five, they went back and forced changes at the isoleucine positions. They then let the software design changes in the amino acids that are physically close to each of those positions within the three-dimensional structure of the protein, the idea being that the change in amino acid may disrupt the protein’s structure in a way that other changes in nearby amino acids could compensate for. This led to successful redesigns for four of the five problem proteins.
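
The idea of picking out amino acids that sit physically close to an isoleucine can be sketched as a simple distance filter over three-dimensional coordinates. This is only an illustration of the general approach. The article does not describe how the software chose nearby residues, so the 8-angstrom cutoff, the one-point-per-residue representation, the function name `residues_near_isoleucine`, and the toy coordinates are all assumptions.

```python
import math

def residues_near_isoleucine(residues, cutoff=8.0):
    """Return indices of non-isoleucine residues within `cutoff` of any isoleucine.

    `residues` is a list of (one_letter_code, (x, y, z)) tuples, with one
    representative point per residue (for example, its alpha carbon).
    """
    ile_positions = [xyz for aa, xyz in residues if aa == "I"]
    nearby = []
    for index, (aa, xyz) in enumerate(residues):
        if aa == "I":
            continue  # the isoleucine itself is replaced, not redesigned around
        if any(math.dist(xyz, ile_xyz) <= cutoff for ile_xyz in ile_positions):
            nearby.append(index)
    return nearby

# Hypothetical toy structure with made-up coordinates (in angstroms).
toy_structure = [
    ("M", (0.0, 0.0, 0.0)),
    ("I", (3.8, 0.0, 0.0)),
    ("K", (7.6, 0.0, 0.0)),
    ("L", (20.0, 0.0, 0.0)),
    ("G", (3.8, 5.0, 0.0)),
]
redesign_candidates = residues_near_isoleucine(toy_structure)
print(redesign_candidates)
```

Note that the last residue is far from the isoleucine along the chain but close to it in space, which is why the filter works on coordinates and not on sequence position.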

While these are impressive achievements, testing them individually doesn’t really give the full picture of whether these redesigned proteins can put together a functionally equivalent ribosome. To do that, the researchers decided to remove isoleucine from all of the proteins in the small subunit of the ribosome. This is largely a matter of convenience. The genes for the 21 proteins in the small subunit are all clustered next to each other on a 10,000-base-long stretch of the genome, so the researchers could just replace them all at once.

Thinking small

Using the redesigned proteins from the earlier work, they started replacing ever-larger stretches of the genes along this 10,000-base stretch of DNA. Starting from one side, they replaced 10 genes without any trouble. By the time they got to replacing 17 of the 21, the cells were growing more slowly. Replacing 18 genes at once, however, killed the cells entirely. So, they started working in from the other direction and…