$60B AI chip darling Cerebras almost died early on, burning $8M a month
Today, Cerebras Systems is a public company that sells AI chips for inference to giants like OpenAI and AWS. It held a blockbuster IPO on Thursday, making both of its co-founders billionaires, and ended the week worth about $60 billion. But in 2019, when it was three years old, it came dangerously close to failure, incinerating a shocking amount of money.
It was trying to solve a technical problem no one in the semiconductor industry thought could be done. “We were spending about $8 million a month,” founder and CEO Andrew Feldman told TechCrunch of that period. “At this point, we had incinerated nearly $200 million trying to solve one technical problem.” Every few weeks, Feldman was forced to make the painful walk of shame to the board meeting to report another failure and more money burned. But he had no choice. Without a solution, Cerebras was dead anyway.
It was founded on an idea that was simple on paper. The microprocessor industry had spent its entire 50-plus-year history making CPUs faster and cheaper by cramming more transistors onto a silicon wafer and dicing wafers into ever tinier pieces. But AI required so much compute power that many chips had to be strung together and then forced to communicate with each other. Cerebras’ founders believed that turning a whole, even bigger wafer into one giant, powerful chip would work faster.
The problem was, no one had ever successfully done this before, for any reason, AI or not. Orchestrating that many microscopic electronic components onto a larger, but still thin, surface introduced compounding engineering problems. Once Cerebras crossed the first threshold of designing the mega chip and then manufacturing it with TSMC, the team hit the real roadblock. They couldn’t solve “packaging.”
This involves everything after manufacturing the silicon itself: adhering it to a motherboard, getting power to it, and dealing with heating and cooling, as well as the pipes that would deliver and return data, Feldman said. Cerebras’ chips “were 58 times larger. We were using 40 times as much power as anybody had ever used,” he said. There were no premade heat sinks. No vendors. No manufacturing partners. The brightest minds in microprocessor engineering had tried for decades to build such large yet denser chips, and failed.
The Cerebras team was left with trial and error, during which “we destroyed an enormous number of chips,” along with an enormous amount of cash. But without functional packaging, the chip was useless. After exhaustive analysis of each failure, the team finally solved enough problems: how to cool it and move data around. In one instance, they had to invent their own machine that could bolt in 40 screws simultaneously to secure the wafer to a board without cracking it.
Feldman still remembers the day in July 2019 when it all, miraculously, worked. They installed the packaged chip into a computer, turned it on, and the entire founding team (pictured below) “just stood in the lab and stared at it,” he said. “Watching a computer run is about as exciting as watching paint dry. But there we were watching lights flashing on the computer, stunned that we’d solved this.” “That was one of the greatest moments of my life,” he said.
That’s significant, because this same founding team had previously built and sold a pioneering cloud server startup, SeaMicro, to AMD for $334 million in 2012.
The day the chip finally worked was also about two years after OpenAI had talked to Cerebras about acquiring it, which Feldman confirmed to TechCrunch happened as the publicly revealed emails described. Those talks fell through amid growing squabbling among the OpenAI founders, several of whom are angel investors in Cerebras.
Today OpenAI is a customer and a partner, having loaned Cerebras $1 billion secured by warrants. Those warrants conditionally grant OpenAI about 33 million shares of Cerebras’ stock, the S-1 discloses. (33 million shares are worth over $9 billion at Friday’s closing price of $279.)
Interestingly, Cerebras also agreed not to sell its wares to specific OpenAI competitors as part of that loan deal. Feldman wouldn’t confirm the identity of the obvious company this involves: Anthropic. He did, however, say that the restriction is temporary. “It’s limited in time, and it was designed to make sure that we could get OpenAI the capacity,” he said.
The truth is, Cerebras hasn’t yet grown big enough to handle multiple fast-growing model makers anyway. He likened selling AI compute capacity to an all-you-can-eat buffet. Instead of trying to stuff itself on all potential customers, “We’re going to work with part of the buffet only, and we’re going to get comfortable with that, before we attack the rest,” he said.