Sanctioned Chinese AI Firm SenseTime Releases Image Model Built for Speed

受制裁的中国人工智能公司商汤科技发布主打速度的图像模型

SenseTime, a Chinese AI company best known for its facial recognition technology, released a new open source model on Tuesday that it claims can both generate and interpret images far faster than top models developed by US competitors. SenseNova-U1 could help the company reclaim lost ground after it slipped from its place among the leading players in China’s AI development race.

商汤科技(SenseTime)是一家以人脸识别技术闻名的中国人工智能公司。周二,该公司发布了一款新的开源模型,并声称其生成和解读图像的速度远超美国竞争对手开发的顶级模型。在商汤科技从中国人工智能发展竞赛的领先者行列中掉队后,SenseNova-U1 有望帮助该公司收复失地。

The model’s secret sauce is its ability to “read” images without translating them to text first, speeding up the process and reducing the amount of computing power required. “The model’s entire reasoning process is no longer limited to text. It can reason with images as well,” Dahua Lin, cofounder and chief scientist at SenseTime, said in an interview with WIRED.

该模型的“秘诀”在于它无需先将图像转换为文本即可直接“阅读”图像,从而加快了处理速度并降低了所需的计算能力。商汤科技联合创始人兼首席科学家林达华在接受《连线》(WIRED)杂志采访时表示:“该模型的整个推理过程不再局限于文本,它也可以通过图像进行推理。”

Lin, who is also a professor of information engineering at the Chinese University of Hong Kong, says that models capable of processing images directly will enable robots to better understand the physical world in the future.

林达华同时也是香港中文大学信息工程学教授,他认为,能够直接处理图像的模型将使机器人未来能够更好地理解物理世界。

SenseTime says that U1, like DeepSeek’s latest flagship model, can be powered by Chinese-made chips. “Several Chinese domestic chipmakers have finished optimizing compatibility with our new model,” Lin says. On release day, 10 Chinese chip designers, including Cambricon and Biren Technology, announced that their hardware supports U1.

与深度求索(DeepSeek)最新的旗舰模型一样,商汤科技表示 U1 可以由国产芯片驱动。林达华说:“几家中国本土芯片制造商已经完成了与我们新模型的兼容性优化。”在发布当天,包括寒武纪(Cambricon)和壁仞科技(Biren Technology)在内的 10 家中国芯片设计公司宣布其硬件支持 U1。

That flexibility matters because US export controls restrict Chinese firms from accessing the world’s most advanced AI chips, particularly those used for training, which at this point are primarily developed by Western companies like Nvidia. “We will continue to push for training on more different chips,” Lin says. But he also acknowledges that SenseTime “may still need to use the best chips to ensure the speed of our iteration.”

这种灵活性至关重要,因为美国的出口管制限制了中国企业获取全球最先进的人工智能芯片,特别是那些用于训练的芯片,而这些芯片目前主要由英伟达(Nvidia)等西方公司开发。林达华表示:“我们将继续推动在更多不同芯片上的训练。”但他同时也承认,商汤科技“可能仍然需要使用最好的芯片来确保我们的迭代速度。”

SenseTime released U1 for free on Hugging Face and GitHub, another sign of how Chinese companies are becoming some of the most active contributors to open source AI.

商汤科技在 Hugging Face 和 GitHub 上免费发布了 U1,这再次表明中国公司正成为开源人工智能领域最活跃的贡献者之一。

SenseTime was founded in 2014 and became a world leader in computer vision, which is used in applications like facial recognition and autonomous driving. But when ChatGPT and other AI systems powered by natural language processing became the hottest thing in the tech industry, SenseTime began struggling to turn a profit and fell behind newer Chinese startups like DeepSeek and MiniMax.

商汤科技成立于 2014 年,曾是计算机视觉领域的世界领导者,该技术被广泛应用于人脸识别和自动驾驶等领域。然而,当 ChatGPT 及其他由自然语言处理驱动的人工智能系统成为科技行业的热点时,商汤科技开始在盈利方面陷入困境,并落后于深度求索(DeepSeek)和 MiniMax 等新兴中国初创公司。

SenseTime says it hopes that releasing SenseNova-U1 publicly for anyone to use will help it catch up with both domestic and Western AI players. Lin says the company finally made the decision last year to focus on open source because of the helpful feedback it gets from researchers, which enables the company to iterate faster. “In this day and age, being open source or closed source is not the winning factor; the speed of iteration is,” Lin explains.

商汤科技表示,希望通过公开 SenseNova-U1 供任何人使用,能帮助其追赶国内和西方的 AI 竞争对手。林达华表示,公司去年最终决定专注于开源,是因为从研究人员那里获得了有益的反馈,这使公司能够更快地进行迭代。林达华解释道:“在当今时代,开源还是闭源并不是制胜因素,迭代速度才是。”

Going open source also helps SenseTime continue collaborating with international researchers without the interference of geopolitics. The company has been sanctioned repeatedly by the US government in recent years over allegations that its facial recognition technology helped power surveillance systems used to monitor and detain Uyghurs and other minority groups in China’s Xinjiang region. As a result, US firms are restricted from investing in SenseTime and selling certain technologies to it without a license. (SenseTime has denied the allegations.)

转向开源也有助于商汤科技在不受地缘政治干扰的情况下,继续与国际研究人员合作。近年来,该公司因被指控其人脸识别技术为中国新疆地区用于监控和拘留维吾尔族及其他少数民族群体的监控系统提供支持,而多次受到美国政府的制裁。因此,美国公司在没有许可证的情况下,被限制投资商汤科技或向其出售特定技术。(商汤科技已否认了这些指控。)

Seeing Clearly / 清晰洞察

In an accompanying technical report, SenseTime claims that SenseNova-U1 generates higher-quality images than all other open source models currently on the market. Its performance is comparable to leading Chinese closed source models like Alibaba’s Qwen and ByteDance’s Seedream, but it still lags behind industry leaders like GPT-Image-2.0, which came out just a week ago.

在一份随附的技术报告中,商汤科技声称 SenseNova-U1 生成的图像质量高于目前市场上所有其他开源模型。其性能可与阿里巴巴的通义千问(Qwen)和字节跳动的 Seedream 等领先的中国闭源模型相媲美,但仍落后于一周前刚刚发布的 GPT-Image-2.0 等行业领先者。

But the model’s main selling point is its ability to generate images much faster than all of those models. It relies on an innovative technical structure called NEO-Unify that SenseTime previewed earlier this year.

但该模型的主要卖点在于其生成图像的速度远快于上述所有模型。它依赖于一种名为 NEO-Unify 的创新技术结构,商汤科技曾在今年早些时候预告过该结构。

The model’s new architecture, which could improve efficiency and performance, is what sets U1 apart, says Adina Yakefu, an AI researcher at Hugging Face. “This is a more ambitious approach, as it still faces significant practical challenges,” she says. “It’s good that they decided to open source it, so the community can explore and test it more widely.” The model is also small enough to run on PCs and phones, making it potentially useful in many scenarios.

Hugging Face 的人工智能研究员 Adina Yakefu 表示,该模型的新架构能够提高效率和性能,这正是 U1 的独特之处。“这是一种更具雄心的方法,因为它仍然面临重大的实际挑战,”她说,“他们决定将其开源是件好事,这样社区可以更广泛地探索和测试它。”该模型体积足够小,可以在个人电脑和手机上运行,使其在许多场景中具有潜在用途。

Lin says the technique SenseTime developed will be especially useful in robotics. When a robot tries to process the visual world, it needs to sort through an enormous amount of information. “It has to think, ‘How should I deal with all the clutter in this room? If there is a complicated machine in front of me, which button should I press?’ All of these are forms of information, and they need to be integrated into the model’s internal judgment,” he says. Because it can understand images natively, Lin is hopeful that SenseTime’s technology will help robots act faster and make fewer mistakes in complex environments.

林达华表示,商汤科技开发的技术在机器人领域将特别有用。当机器人试图处理视觉世界时,它需要梳理海量信息。“它必须思考:‘我该如何处理这个房间里所有的杂物?如果我面前有一台复杂的机器,我该按哪个按钮?’所有这些都是信息形式,需要整合到模型的内部判断中,”他说。由于该模型能够原生理解图像,林达华希望商汤科技的技术能帮助机器人在复杂环境中行动更快、犯错更少。

China is in the midst of a humanoid robot boom. While SenseTime doesn’t currently develop its own robots, Lin says it is working closely with ACE Robotics, a startup led by another SenseTime cofounder. It’s also developing models that specialize in geospatial understanding, or creating simulations of the real world.

中国正处于人形机器人热潮之中。虽然商汤科技目前并不开发自己的机器人,但林达华表示,公司正在与由另一位商汤联合创始人领导的初创公司 ACE Robotics 密切合作。此外,商汤科技还在开发专门用于地理空间理解或创建现实世界模拟的模型。