Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

We ran a small practical benchmark: give several AI coding tools the same kind of task and ask them to build the Pantheon in OpenSCAD. ModelRift generates OpenSCAD for every 3D model on the platform, so an LLM's ability to handle spatial geometry directly affects what we can ship. That is why we track how models improve on this kind of task.

The goal was to see how well each system could turn architectural reference material into parametric CAD code, using the OpenSCAD CLI to render previews and iterate. The prompt was intentionally visual and architectural: build the Pantheon from reference images, including the rotunda, dome, portico, columns, pediment, and recognizable front details.

Overview of the six current benchmark results. Each thumbnail is labeled with the client and model used for that run.

Why Pantheon?

This was not a basic OpenSCAD syntax test. All of the current coding LLMs can produce a simple "cube with a hole" model in OpenSCAD perfectly well. That kind of prompt mostly tests whether the model knows difference(), cube(), and cylinder(). The Pantheon is more useful as a benchmark because it sits in a middle ground.
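
For reference, the "cube with a hole" case looks roughly like the sketch below. The dimensions are arbitrary values chosen for illustration, not anything taken from the benchmark.

```openscad
// A 20 mm cube with a 10 mm diameter hole through it along the Z axis.
// All sizes are illustrative assumptions.
$fn = 64;  // facets per full circle, so the hole looks round

difference() {
    cube([20, 20, 20], center = true);
    // Slightly taller than the cube so the cut exits cleanly on both faces.
    cylinder(h = 22, d = 10, center = true);
}
```

A model that can write this has shown it knows the three primitives and one Boolean, which says little about whether it can place a portico against a drum.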

OpenSCAD is not a good fit for natural sculpted models, organic surfaces, or character-like geometry. It is much better at Boolean operations, radial symmetry, extrusions, and clean constructive shapes. The Pantheon has a large radial rotunda and dome, a central oculus, straight portico faces, columns, stepped bases, and a triangular pediment. That mix makes it illustrative without being impossible.

It is also recognizable. A weak result still looks vaguely like a domed building, but a better result has to get the relationship between the round drum, the rectangular portico, the dome rings, and the front facade roughly right.

Why OpenSCAD?

OpenSCAD is a strong target for LLM-generated geometry because a 3D model is just plain-text code with a compact vocabulary. An agent can describe a building as nested transformations, Boolean operations, cylinders, extrusions, loops, and named modules. That is much closer to how language models already reason about structure than asking them to drive a 3D application through UI actions.
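
As a rough sketch of what "nested transformations, extrusions, and named modules" can look like for this building, the block below masses out a drum, a portico block, and a triangular pediment. Every module name, parameter, and proportion here is a hypothetical choice made for illustration; none of it comes from the benchmark outputs or from measurements of the real building.

```openscad
// Rough massing sketch. All names and dimensions are illustrative assumptions.
$fn = 96;

drum_r     = 30;  // outer radius of the rotunda drum
drum_h     = 22;  // height of the drum wall
portico_w  = 34;  // portico width along X
portico_d  = 16;  // portico depth along Y
portico_h  = 14;  // height up to the base of the pediment
pediment_h = 6;   // rise of the triangular pediment
overlap    = 8;   // how far the portico is pushed into the drum

module drum() {
    cylinder(h = drum_h, r = drum_r);
}

module portico_block() {
    translate([-portico_w / 2, 0, 0])
        cube([portico_w, portico_d, portico_h]);
}

module pediment() {
    // Draw the triangle in the XY plane, extrude it, then stand it upright.
    // rotate([90, 0, 0]) sends the extrusion toward -Y, so shift it back by portico_d.
    translate([0, portico_d, portico_h])
        rotate([90, 0, 0])
            linear_extrude(height = portico_d)
                polygon([[-portico_w / 2, 0], [portico_w / 2, 0], [0, pediment_h]]);
}

module pantheon_massing() {
    drum();
    // Place the portico on the -Y side, overlapping the drum.
    translate([0, -(drum_r + portico_d - overlap), 0]) {
        portico_block();
        pediment();
    }
}

pantheon_massing();
```

Because each part is a named module driven by top-level parameters, changing the relationship between drum and portico means editing one number rather than redoing a sequence of UI steps.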

This is the main reason we built ModelRift around OpenSCAD in the first place, as covered in "Why we built ModelRift on OpenSCAD". It matters most for complex geometry. With OpenSCAD, the LLM can say "make 28 repeated columns around a radius" or "subtract an oculus from a dome" directly in the source. The result is inspectable, reproducible, and easy to revise. If a column spacing is wrong, the fix is usually a parameter or loop change, not a hidden scene-state mutation.
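
The two quoted instructions translate almost word for word into source. The sketch below is one possible way to write them; the radii, heights, and module names are assumptions for illustration, and the ring of 28 columns simply mirrors the example phrase rather than the real building's layout.

```openscad
// "28 repeated columns around a radius" and "subtract an oculus from a dome".
// All names and dimensions are illustrative assumptions.
$fn = 96;

n_columns = 28;
col_r     = 0.8;
col_h     = 12;
dome_r    = 30;
shell_t   = 2;                     // dome shell thickness
oculus_r  = 4;
ring_r    = dome_r - shell_t / 2;  // columns sit under the middle of the shell

module column_ring() {
    for (i = [0 : n_columns - 1])
        rotate([0, 0, i * 360 / n_columns])
            translate([ring_r, 0, 0])
                // 0.5 taller than col_h so columns overlap the dome slightly.
                cylinder(h = col_h + 0.5, r = col_r);
}

module dome_with_oculus() {
    difference() {
        sphere(r = dome_r);            // outer surface
        sphere(r = dome_r - shell_t);  // hollow interior
        // Remove everything below z = 0, leaving a hemisphere shell.
        translate([0, 0, -dome_r])
            cube([2 * dome_r + 2, 2 * dome_r + 2, 2 * dome_r], center = true);
        // The oculus: a vertical cylinder through the crown.
        cylinder(h = dome_r + 1, r = oculus_r);
    }
}

column_ring();
translate([0, 0, col_h]) dome_with_oculus();
```

If the spacing looks wrong in a render, changing n_columns or ring_r is the whole fix, which is the kind of revision the paragraph above describes.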

That same text-first structure is what makes OpenSCAD work well with parametric UI layers like the ones discussed in "Building a better OpenSCAD customizer". Blender MCPs and similar tool-control approaches are useful for some workflows, but they are a less natural encoding for this benchmark. The agent has to translate architectural intent into a sequence of application operations, then keep a mental model of the scene state as those operations accumulate. For CAD-like tasks, that is a lot of indirection.

OpenSCAD keeps the geometry itself as the artifact. The tradeoff is that OpenSCAD is not a sculpting tool. It is best at constructive, parametric, and mostly hard-surface objects. The Pantheon sits right in that zone: radial symmetry, repeated columns, rings, cutouts, and simple architectural solids. OpenSCAD output also maps cleanly to the practical file-output side of 3D printing: STL remains the baseline mesh format, while 3MF can carry richer assembly and color information, as described in "3D file formats explained" and "How we added multicolor 3MF export to ModelRift". That is why the Pantheon is a useful benchmark for the kind of geometry ModelRift wants LLMs to generate.

Prompt

The prompt used for the benchmark was: "see two ref images and build .scad file with openscad implementation of pantheon. use openscad CLI (available) to preview your work (by rendering openscad model to .png) and iterate until you are happy with the result."
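
To make the render-and-iterate step concrete, here is a minimal sketch of a .scad file with the preview command written in its header comment. The file names, image size, and placeholder geometry are assumptions for illustration; the article does not record the exact commands each tool ran.

```openscad
// pantheon.scad (hypothetical file name)
//
// Render a PNG preview from the command line, inspect it, edit, and repeat:
//   openscad -o preview.png --imgsize=1024,768 --autocenter --viewall pantheon.scad
//
// -o picks the output file (the format follows the extension),
// --imgsize sets the image size in pixels, and --autocenter / --viewall
// frame the whole model in the shot.

$fn = 64;  // facets per full circle; illustrative value

// Placeholder geometry so the file renders on its own.
cylinder(h = 22, r = 30);
```

Each pass through that loop gives the agent an image to compare against the reference views before it edits the source again.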

Reference Images

Reference #1 is the front facade view on the left. Reference #2 is the aerial/top view on the right. The combined image was generated with ffmpeg from the two source images used in the benchmark.

Results

The six current benchmark outputs, labeled by client and model.

(Table omitted for brevity; please refer to the original article for the specific performance metrics.)

The scores are relative to this benchmark only. They are not general model rankings, and the time score reflects the observed implementation time.