PhyDrawGen: Physically Grounded Diagram Generation from Natural Language
Abstract: Generating physics diagrams from text requires strict adherence to physical laws. While current generative models produce visually plausible outputs, they systematically hallucinate force vectors, ignore conservation laws, and violate geometric constraints.
We present PhyDrawGen, a neuro-symbolic pipeline that decouples semantic scene understanding from physical constraint satisfaction. First, a large language model extracts a typed scene graph from the problem text. A deterministic solver then converts this graph into a Planar Straight-Line Graph (PSLG), encoding force balance, optical paths, and field topologies as exact geometric primitives.
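The first two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`Force`, `Body`, `PSLG`), the function `to_pslg`, and the arrow scaling are all hypothetical, and the scene graph is written by hand rather than extracted by a language model. The sketch only shows the idea that a deterministic step can refuse to draw a static body whose forces do not balance, and otherwise emit exact line segments.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Force:
    """One typed edge of the (hypothetical) scene graph."""
    name: str
    magnitude: float   # newtons
    angle_deg: float   # direction, counterclockwise from +x


@dataclass
class Body:
    """One typed node of the (hypothetical) scene graph."""
    name: str
    position: tuple[float, float]
    forces: list[Force] = field(default_factory=list)


@dataclass
class PSLG:
    """Planar straight-line graph: vertices plus index pairs for segments."""
    vertices: list[tuple[float, float]] = field(default_factory=list)
    segments: list[tuple[int, int]] = field(default_factory=list)

    def add_vertex(self, p: tuple[float, float]) -> int:
        self.vertices.append(p)
        return len(self.vertices) - 1


def net_force(body: Body) -> tuple[float, float]:
    """Sum the force components acting on a body."""
    fx = sum(f.magnitude * math.cos(math.radians(f.angle_deg)) for f in body.forces)
    fy = sum(f.magnitude * math.sin(math.radians(f.angle_deg)) for f in body.forces)
    return fx, fy


def to_pslg(body: Body, scale: float = 0.1, tol: float = 1e-9) -> PSLG:
    """Emit one segment per force arrow, but only if the body is in equilibrium."""
    fx, fy = net_force(body)
    if math.hypot(fx, fy) > tol:
        raise ValueError(f"forces on {body.name!r} do not balance: ({fx:.3g}, {fy:.3g})")
    graph = PSLG()
    tail = graph.add_vertex(body.position)  # all arrows share the same tail vertex
    x, y = body.position
    for f in body.forces:
        a = math.radians(f.angle_deg)
        head = graph.add_vertex((x + scale * f.magnitude * math.cos(a),
                                 y + scale * f.magnitude * math.sin(a)))
        graph.segments.append((tail, head))
    return graph


# A block at rest on a horizontal surface: weight down, normal force up.
block = Body("block", (0.0, 0.0),
             [Force("weight", 9.8, 270.0), Force("normal", 9.8, 90.0)])
graph = to_pslg(block)
```

Because the equilibrium check runs before any geometry is produced, an unbalanced scene graph raises an error instead of yielding a plausible-looking but physically wrong drawing.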
Finally, a fine-tuned Qwen-VL model runs a visually grounded propose-verify loop that iteratively corrects any remaining constraint violations. Evaluated on a benchmark of 1,449 problems spanning mechanics, optics, and electromagnetism, PhyDrawGen significantly outperforms GPT-5-image, Gemini 2.5 Flash, and Gemini 3 Pro, and remains physically accurate even on problems involving unconventional objects.
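The control flow of a propose-verify loop can be sketched as follows. This is an illustrative assumption rather than the paper's method: the fine-tuned Qwen-VL proposer is replaced by a trivial stub (`stub_proposer`), the diagram is reduced to a dictionary of two angles, and the only constraint checked is the law of reflection. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Diagram = Dict[str, float]
Proposer = Callable[[Diagram, List[str]], Diagram]


def verify(diagram: Diagram, tol: float = 1e-6) -> List[str]:
    """Return a list of violated constraints (empty if the diagram is valid)."""
    violations = []
    if abs(diagram["incidence_deg"] - diagram["reflection_deg"]) > tol:
        violations.append("reflection_deg must equal incidence_deg")
    return violations


def stub_proposer(diagram: Diagram, violations: List[str]) -> Diagram:
    """Stand-in for a learned model: repairs the one constraint it knows about."""
    fixed = dict(diagram)
    if violations:
        fixed["reflection_deg"] = fixed["incidence_deg"]
    return fixed


def propose_verify(diagram: Diagram, propose: Proposer,
                   max_rounds: int = 5) -> Tuple[Diagram, List[str], int]:
    """Alternate verification and proposal until valid or out of rounds."""
    violations = verify(diagram)
    rounds = 0
    while violations and rounds < max_rounds:
        diagram = propose(diagram, violations)  # proposer sees the violations
        violations = verify(diagram)            # every proposal is re-checked
        rounds += 1
    return diagram, violations, rounds


draft = {"incidence_deg": 30.0, "reflection_deg": 42.0}
final, remaining, rounds = propose_verify(draft, stub_proposer)
```

The loop returns any violations that remain after `max_rounds`, so a caller can distinguish a verified diagram from one that merely ran out of attempts.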