Robbyant / lingbot-map

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction.

Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction.

LingBot-Map focuses on:

  • Geometric Context Transformer: Unifies coordinate grounding, dense geometric cues, and long-range drift correction at the architecture level, within a single streaming framework, through an anchor context, a pose-reference window, and a trajectory memory.

  • High-Efficiency Streaming Inference: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS at 518×378 resolution on sequences longer than 10,000 frames.

  • State-of-the-Art Reconstruction: Outperforms both existing streaming approaches and iterative optimization-based approaches across diverse benchmarks.
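To make the "paged KV cache" idea in the list above concrete, here is a minimal sketch of a page table in plain Python. It is not LingBot-Map's or FlashInfer's implementation: the class name `PagedKVCache`, its fields, and the use of plain Python values in place of key/value tensors are all assumptions made for illustration. The point is only that cached entries live in fixed-size pages that are allocated on demand, so memory grows in page-sized steps instead of one large contiguous buffer.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy paged cache (hypothetical): stores plain values instead of K/V tensors."""

    page_size: int   # entries per page
    num_pages: int   # total physical pages available
    free_pages: list = field(default_factory=list)   # unused physical page ids
    page_table: list = field(default_factory=list)   # logical page -> physical page
    storage: dict = field(default_factory=dict)      # (physical page, slot) -> value
    length: int = 0                                  # number of cached entries

    def __post_init__(self):
        # Every physical page starts out free.
        self.free_pages = list(range(self.num_pages))

    def append(self, value):
        """Append one entry, allocating a new page when the last one is full."""
        slot = self.length % self.page_size
        if slot == 0:
            if not self.free_pages:
                raise MemoryError("no free pages left in the cache")
            self.page_table.append(self.free_pages.pop())
        physical_page = self.page_table[-1]
        self.storage[(physical_page, slot)] = value
        self.length += 1

    def get(self, index):
        """Look up an entry by logical index through the page table."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        physical_page = self.page_table[index // self.page_size]
        return self.storage[(physical_page, index % self.page_size)]


if __name__ == "__main__":
    cache = PagedKVCache(page_size=4, num_pages=3)
    for frame_id in range(10):
        cache.append(frame_id * 10)
    print(cache.get(9), len(cache.page_table), cache.free_pages)
```

A real implementation would hold tensors per attention layer and would also need a policy for releasing pages; this sketch leaves both out.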


📰 News

2026-05-25: 📊 Evaluation benchmark released. We released the evaluation scripts for KITTI and Oxford Spires. See benchmark/ for the pipeline, and run preprocess/oxford.py to prepare the Oxford Spires data before evaluation.

2026-04-29: 📹 Long-video demo released. We released a very long video example (~25,000 frames, a 13-minute indoor walkthrough) rendered with the offline pipeline. See Worked Example for the command, flag rationale, and rendered output.

2026-04-27: 🚀 LingBot-Map accelerated. Pull the latest main and run python demo.py --compile ... or python gct_profile.py --backend flashinfer --dtype bf16 --compile to verify on your hardware.

2026-04-24: Fixed a FlashInfer KV cache bug where --keyframe_interval > 1 silently cached non-keyframes. You should now see better pose and reconstruction quality when running on more than 320 frames.
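As a rough illustration of the behavior the 2026-04-24 fix restores, the sketch below picks which frames would enter the KV cache for a given keyframe interval. It assumes keyframes are simply every k-th frame counted from frame 0, which this README does not confirm, and the function name `select_cached_frames` is hypothetical rather than part of the LingBot-Map code.

```python
def select_cached_frames(num_frames: int, keyframe_interval: int) -> list[int]:
    """Return the frame indices that should be cached (hypothetical helper).

    Assumption: frame i is a keyframe when i is a multiple of keyframe_interval.
    """
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    return [i for i in range(num_frames) if i % keyframe_interval == 0]


if __name__ == "__main__":
    # With an interval of 3, only every third frame is cached.
    print(select_cached_frames(10, 3))
    # With an interval of 1, every frame is a keyframe.
    print(select_cached_frames(4, 1))
```

Under this assumption, caching non-keyframes as well would make the cache grow as if the interval were 1, which is consistent with the problem showing up only on longer runs.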


⚙️ Installation

  1. Create conda environment

    conda create -n lingbot-map python=3.10 -y
    conda activate lingbot-map
  2. Install PyTorch (CUDA 12.8)

    pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128

    PyTorch 2.8.0 is the recommended version because NVIDIA Kaolin (required by the batch rendering pipeline) has prebuilt wheels for torch-2.8.0_cu128. If you only need demo.py, you may use a newer PyTorch, but the batch renderer will then require building Kaolin from source. For other CUDA versions, see PyTorch Get Started.

  3. Install lingbot-map

    pip install -e .
  4. Install FlashInfer (recommended)

    FlashInfer provides paged KV cache attention for efficient streaming inference. It is a pure-Python package that JIT-compiles CUDA kernels on first use.
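After the steps above, a quick check can confirm that the packages are importable from the active environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and the module names `torch`, `torchvision`, and `flashinfer` are the import names these packages are assumed to expose.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report which of the expected packages are visible in this environment.
    for name in ("torch", "torchvision", "flashinfer"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

This only checks that the modules can be located. It does not verify CUDA availability or that FlashInfer's kernels compile, which happens on first use.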


🚀 Quick Start

After installation, run your first scene with one command:

python demo.py --model_path /path/to/lingbot-map-long.pt \
  --image_folder example/courthouse --mask_sky

This launches an interactive viser viewer at http://localhost:8080.