AI OSS tool repo goes archived over night after raising $7.3M Seed

AI 开源工具库在获得 730 万美元种子轮融资后连夜归档

TensorZero is an open-source LLMOps platform that unifies: Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency) Observability: store inferences and feedback in your database, available programmatically or in the UI Evaluation: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc. Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

TensorZero 是一个开源的 LLMOps 平台，集成了以下功能：网关（Gateway）：通过统一 API 访问所有 LLM 提供商，专为高性能设计（p99 延迟 <1ms）；可观测性（Observability）：将推理结果和反馈存储在您的数据库中，可通过编程或 UI 访问；评估（Evaluation）：使用启发式方法、LLM 裁判等对单个推理或端到端工作流进行基准测试；优化（Optimization）：收集指标和人工反馈以优化提示词、模型和推理策略；实验（Experimentation）：通过内置的 A/B 测试、路由、故障转移、重试等功能，自信地发布产品。

You can take what you need, adopt incrementally, and complement with other tools. It plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM provider. TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and fuels ~1% of global LLM API spend today.

您可以按需取用、逐步采用，并与其他工具配合使用。它与 OpenAI SDK、OpenTelemetry 以及所有主流 LLM 提供商兼容良好。TensorZero 的用户涵盖了从前沿 AI 初创公司到财富 10 强企业，目前支撑着全球约 1% 的 LLM API 调用支出。

🆕 TensorZero Autopilot

TensorZero Autopilot is an automated AI engineer powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A/B tests. It dramatically improves the performance of LLM agents across diverse tasks.

TensorZero Autopilot 是一个由 TensorZero 驱动的自动化 AI 工程师，它能够分析 LLM 可观测性数据、设置评估、优化提示词和模型，并运行 A/B 测试。它能显著提升 LLM 智能体在各种任务中的表现。

🌐 LLM Gateway

🌐 LLM 网关

Integrate with TensorZero once and access every major LLM provider. Call any LLM (API or self-hosted) through a single unified API. Infer with tool use, structured outputs (JSON), batch, embeddings, multimodal (images, files), caching, etc. Satisfy extreme throughput and latency needs, thanks to 🦀 Rust: <1ms p99 latency overhead at 10k+ QPS.

只需集成一次 TensorZero，即可访问所有主流 LLM 提供商。通过单一统一 API 调用任何 LLM（API 或自托管）。支持工具调用、结构化输出 (JSON)、批处理、嵌入、多模态（图像、文件）、缓存等推理功能。得益于 🦀 Rust 语言，它能满足极高的吞吐量和延迟需求：在 1 万+ QPS 下，p99 延迟开销小于 1ms。

🔍 LLM Observability

🔍 LLM 可观测性

Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI. Store inferences and feedback (metrics, human edits, etc.) in your own database. Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically.

使用开源的 TensorZero UI，您可以深入调试单个 API 调用，或纵观全局监控不同模型和提示词随时间变化的指标。将推理结果和反馈（指标、人工编辑等）存储在您自己的数据库中。通过 TensorZero UI 或编程方式，深入分析单个推理过程或高层聚合模式。

📈 LLM Optimization

📈 LLM 优化

Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically. Optimize your models with supervised fine-tuning, RLHF, and other techniques. Optimize your prompts with automated prompt engineering algorithms like GEPA.

通过 UI 或编程方式发送生产指标和人工反馈，轻松优化您的提示词、模型和推理策略。利用监督微调 (SFT)、RLHF 等技术优化模型。使用 GEPA 等自动化提示词工程算法优化提示词。

📊 LLM Evaluation

📊 LLM 评估

Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges. Evaluate individual inferences with inference evaluations powered by heuristics or LLM judges (≈ unit tests for LLMs). Evaluate end-to-end workflows with workflow evaluations with complete flexibility (≈ integration tests for LLMs).

使用由启发式方法和 LLM 裁判驱动的评估工具来比较提示词、模型和推理策略。通过启发式方法或 LLM 裁判进行推理评估，以评估单个推理结果（类似于 LLM 的单元测试）。通过工作流评估，以极高的灵活性评估端到端工作流（类似于 LLM 的集成测试）。

🧪 LLM Experimentation

🧪 LLM 实验

Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc. Run adaptive A/B tests to ship with confidence and identify the best prompts and models for your use cases. Enforce principled experiments in complex workflows, including support for multi-turn LLM systems, sequential testing, and more.

利用内置的 A/B 测试、路由、故障转移、重试等功能，自信地发布产品。运行自适应 A/B 测试，以确定最适合您用例的提示词和模型。在复杂工作流中执行原则性的实验，包括支持多轮 LLM 系统、顺序测试等。