Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

Nemotron 3.5 内容安全:面向全球企业级 AI 的可定制多模态安全方案

The last two years have seen NVIDIA’s content safety stack grow from a focused English text classifier into a family of specialized models—each extending coverage to new modalities, languages, and inference modes. Nemotron 3 Content Safety, released in March 2026, combined multimodal and multilingual capabilities for the first time in a single 4B-parameter model. Today, we are releasing Nemotron 3.5 Content Safety, which completes that arc: a single model that unifies multimodal input, multilingual reach, custom enterprise policy enforcement, and auditable reasoning into one inference call. This post covers what changes in 3.5, the design decisions behind each new capability, and how to integrate the model into production safety pipelines.

在过去两年中,NVIDIA 的内容安全技术栈已从单一的英语文本分类器发展为一个专业模型系列,每个模型都在不断扩展对新模态、语言和推理模式的覆盖。2026 年 3 月发布的 Nemotron 3 Content Safety 首次在一个 40 亿参数(4B)模型中结合了多模态和多语言能力。今天,我们发布了 Nemotron 3.5 Content Safety,它完成了这一演进:通过单次推理调用,将多模态输入、多语言覆盖、定制化企业策略执行以及可审计的推理过程统一起来。本文将介绍 3.5 版本的变化、各项新功能背后的设计决策,以及如何将该模型集成到生产环境的安全流水线中。

What’s New in Nemotron 3.5 Content Safety

Nemotron 3.5 Content Safety 的新特性

1. Unified Multimodal Evaluation Nemotron 3 introduced image understanding; Nemotron 3.5 deepens the multimodal integration. The model takes a user prompt, an optional image, and an optional assistant response as a single context window and produces a coherent safety verdict over the combined input. Evaluating all three together—rather than scoring each independently—closes a well-known gap in multimodal safety scenarios: policy violations that only emerge from the interaction between text and image, or between request and response, are now caught in a single pass.

1. 统一的多模态评估 Nemotron 3 引入了图像理解能力,而 Nemotron 3.5 则进一步深化了多模态集成。该模型将用户提示词、可选图像以及可选的助手回复作为一个单一的上下文窗口进行处理,并对组合后的输入给出连贯的安全判定。通过对三者进行联合评估(而非独立评分),解决了多模态安全场景中一个众所周知的痛点:那些仅在文本与图像之间,或请求与响应之间的交互中才会出现的策略违规行为,现在可以在单次处理中被捕捉到。

2. Global Language Coverage Nemotron 3.5 maintains the 12-language explicit training coverage of its predecessors—English, French, Spanish, German, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Portuguese, and Italian—while also inheriting strong zero-shot generalization across approximately 140 languages from the Gemma 3 base model. This means deployments in markets where training data is sparse (e.g., Southeast Asian languages, Scandinavian languages, less-resourced African languages) benefit from base-model multilingual transfer without requiring separate fine-tuning.

2. 全球语言覆盖 Nemotron 3.5 保持了前代产品对 12 种语言(英语、法语、西班牙语、德语、中文、日语、韩语、阿拉伯语、印地语、俄语、葡萄牙语和意大利语)的显式训练覆盖,同时继承了 Gemma 3 基础模型在大约 140 种语言上的强大零样本泛化能力。这意味着在训练数据稀缺的市场(如东南亚语言、斯堪的纳维亚语言、资源较少的非洲语言)中,部署该模型无需进行额外的微调,即可受益于基础模型的多语言迁移能力。

3. Custom Policy Enforcement This is the most significant architectural addition in 3.5 relative to Nemotron 3. Production deployments rarely operate under a single universal safety taxonomy. A healthcare platform has a different risk profile than a financial services chatbot, a developer tools IDE, or a children’s education app. Nemotron 3.5 accepts a custom policy specification alongside the input. The model reasons over that policy when producing its verdict rather than deferring entirely to the built-in taxonomy. This extends the work first introduced in Nemotron Content Safety Reasoning 4B to the full multimodal, multilingual setting.

3. 定制化策略执行 这是 3.5 版本相对于 Nemotron 3 最重要的架构改进。生产环境的部署很少仅遵循单一的通用安全分类体系。医疗保健平台与金融服务聊天机器人、开发者工具 IDE 或儿童教育应用面临的风险特征各不相同。Nemotron 3.5 允许在输入的同时指定自定义策略。模型在得出判定结果时会基于该策略进行推理,而不是完全依赖内置的分类体系。这扩展了 Nemotron Content Safety Reasoning 4B 中首次引入的功能,并将其应用于完整的全多模态、多语言场景。

4. Reasoning Traces (THINK Mode) Every safety verdict in Nemotron 3.5 can be accompanied by an auditable reasoning trace via an optional think mode. When enabled, the model outputs its step-by-step reasoning before delivering a final safe / unsafe label and, optionally, the violated categories.

4. 推理轨迹(THINK 模式) Nemotron 3.5 中的每一项安全判定都可以通过可选的“思考模式”(THINK mode)附带可审计的推理轨迹。启用后,模型会在给出最终的“安全/不安全”标签以及可选的违规类别之前,输出其逐步推理的过程。

(Example Trace / 推理示例) <think> The user prompt asks for guidance on acquiring a controlled substance without a prescription. The assistant response provides specific sourcing steps and references an online marketplace. This interaction violates the Criminal Planning/Confessions and Controlled Substances categories. The image (a pharmacy exterior) provides locational context but does not alter the verdict. </think> <think> 用户提示词要求获取无需处方的管制药物指南。助手回复提供了具体的采购步骤并引用了一个在线市场。此交互违反了“犯罪策划/供述”和“管制物质”类别。图像(药店外观)提供了位置背景,但并未改变判定结果。 </think>

When latency is the primary constraint, THINK mode can be disabled to return to the same low-latency binary verdict available in Nemotron 3.

当延迟是首要限制因素时,可以禁用 THINK 模式,以恢复到 Nemotron 3 中提供的低延迟二元判定模式。

5. Safety Dataset With Nemotron 3.5, we are releasing our safety dataset. This is an important milestone since most OSS safety models don’t generally provide the training or evaluation sets. This problem is worse for the multimodal space where artifacts such as images or videos are often derived from resources with restrictive licensing terms. The Nemotron 3.5 Content Safety Dataset is multimodal, multilingual, and includes safety reasoning traces that were used to train the model.

5. 安全数据集 随着 Nemotron 3.5 的发布,我们同时公开了我们的安全数据集。这是一个重要的里程碑,因为大多数开源安全模型通常不提供训练或评估集。在多模态领域,这个问题更为严重,因为图像或视频等素材往往源自具有严格许可条款的资源。Nemotron 3.5 内容安全数据集是多模态、多语言的,并包含了用于训练模型的安全推理轨迹。

Model Architecture

模型架构

Nemotron 3.5 Content Safety is built on Google Gemma 3 4B IT (4B parameters), providing a 128K context window, strong vision-language reasoning, and broad multilingual coverage. NVIDIA fine-tunes this base with a LoRA adapter that installs targeted safety classification behavior while keeping the model compact enough for real-time deployment on 8GB+ VRAM GPUs.

Nemotron 3.5 Content Safety 基于 Google Gemma 3 4B IT(40 亿参数)构建,提供 128K 上下文窗口、强大的视觉语言推理能力和广泛的多语言覆盖。NVIDIA 通过 LoRA 适配器对该基础模型进行微调,植入针对性的安全分类行为,同时保持模型足够轻量,可在 8GB+ 显存的 GPU 上实现实时部署。

The safety taxonomy follows the Aegis 2.0 framework: 13 core categories aligned with the MLCommons safety taxonomy, plus 10 fine-grained subcategories.

安全分类体系遵循 Aegis 2.0 框架:包含与 MLCommons 安全分类体系对齐的 13 个核心类别,以及 10 个细粒度子类别。

Reasoning

推理能力

Reasoning is a supercharger for content safety classification because it provides the necessary context, customization, and accountability required for production AI systems, especially in enterprise and regulated environments. Enables Custom and Contextual Policy Enforcement: Reasoning allows a content safety model to dynamically interpret and enforce custom, domain-specific policies defined in natural language.

推理能力是内容安全分类的“增压器”,因为它为生产级 AI 系统(特别是在企业和受监管环境中)提供了必要的上下文、定制化和可追溯性。它实现了定制化和情境化的策略执行:推理能力使内容安全模型能够动态地解释并执行以自然语言定义的自定义领域特定策略。