OpenAI launches new voice intelligence features in its API

OpenAI said Thursday that its API will now include several new voice intelligence features designed to help developers build apps that can talk with users and transcribe and translate their conversations.

The company’s new GPT-Realtime-2 is a voice model built to produce a realistic simulated voice that can converse with users. Unlike its predecessor, GPT-Realtime-1.5, it is built with GPT-5-class reasoning, which OpenAI says is meant to handle more complicated requests from users.

The company is also launching GPT-Realtime-Translate, which, as the name suggests, is designed to provide real-time translation that “keeps pace” with the user in conversation. The feature supports more than 70 input languages (the languages it can understand) and 13 output languages (the languages it can translate into).

Finally, the company has launched a new transcription model, GPT-Realtime-Whisper, which provides live speech-to-text, transcribing speech as interactions occur. “Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” the company said.

Who will benefit from these updates? Companies that want to expand their customer service capabilities are an obvious target. However, OpenAI also notes that the new features will be useful in a wide range of areas, including education, media, events, and creator platforms.

As useful as these tools seem from an enterprise perspective, they could plausibly be misused. The company said it has built guardrails to keep its new features from being used for spam, fraud, or other forms of online abuse. Certain triggers have been embedded in the system so that “conversations can be halted if they are detected as violating our harmful content guidelines,” OpenAI said.

All of the new voice models are available through OpenAI’s Realtime API. GPT-Realtime-Translate and GPT-Realtime-Whisper are billed by the minute, while GPT-Realtime-2 is billed by token consumption.