Microsoft MAI-Voice-2

Microsoft’s most expressive TTS model yet: voice cloning from short audio samples, fine-grained emotional control, and a consistent voice identity across 15 languages.

Now live in Azure AI Foundry at $22 per million characters, with integrations rolling out in VSCode, Dynamics 365 Contact Center, and Teams.
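
At $22 per million characters, budgeting is simple arithmetic. The sketch below is a hypothetical Python helper, not part of any Azure SDK. It assumes every character of input text (counted with len) is billable and ignores rounding, minimum charges, and SSML markup overhead. Check all of these against Azure's pricing documentation before relying on the numbers.

```python
# Hypothetical cost estimator for MAI-Voice-2 synthesis.
# Assumption: billing is a flat $22 per 1,000,000 input characters,
# with no rounding, minimums, or markup overhead.
from decimal import Decimal

RATE_PER_MILLION_CHARS = Decimal("22")  # USD, the published Azure AI Foundry rate
ONE_MILLION = Decimal(1_000_000)


def estimate_tts_cost(text: str) -> Decimal:
    """Estimated USD cost to synthesize `text`, counting characters with len()."""
    return RATE_PER_MILLION_CHARS * len(text) / ONE_MILLION


def estimate_monthly_cost(chars_per_call: int, calls_per_day: int, days: int = 30) -> Decimal:
    """Estimated USD cost for a voice agent with a steady daily call volume."""
    total_chars = chars_per_call * calls_per_day * days
    return RATE_PER_MILLION_CHARS * total_chars / ONE_MILLION


if __name__ == "__main__":
    # A single 32-character reply costs a fraction of a cent.
    print(estimate_tts_cost("Hello, how can I help you today?"))
    # 400 chars/reply * 5,000 replies/day * 30 days = 60M characters.
    print(estimate_monthly_cost(chars_per_call=400, calls_per_day=5_000))
```

Using Decimal rather than float keeps the per-call figures exact. With 400 characters per response and 5,000 responses a day, a 30-day month comes to 60 million characters, or $1,320 under these assumptions.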

Built for developers shipping voice agents who need production-grade prosody without paying OpenAI Realtime API prices.