Apr 28, 2026
Microsoft Launches Open-Source VibeVoice AI Models for Speech Recognition and Synthesis
AI Summary
Microsoft has introduced VibeVoice, a suite of open-source AI models for speech recognition and text-to-speech applications. The models support long-form audio processing and include features such as speaker tracking and customizable hotwords, which are intended to improve accuracy and usability across audio applications.
- VibeVoice includes both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models, designed for open-source collaboration in the speech synthesis community.
- VibeVoice-ASR can process up to 60 minutes of continuous audio in a single pass, producing structured transcriptions that include speaker identification, timestamps, and content.
- The ASR model allows for customized hotwords to improve recognition accuracy for specific terms or names.
- VibeVoice-TTS can generate speech for up to 90 minutes, supporting multiple speakers and maintaining natural conversational dynamics.
- The models use continuous speech tokenizers and a next-token diffusion framework to improve audio fidelity and computational efficiency.
- Microsoft has emphasized the importance of responsible AI use, warning against potential misuse for creating deepfakes or disinformation.
- The models are intended for research and development purposes; Microsoft recommends against commercial use without further testing.
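
To make the "structured transcription" idea concrete, here is a minimal Python sketch of how a transcript carrying speaker, timestamps, and content might be represented and summarized. This is not the VibeVoice API or its actual output format: the `Segment` class, its field names, and the helper functions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcript entry. Hypothetical shape, not VibeVoice's real schema."""
    speaker: str  # speaker label, e.g. "Speaker 1"
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcribed content


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: list[Segment]) -> str:
    """Return a readable transcript, one line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda seg: seg.start)
    return "\n".join(
        f"[{format_timestamp(seg.start)} - {format_timestamp(seg.end)}] "
        f"{seg.speaker}: {seg.text}"
        for seg in ordered
    )


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Total seconds spoken per speaker."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Made-up sample data for illustration only.
segments = [
    Segment("Speaker 2", 4.5, 10.0, "Thanks for having me."),
    Segment("Speaker 1", 0.0, 4.5, "Welcome to the show."),
    Segment("Speaker 1", 3599.0, 3600.0, "Goodbye."),
]

if __name__ == "__main__":
    print(render(segments))
    print(speaking_time(segments))
```

Keeping each segment as a small immutable record makes it straightforward to sort by time, filter by speaker, or compute per-speaker totals over a recording as long as the 60 minutes described above.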
voice ai, open source, microsoft, technology, innovation