AI Tools & Products
Apr 28, 2026

Microsoft Launches Open-Source VibeVoice AI Models for Speech Recognition and Synthesis

AI Summary

Microsoft has introduced VibeVoice, a suite of open-source AI models for speech recognition and text-to-speech. The models support long-form audio processing and include speaker tracking and customizable hotwords, features intended to improve transcription accuracy and usability.

  • VibeVoice includes both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models, designed for open-source collaboration in the speech synthesis community.
  • VibeVoice-ASR can process up to 60 minutes of continuous audio in a single pass, producing structured transcriptions that include speaker identification, timestamps, and content.
  • The ASR model allows for customized hotwords to improve recognition accuracy for specific terms or names.
  • VibeVoice-TTS can generate speech for up to 90 minutes, supporting multiple speakers and maintaining natural conversational dynamics.
  • The models use continuous speech tokenizers and a next-token diffusion framework, a combination intended to improve audio fidelity and computational efficiency.
  • Microsoft has emphasized the importance of responsible AI use, warning against potential misuse for creating deepfakes or disinformation.
  • The models are intended for research and development, and commercial use is not recommended without further testing.
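
To make the "structured transcription" idea above concrete, here is a minimal Python sketch of how a transcript carrying speaker identification, timestamps, and content might be represented and summarized. This is not the VibeVoice API or its output format: the `Segment` class, its field names, and the helper functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript entry. Field names are hypothetical, not VibeVoice's schema."""
    speaker: str   # speaker label, e.g. "Speaker 1"
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    text: str      # transcribed content


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render(segments: list[Segment]) -> str:
    """Produce one readable line per segment."""
    return "\n".join(
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in segments
    )


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Sum the duration of each speaker's segments, in seconds."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


# Illustrative data only; not real model output.
example = [
    Segment("Speaker 1", 0.0, 4.5, "Welcome to the show."),
    Segment("Speaker 2", 4.5, 10.0, "Thanks for having me."),
    Segment("Speaker 1", 10.0, 12.0, "Let's get started."),
]

if __name__ == "__main__":
    print(render(example))
    print(speaking_time(example))
```

Keeping speaker, timing, and text together in one record is what makes downstream tasks such as per-speaker summaries or subtitle export straightforward, which is the practical appeal of a structured transcript over plain text.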
voice ai, open source, microsoft, technology, innovation