Microsoft Launches Open-Source VibeVoice AI Models for Speech Recognition and Synthesis

Apr 28, 2026

AI Summary

Microsoft has introduced VibeVoice, a suite of open-source AI models for speech recognition and text-to-speech applications. The models support long-form audio processing and include features like speaker tracking and customizable hotwords, aiming to enhance accuracy and usability in various audio applications.

VibeVoice includes both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models, designed for open-source collaboration in the speech synthesis community.
VibeVoice-ASR can process up to 60 minutes of continuous audio in a single pass, producing structured transcriptions that include speaker identification, timestamps, and content.
The ASR model allows for customized hotwords to improve recognition accuracy for specific terms or names.
VibeVoice-TTS can generate speech for up to 90 minutes, supporting multiple speakers and maintaining natural conversational dynamics.
The models utilize continuous speech tokenizers and a next-token diffusion framework for enhanced audio fidelity and computational efficiency.
Microsoft has emphasized the importance of responsible AI use, warning against potential misuse for creating deepfakes or disinformation.
The models are intended for research and development purposes, with a recommendation against commercial use without further testing.
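
To make the "structured transcription" idea concrete, here is a minimal sketch of how a transcript carrying speaker, timestamp, and content fields might be represented and consumed downstream. The `Segment` class, its field names, and both helper functions are hypothetical illustrations, not VibeVoice's actual output format or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical record: one speaker turn (who, when, what).
    speaker: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def talk_time_by_speaker(segments):
    """Sum the duration of every segment per speaker, in seconds."""
    totals = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals


def format_transcript(segments):
    """Render segments in time order as '[MM:SS] speaker: text' lines."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        minutes, seconds = divmod(int(seg.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)


# Made-up sample data for illustration only.
segments = [
    Segment("Speaker 1", 0.0, 4.5, "Welcome to the show."),
    Segment("Speaker 2", 4.5, 9.0, "Thanks for having me."),
    Segment("Speaker 1", 65.0, 70.0, "Let's get started."),
]

if __name__ == "__main__":
    print(format_transcript(segments))
    print(talk_time_by_speaker(segments))
```

Keeping the speaker, time, and text in one record is what lets a single-pass transcript of a long recording be searched, summarized per speaker, or rendered as captions without re-processing the audio.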

Tags: voice ai, open source, microsoft, technology, innovation

