Microsoft Debuts Three Fast AI Media Models
Microsoft launched three new AI models: MAI-Image-2 for image generation, MAI-Transcribe-1 for speech transcription, and MAI-Voice-1 for synthetic voice output. The models are available through Azure's Microsoft Foundry. MAI-Image-2 is twice as fast as its predecessor, while MAI-Transcribe-1 transcribes speech 2.5 times faster with a 3.9% word error rate across 25 languages, beating rivals from Google and OpenAI.
MAI-Voice-1 generates synthetic speech from user scripts with customizable voices. Pricing starts at $5 per million input tokens for MAI-Image-2, $0.36 per hour of transcribed audio for MAI-Transcribe-1, and $22 per million characters for MAI-Voice-1. Microsoft is rolling out the models across Bing, PowerPoint, and Copilot Audio Expressions.
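
To make the starting prices quoted above concrete, here is a minimal back-of-the-envelope sketch in Python. The function name `estimate_cost` and its parameters are hypothetical and are not part of any Microsoft API. The sketch assumes billing is strictly linear at the quoted starting rates and ignores any other charges (such as output tokens or volume tiers) that the brief does not mention.

```python
from decimal import Decimal

# Starting prices as quoted in the brief (USD). Real bills may include
# charges not listed here; this sketch only covers the quoted rates.
IMAGE_USD_PER_MILLION_INPUT_TOKENS = Decimal("5")
TRANSCRIBE_USD_PER_AUDIO_HOUR = Decimal("0.36")
VOICE_USD_PER_MILLION_CHARACTERS = Decimal("22")

MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens=0, audio_hours=0, voice_characters=0):
    """Hypothetical helper: estimate spend at the quoted starting rates.

    input_tokens      -- input tokens sent to MAI-Image-2
    audio_hours       -- hours of audio transcribed by MAI-Transcribe-1
    voice_characters  -- characters synthesized by MAI-Voice-1
    """
    image = IMAGE_USD_PER_MILLION_INPUT_TOKENS * Decimal(input_tokens) / MILLION
    # str() keeps float inputs such as 2.5 from picking up binary rounding noise.
    transcribe = TRANSCRIBE_USD_PER_AUDIO_HOUR * Decimal(str(audio_hours))
    voice = VOICE_USD_PER_MILLION_CHARACTERS * Decimal(voice_characters) / MILLION
    return {
        "image": image,
        "transcribe": transcribe,
        "voice": voice,
        "total": image + transcribe + voice,
    }


if __name__ == "__main__":
    # Example workload: 200,000 input tokens, 10 audio hours, 500,000 characters.
    for item, usd in estimate_cost(200_000, 10, 500_000).items():
        print(f"{item}: ${usd:.2f}")
```

Under these assumptions, a workload of 200,000 image input tokens, 10 hours of audio, and 500,000 voice characters comes to $1.00 + $3.60 + $11.00, or $15.60 in total.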
