Google Launches Highly Controllable AI Voice Model
Google DeepMind has released Gemini 3.1 Flash TTS, a text-to-speech model offering advanced control over vocal style, tone, pace, and regional accents across more than 70 languages. Users can direct delivery through text commands, choosing from styles like enthusiastic or informative, with accents ranging from American Southern to British Brixton.
The model includes director-level controls, format templates such as podcast and audiobook styles, and SynthID watermarking on all outputs. It ranked second on the Artificial Analysis TTS leaderboard with a score of 1211, and it is available through the Gemini API, Google AI Studio, Vertex AI, and Google Vids.
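
To make the idea of directing delivery through text commands concrete, here is a minimal Python sketch that assembles a style direction and a script into a single text prompt. The announcement does not describe the prompt syntax the model actually accepts, so the `Direction` class, the `build_tts_prompt` function, and the "Style / Accent / Pace" header layout are all hypothetical and for illustration only. No API call is made; sending the prompt to the Gemini API would require the official SDK and the model identifier from Google's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Direction:
    """Hypothetical container for delivery instructions (not part of any Google SDK)."""
    style: str             # e.g. "enthusiastic" or "informative"
    accent: str            # e.g. "American Southern"
    pace: str = "natural"  # assumed default; the real model's options may differ


def build_tts_prompt(direction: Direction, script: str) -> str:
    """Prepend a plain-text direction header to the script to be spoken.

    The header format is an assumption, not the model's documented syntax.
    """
    text = script.strip()
    if not text:
        raise ValueError("script must not be empty")
    header = (
        f"Style: {direction.style}. "
        f"Accent: {direction.accent}. "
        f"Pace: {direction.pace}."
    )
    # A blank line separates the direction from the words to be read aloud.
    return f"{header}\n\n{text}"


if __name__ == "__main__":
    prompt = build_tts_prompt(
        Direction(style="enthusiastic", accent="American Southern"),
        "Welcome back to the show.",
    )
    print(prompt)
```

Keeping the direction separate from the script means the same text can be re-rendered in a different style or accent by changing only the `Direction` value.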
