Google unveiled a voice artificial intelligence (AI) model that can interpret and translate more than 70 languages in real time.
Google said on the 9th (local time) that it will bring Gemini 3.5 Live Translate, which is based on its latest AI model, Gemini 3.5, to its video conferencing platform Google Meet and its mobile translation app.
The company said the new translation model has shifted from the traditional turn-by-turn approach, which waits for the user to finish speaking before translating, to a continuous real-time generation method closer to simultaneous interpretation. It listens to the speaker, translates, and delivers the translation as speech immediately, all in real time. Google said the latency is just a few seconds, enabling a flow similar to an actual conversation.
The model supports more than 70 languages and automatically recognizes and translates the language a user speaks during a conversation. It can also be used in multilingual conversations where several languages are mixed.
Voice quality has also improved. Instead of producing a mechanical voice, the model preserves the user's intonation, speaking style, and pitch as much as possible so that the output sounds natural. Google said, "As a result, the translated voice sounds natural, and it is easier to understand the conversation."
Google said the model is designed to work in real-world environments. It can handle noisy places with a lot of background sound, situations where several people speak at once, and colloquial expressions. It is intended for use across a range of fields, including school classes, tourism services, phone-based customer service, ride-hailing services, and live broadcasts.
Previously, users had to plug in earphones to use voice interpretation in the Google Translate app on iPhones and Android phones. Now they can hold the smartphone to their ear and hear the translated voice without earphones, as if on a call.