OpenAI unveils real-time voice models as it readies post-smartphone device

OpenAI logo./Courtesy of OpenAI

OpenAI, the developer of ChatGPT, has unveiled three new voice artificial intelligence (AI) models. The company is preparing a next-generation AI device intended to succeed the smartphone, and it appears to be advancing the voice AI models needed to operate that device.

On the 7th (local time), OpenAI introduced three voice models: "GPT-Realtime-2," which can handle complex requests based on GPT-5–level reasoning; "GPT-Realtime-Translate," which translates speech in real time; and "GPT-Realtime-Whisper," which converts speech to text in real time.

OpenAI said, "Voice is becoming the most natural way to use software," explaining the motivation for the development. For example, in situations such as giving directions while driving or needing to send an email, voice technology must be advanced enough that users can keep working without using their hands.

The company said, "Simply having fast response times or natural-sounding voices is not enough," adding, "We are advancing real-time voice technology so it can go beyond simple Q&A to listen, reason, translate, take dictation, and get things done in line with the flow of conversation."

A key feature of "GPT-Realtime-2" is that it is designed to respond immediately even if a user interrupts while the AI is speaking or corrects an earlier statement mid-conversation. Unlike previous AI models, which required the user and the AI to take turns speaking, it enables natural conversations that feel like talking to a real person.

The company said the model is currently being piloted by the real estate platform Zillow, the travel platform Priceline, and the telecom company Deutsche Telekom. Zillow is building a voice assistant that searches listings and schedules visits based on criteria the customer specifies by voice, and Deutsche Telekom is experimenting with a real-time translation service for customer support.

OpenAI is also expected to use the voice models in its own AI device. Since acquiring io, the startup founded by Jony Ive, who led product design at Apple, for $6.5 billion last year, OpenAI has been preparing an AI device that can be operated by voice. Major foreign media outlets have predicted that the device could be smart glasses, a pin-style gadget that attaches to clothing, or a smart speaker.

※ This article has been translated by AI.