ElevenLabs is the fastest-growing startup in the global voice artificial intelligence (AI) field. Its valuation is $6.6 billion (about 9.7 trillion won), and it became a unicorn (an unlisted company valued at 1 trillion won or more) just three years after its founding. It has raised more than $281 million (about 410 billion won) from Sequoia Capital, the world's largest venture capital (VC) firm, as well as Nvidia, Deutsche Telekom, LG Uplus, Naver, and U.S. actor Matthew McConaughey.
ElevenLabs' strength is real-time AI voice synthesis technology that renders diverse voices in more than 70 languages. It goes beyond simple translation and dubbing to generate natural speech that includes laughter, sighs, and breathing during conversation. Its technologies include TTS (Text-to-Speech), which converts writing into a human voice in real time, STT (Speech-to-Text), voice cloning, AI dubbing, sound effects, and music AI.
It started with movies. Four years ago, Polish-born friends Mati Staniszewski and Piotr Dabkowski, who were living in London, felt frustrated while watching a foreign film dubbed in Polish. In Poland, when foreign films are broadcast, a single voice actor reads all the characters' lines in a monotone. With only one voice heard regardless of gender or age, immersion inevitably suffers. Founded in 2022, ElevenLabs began with the question, "What if there were a technology that could change foreign-language lines in films into Polish in real time, in the actor's own voice?"
On the 21st, in an interview with ChosunBiz, ElevenLabs co-founder and CEO Mati Staniszewski said, "We started ElevenLabs to overcome the limitations of Poland's unique dubbing style, but now our goal is to redefine how people and technology communicate, based on cutting-edge voice AI technology." Staniszewski projected that voice will establish itself as the basic interface for operating future AI devices. While people now mostly type when directing AI chatbots such as ChatGPT, he predicted that as robots, cars, and wearables come to understand and respond to human speech perfectly, voice communication will become mainstream.
ElevenLabs now has 50 million monthly active users (MAU) and counts 75% of the Fortune 500 as customers. The company says that using voice synthesis technology enables dubbing or narration for video without voice actors, and that when applied to call center operations, AI can answer calls like real agents, saving time and expense.
CEO Staniszewski said, "Korea is adopting AI-based voice technology very quickly," adding, "From film, K-dramas, and K-pop to games, Korea has a deeply rooted audio-centric content production culture, so the growth potential is significant." The company estimates Korea's voice AI market at 340 billion won. Announcing its entry into Korea, ElevenLabs said it would make Korea a key base for expansion into the Asian market. The following is a Q&A with CEO Staniszewski.
—Why did ElevenLabs choose Korea as its sixth global base?
"Korea is quick to adopt AI and embraces voice AI technology with an open attitude in film, broadcasting, and gaming. Sixty-three percent of Korean office workers use generative AI routinely, which is twice the global average. It's a market with strong growth potential. Korea also has excellent talent, including researchers and engineers in voice AI. We will continue to hire outstanding Korean researchers and engineers to advance our technology."
—Which sectors in Korea have the highest demand for voice AI technology?
"Currently, demand is highest in film, broadcasting, and gaming, and we have carried out many collaborations in these two areas over the past six months. There is much we can do together, from producing content such as K-dramas to localizing it into other languages. Our voice synthesis technology is also used in films produced entirely with AI. In gaming, we are working with Krafton Inc. to implement non-player characters (NPCs) that converse and interact with gamers."
—Is the global market similar to Korea?
"Unlike Korea, the global market over the past year has seen the 'customer experience' area, such as call centers, account for a larger share than film or gaming. But as the Korean-language capability of ElevenLabs' TTS model 'Eleven v3' has advanced, we expect demand in the customer experience area to grow in earnest in Korea starting next year.
Korean is a difficult language for AI voice technology, so it took time to develop models that generate Korean text and speech and to bring them to a high level of quality. We first completed the technology that converts writing into voice, and once we were confident in the quality, we decided to enter the Korean market. We are now reaching the stage of smoothly implementing real-time Korean voice generation, so we expect adoption in the customer experience area to increase starting next year."
—Hollywood actors and celebrities have been reluctant to let their voices be cloned by AI. What enabled you to sign deals with Matthew McConaughey and Michael Caine to generate their voices with AI?
"Since our early days, we have considered how artists, including famous actors, and content creators could actively participate in the AI ecosystem. So we created the world's first 'Iconic Voice Marketplace,' where people can clone their voices with AI, share them, and earn revenue when those voices are used. So far, about 10,000 voices have been registered on the marketplace, and we have paid registrants a total of $11 million (about 16.2 billion won). There are about 400 Korean-language voices as well. For famous actors like Michael Caine, use of their voices is restricted to specific projects only. We believe we were able to forge these partnerships because ElevenLabs built a new voice-based economic model and celebrities recognized they could earn income by using their voices."
—What are you doing to mitigate the so-called 'uncanny valley' phenomenon in voice AI?
"Voices generated by ElevenLabs are now so natural that they are indistinguishable from real human voices, so we have moved past the 'uncanny valley' stage. That said, awkwardness still exists depending on the use case. A representative example is avatars. We do not build avatars ourselves, but we partner with corporations that build avatar technology to provide our voice technology. In such cases, no matter how natural the voice is, when an avatar appears on screen, people think, 'This isn't a person,' and feel a sense of dissonance. Right now, we generate video and audio separately, but we plan to solve this by training on both together and building models that can understand and generate video and audio simultaneously."
—What is ElevenLabs currently focusing on to advance voice AI technology?
"Real-time dubbing and translation, and conversational AI agents. One of the hardest parts is reducing mistakes in real-time translation while preserving emotion and intonation. The key challenge is how naturally we can implement this. Technology that lets AI voices naturally interject in conversation and respond quickly to a speaker, as in human-to-human dialogue, is also not fully realized yet. We are advancing the technology now, and we expect to achieve this within the next six to 12 months."
—Competition is fierce in voice AI. What differentiates ElevenLabs?
"Our biggest differentiator is that ElevenLabs is a company that does both research and product development. Based on our own foundation models, we have built tools and products that make it easy to use a variety of voice AI technologies, including narration, voice-over, and AI agents. We have also recruited top talent in voice AI worldwide, regardless of location. There are only about 50 to 100 people globally with deep expertise in voice AI, so we could not find them only in London. That is why we introduced a remote work system from the start. Together with outstanding researchers, we plan to build the world's best voice AI models and lead the market."
—You also entered the highly competitive music AI market. What are your plans?
"ElevenLabs differentiates itself by focusing on supporting creation rather than consumption of music. We are not a platform for listening to music like Spotify or Suno; we provide technology that helps creators make and distribute music. We identified that our clients need sound effects and background music, and developed a music AI model that lets them create all of this easily. We built a 100% licensing-based model, and all data used to train the model was obtained through partnerships with record labels."
—How are you responding to the risk that voice AI could be misused, including for voice phishing?
"Because misuse of AI voices, including deepfakes, could become a social problem, our philosophy is to respond with a sense of responsibility. All voice content generated by ElevenLabs models is traceable, so we can take immediate action if problems arise. We also apply fraud detection that analyzes the text or content entered by the voice creator and blocks it immediately if risks are detected. The biggest problem is that we cannot stop commercial models with insufficient safeguards and open-source models with weak security. Therefore, we are working with AI safety institutes in the United States and the United Kingdom to share our fraud detection so other institutions can use it. Separate from these efforts, strong legal regulations are needed for voice AI models without safety measures."
—Why did you choose the name ElevenLabs?
"At first, we considered traditional names that emphasized voice technology, such as VoiceLab or AudioLab, but we felt those names fell short of capturing our vision to 'completely redefine how people and technology are connected.' My co-founder Piotr and I like math, and the number 11 is mathematically interesting and often appears in pop culture. Apollo 11 was the first to land on the moon, and in everyday expressions, 'turn it up to 11' and '11 out of 10' convey aiming for the best.
On a slightly different note, Klarna, a Swedish fintech company on whose board I serve, was recently listed on the New York Stock Exchange. At the listing ceremony, they handed out commemorative coins, and the coin said 'Wall Street 11Street.' Seeing that, I thought, 'Someday ElevenLabs could do an initial public offering (IPO).' We set a goal to list within five years, but if ElevenLabs maintains its current growth, we expect it could be possible within three years."
—What are your plans for the Korean market going forward?
"ElevenLabs first introduced its voice generation model to an overseas audience at the Interspeech conference in Korea three years ago. Because I came to Korea in our early startup days, it feels like coming home. We will soon open an office in Korea, form a dedicated Korea team, and expand partnerships across various fields in collaboration with leading corporations such as major investors Naver and LG Uplus. We will support Korea in becoming the 'Asian voice AI hub.'"