Graphic by Son Min-kyun

Before a newborn learns letters, it understands its surroundings through sight and hearing. I wanted to create an artificial intelligence (AI) that understands the world visually rather than linguistically.

TwelveLabs is an AI startup founded in 2021 by three young people captivated by multimodal AI. Unlike text-centric AI models, multimodal AI refers to models that understand information across formats such as photos, images, and video.

Lee Jae-sung, 31, the company's CEO, majored in computer engineering at the University of California, Berkeley, completed internships at Samsung Electronics and Amazon, and in 2019 joined the Ministry of National Defense's Cyber Operations Command to fulfill his military service. During that service he met Kim Sung-joon, TwelveLabs' co-founder and Chief Development Officer, and Lee Seung-jun, its Chief Technology Officer.

At the time, language-based models dominated the AI industry, but the three founders believed that AI that understands visual data, modeled on the way humans perceive the world, would shape the future. While still in the military they incorporated the company with about 2 million won pooled from their monthly salaries, and later caught the attention of the American accelerator Techstars, which kicked off full-scale technology development.

In its early days TwelveLabs had 12 members in total, which is where the name comes from. Today, 40 research and development staff work in Korea, and 40 business and marketing staff are based in the United States.

TwelveLabs' flagship products are the video search model 'Marengo' and the video summarization and Q&A model 'Pegasus'. Marengo can quickly find desired scenes in hours of video using text or image queries, and is used in media, sports, and the public sector. Pegasus analyzes video content to summarize it or answer specific questions, and is applicable in industries such as news, advertising, and healthcare.

Recently, TwelveLabs drew attention as the first Korean company to launch its models on Amazon Web Services' (AWS) generative AI platform 'Bedrock'. Lee Jae-sung, speaking with ChosunBiz at the company's office in Itaewon, Yongsan District, on the 17th of last month, explained that the agreement allows hundreds of thousands of companies worldwide to use TwelveLabs' models directly. The Canadian sports entertainment company 'Maple Leaf Sports', for example, is using Marengo to automate the creation of game highlight videos.
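As a rough illustration of what "using the models directly" through Bedrock might look like, below is a minimal sketch that calls a video model via the AWS SDK for Python (boto3). The model ID, S3 path, and request fields are illustrative assumptions rather than the documented TwelveLabs-on-Bedrock schema; the actual identifiers and payload format are listed in the Bedrock model catalog.

```python
# Minimal sketch: asking a video-understanding model on Amazon Bedrock to
# summarize a clip. The model ID, S3 URI, and payload fields are hypothetical
# placeholders, not the documented TwelveLabs request schema.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical request: point the model at a video in S3 and ask a question.
request_body = {
    "mediaSource": {"s3Uri": "s3://example-bucket/game-footage.mp4"},
    "prompt": "Summarize the key moments of this game in three sentences.",
}

response = bedrock.invoke_model(
    modelId="twelvelabs.pegasus-1-2-v1:0",  # assumed ID; check the Bedrock console
    contentType="application/json",
    accept="application/json",
    body=json.dumps(request_body),
)

print(json.loads(response["body"].read()))
```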

TwelveLabs has raised a cumulative 150 billion won in investment from companies including NVIDIA, Intel, and SK Telecom. Lee said, "Large clients in North America are already using TwelveLabs, and we plan to move quickly into Europe and Asia on the strength of Bedrock." He will take the stage at 'AWS Summit Seoul 2025', held at COEX in Gangnam, Seoul, starting on the 14th.

Lee Jae-sung, CEO of TwelveLabs, speaks with ChosunBiz at the company's office in Itaewon-dong, Yongsan-gu, Seoul, on the 17th of last month. /Courtesy of TwelveLabs

-The company is located in Itaewon, Yongsan. Is there a special reason for this?

"I thought it would be difficult to focus on the core business if we were located in an area crowded with startups. I first met my co-founders in Yongsan. We met during our military service at the Ministry of National Defense's Cyber Operations Command, and we have fond memories of deciding to work together after being discharged in succession."

-The initial funding was 2 million won. What could you do with that money?

"The three co-founders had about 2 million won. However, if you are determined, you can do a lot with 2 million won. Since we didn't have an office, we worked in a friend's office. We also worked in cafes and sought out startup centers supported by the government. We founded the company in March 2021, right in the midst of the COVID-19 pandemic. When working with partners, we conducted video conferences in the early mornings according to U.S. time and worked actively during Korean hours. After about a year of this, we gradually expanded the business after receiving investments."

-The video-based AI model seems unfamiliar. How can it be used?

"It's actively used in the media sector. For instance, to create a 2-hour movie, Disney shoots thousands of hours of video. People have to manually search through materials to edit the film. The TwelveLabs model can read tens of thousands of hours of video and find the desired scenes through searching alone.

In the public sector it can be used to detect events in CCTV footage, and in mobility it can help interpret video data for autonomous driving. The day will also come when it is applied in healthcare. For example, our technology could be built into a robot that checks on elderly parents: a model trained on abnormal behavior could monitor the video in real time, assess the situation, and relay it to hospitals or family members."

-TwelveLabs has gained attention for becoming the first Korean company to have its AI models deployed on AWS Bedrock.

"We've had ties with AWS since the early startup days, so it's already been four years. When we had no capital at all, the computing resources supported by AWS were extremely helpful. Now, building a single model can cost billions of won, but in the early startup phase, spending 100 million won to train a model was quite unheard of. AWS quickly recognized this trend and supported us."

"As for getting our product onto Bedrock, we had been in discussions since the end of last year. AWS customers reportedly had vast amounts of video data sitting in the cloud that they couldn't put to use, and I understand many client companies asked AWS to add TwelveLabs' products. With our model integrated into Bedrock, we expect AWS's data security and the trust clients place in the platform to boost our company's global recognition."

-Companies like OpenAI and Google are also creating multimodal AI. What differentiates TwelveLabs?

"We are not just a simple 'model' company. We develop everything from video processing to video indexing technology. We train the model not with language, but with video itself, so the AI recognizes the flow of information well. Since the foundation is video information, it can learn images, audio, and even music. We can work faster and at a lower expense compared to competitor models. Our newest product, Pegasus-1.2, demonstrated a faster response time than competing products like GPT-4o and Google Gemini 1.5 Pro."

-I'm curious about the future TwelveLabs envisions for video AI.

"Until now, we have released products dealing with generated videos. In the future, we want to create products that quickly understand videos generated in real time. The technology is advancing to the point where we can provide video for real-time streaming services. Ultimately, we envision our TwelveLabs model being installed in all equipment with cameras, sending condensed messages in real-time or connecting to necessary tasks."

-Do you have any thoughts on the development of the AI industry in Korea?

"I hope we can abandon the mindset of 'we're late, so let's just catch up.' It's not just about creating language models. I believe we need to predict what the next flow will be and lead the way to get ahead. I hope really good AI models emerge from Korea. The know-how and knowledge generated while global clients utilize Korea-based models should spread in the Korean AI industry. Such knowledge-hungry talents often head to the U.S. In the future, I hope the government listens more to the voices of startups and large corporations contributing to Korean AI sovereignty."
