As competition in artificial intelligence (AI) intensifies, so-called "data foundries," which carefully process and supply data for industrial purposes, are emerging as a new line of business. As the view spreads that data quality and the ability to use data, rather than model size, determine AI performance, companies in this field are drawing attention.

Illustration: ChatGPT

According to the startup industry on the 17th, Bound4, a data foundry company, saw its orders in the first quarter of this year rise about fivefold from a year earlier. Bound4 is a 12-employee startup that initially focused on building image and video data for robots and autonomous driving. It has recently expanded into processing text-based, task-oriented data.

A data foundry is a concept modeled on the foundry in the semiconductor industry. Just as the semiconductor sector raised efficiency by separating design from production, a data foundry in the AI industry refers to a structure in which specialized firms design, process, and produce the data that companies have secured so that it suits industrial purposes.

The industry sees the growth of data foundry companies as a sign that the generative AI race is shifting from securing data to competing on how data is processed and used for industrial purposes. As AI expands into actual task execution, the view is gaining ground that demand is rising for structured data that reflects context rather than for simple data.

The same shift is visible overseas. Scale AI, a U.S. company specializing in AI data labeling and construction, is expanding its role as a supplier of AI training data through data construction and validation and the production of data for reinforcement learning from human feedback. Meta invested about $14.3 billion (about 21 trillion won) in Scale AI last year. At the time, the company's valuation was about $29 billion (about 43 trillion won). Nvidia and Amazon also participated in the investment.

Bound4 CEO Hwang In-ho said, "Design and utilization capabilities suited to the purpose are becoming more important than the volume of data," and predicted, "The trend of separating data production and processing into specialized domains will spread."

In Korea, Kairos Lab is also cited as a data foundry company. It takes experimental data that went unused during research, standardizes and processes it into a form suitable for AI training, and repurposes it. On that basis, it is focusing on developing materials-specific AI solutions that reduce repetitive trial and error and predict optimal physical properties in the development of materials for semiconductors and batteries.

Some see the data foundry business not as a new industry but as an extension of data engineering, which refines and processes raw data for its intended purpose. Even within the same sector, a consistent framework is hard to establish because each client demands different data structures and quality standards, and some note that as the share of client-specific custom work grows, the business can drift toward project-based service contracting.

An industry official said, "The trend of dividing data production and processing into specialized domains can itself grow," but added, "The higher the share of client-specific customization, the harder it is to achieve economies of scale, so the long-term profitability of companies touting data foundries bears watching."

※ This article has been translated by AI.