"The most important factor in building a Korean artificial intelligence (AI) is data. Using technology that refines high-quality AI training data, we created 'Mi:dm 2.0,' which is infused with 'Korean values' such as emotions, culture, and history."
Shin Dong-hoon, head of KT's Generative AI Lab (chief AI officer, executive director), made these remarks during an online media briefing on the 3rd. KT stressed the importance of 'Korean data' amid the government's recent push to develop 'Sovereign AI.'
Sovereign AI refers to building and operating independent AI on a nation's or company's own infrastructure and data. The Lee Jae-myung administration is planning the 'Everyone's AI Project,' which aims to give all citizens access to a proprietary AI service that handles the Korean language well and is aligned with Korean culture, systems, and characteristics.
KT has released Mi:dm 2.0 as open source. The model advances and optimizes Mi:dm 1.0, launched in October 2023, using KT's own technology. Developed over approximately two years, it is available for commercial use by corporations, individuals, and public entities without restrictions. The government is likewise pursuing policies centered on the 'open-source release' of independently developed AI.
The Ministry of Science and ICT is promoting a project that will select up to five elite teams and narrow them down through phased evaluations to ultimately develop a 'proprietary AI foundation model.' By releasing Mi:dm 2.0, KT has thrown its hat in the ring for this project. The executive director stated, "We are preparing to participate in the government project," adding, "(The government's development direction) aligns with our AI philosophy. The proprietary AI model must encapsulate Korean values and culture, and the data we have built through collaboration with the data alliance over the past year will be a significant advantage in creating the 'proprietary AI model.'"
◇ "Mi:dm 2.0 is the model representing Sovereign AI"
KT introduced Mi:dm 2.0 as 'Korean AI' because it is a proprietary model built from the pre-training stage onward and because KT has secured all copyrights for its high-quality Korean-language training data. The executive director explained, "We systematically classified the secured data into a total of 200 subcategories based on criteria such as language, form, and content," and emphasized, "We have a data management system that is suitable not only for functioning as a general AI model but also for specialization according to specific circumstances."
KT believes that the 'Sovereign AI' emphasized by the government ultimately depends on Korean data. The executive director stated, "User data must be thoroughly protected under sovereignty, and users must be able to choose according to their environment and purpose," adding, "It is being operated safely and responsibly while complying with all regulations." He further remarked, "Mi:dm 2.0 is the model representing Sovereign AI."
Mi:dm 2.0 scored higher than other similarly sized domestic and international models on the 'Ko-Sovereign' benchmark, a Korean-language AI capability assessment index jointly developed by KT and Korea University. Mi:dm 2.0 also received an 'excellent performance' evaluation on the 'KMMLU' benchmark, which measures understanding of specialized knowledge related to Korea, and on 'HAERAE,' a Korean language model evaluation index.
KT has also collaborated with domestic fabless chip designer Rebellions since the development stage of Mi:dm 2.0 to ensure optimal operation on domestic AI semiconductors. It has also developed and applied a 'tokenizer' that reflects the structure and linguistic characteristics of the Korean language. KT stated, "Through academic cooperation with the Korea University Institute of Ethnic Culture, we have also secured academic credibility for Mi:dm 2.0 as 'Korean AI.'"
KT released Mi:dm 2.0 in two versions: ▲ Mi:dm 2.0 Base, which has 11.5 billion parameters, and ▲ Mi:dm 2.0 Mini, which has 2.3 billion parameters. Both support Korean and English. KT is the first to release a Korean general-purpose large language model (LLM) with more than 11 billion parameters as open source that can be used commercially.
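For a rough sense of what these parameter counts mean in practice, the sketch below estimates the memory needed just to hold each model's weights, as parameter count times bytes per parameter. The parameter counts come from the figures above; the numeric precisions (fp32, bf16, int8) are illustrative assumptions, not formats KT has announced, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts are taken from the article; the precisions listed
# here are assumptions chosen only for illustration.
PARAMS = {
    "Mi:dm 2.0 Base": 11.5e9,
    "Mi:dm 2.0 Mini": 2.3e9,
}
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_gb(n_params: float, bytes_per_param: int) -> float:
    """Return weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


def estimate_all() -> dict:
    """Build a {model: {precision: gigabytes}} table for every combination."""
    return {
        name: {prec: weight_gb(n, b) for prec, b in BYTES_PER_PARAM.items()}
        for name, n in PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in estimate_all().items():
        cells = ", ".join(f"{prec}: {gb:.1f} GB" for prec, gb in row.items())
        print(f"{name}: {cells}")
```

Under these assumptions, the Base model's weights alone would take about 23 GB at 2 bytes per parameter, while the Mini model's would take about 4.6 GB, which is one reason smaller variants are often released alongside larger ones.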
Because Mi:dm 2.0 was developed as 'Korean AI,' KT plans to focus first on the business-to-government (B2G) market. The executive director commented, "We believe it is primarily suitable for the public and financial sectors," and stated, "We plan to gradually expand services into the education and legal fields, and the launch of services for general consumers (B2C) is under consideration."