Statistics Korea has been reorganized as the National Data Agency. The change, which elevates it from an agency under the Ministry of Economy and Finance to an independent body under the prime minister, must not end as a mere change of signboard. With data emerging as a core asset that drives decision-making across government, industry, and society, the reorganization could be a turning point for Korea to become a data powerhouse through innovation in national data governance. What path should the National Data Agency take to get there? This series, "The Future of National Statistics," examines the question. [Editor's note]
The National Data Agency is pushing to bring artificial intelligence (AI) into its statistical work across the board. The goal goes beyond simply increasing the volume of data: it envisions AI that reads statistical tables directly and links different datasets so that they can be put to immediate use in policymaking.
◇ "AI that can't read statistical tables"… Metadata is the key
According to the National Data Agency on the 6th, AI today reads text such as blogs and news articles with ease but cannot understand the tables published on the Korean Statistical Information Service (KOSIS). A table that consists only of rows of numbers carries no label explaining what those numbers mean.
That label is "metadata." To properly answer questions such as "the trend in youth employment rates over the past 40 years" or "the year with the lowest suicide rate," AI needs descriptions attached to the data that go beyond the bare numbers.
Take unemployment statistics. A bare figure such as "X percent" is not enough; the record must also carry background information: the formula (the number of unemployed people divided by the economically active population, i.e., everyone who is either working or actively looking for work), the applicable age range, the survey reference period, and the statistical source. Metadata acts as a "user manual" that gives numbers meaning and context in this way.
An official at Statistics Korea said, "Today's AI cannot directly access official databases (DBs), so it answers based on secondary sources such as articles and blogs." The official added, "'Hallucination,' in which AI presents figures that differ from reality even though the answer looks plausible, is a fatal problem," and stressed, "We need a metadata framework that tells AI which data are trustworthy and where to retrieve them."
Starting this year, the National Data Agency will build AI-friendly metadata, beginning with officially approved statistics. For the unemployment rate, for example, it will attach machine-readable documentation specifying which survey the figure comes from and which formula is used to calculate it. The plan is to expand this work through 2028 and then extend it to public data across government from 2029.
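To make the idea of a machine-readable "user manual" concrete, here is a minimal Python sketch of a metadata record for the unemployment rate. The class `IndicatorMetadata`, its field names, and the helper `to_json` are hypothetical and do not reflect the National Data Agency's actual schema; the field values are illustrative only. The point is that the formula, scope, and source travel with the number, so software can both compute the figure and explain it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IndicatorMetadata:
    """Hypothetical machine-readable 'user manual' attached to one statistic."""
    name: str
    definition: str
    numerator: str
    denominator: str
    unit: str
    age_range: str
    reference_period: str
    source_survey: str

    def compute(self, numerator_value: float, denominator_value: float) -> float:
        """Apply the documented ratio formula: numerator / denominator * 100."""
        if denominator_value <= 0:
            raise ValueError("denominator must be positive")
        return numerator_value / denominator_value * 100


def to_json(meta: IndicatorMetadata) -> str:
    """Serialize the record so other programs (including AI tools) can read it."""
    return json.dumps(asdict(meta), ensure_ascii=False, indent=2)


# Illustrative values only; not an official record.
UNEMPLOYMENT_RATE = IndicatorMetadata(
    name="unemployment rate",
    definition="Share of the economically active population that is unemployed",
    numerator="unemployed persons",
    denominator="economically active population (employed + unemployed)",
    unit="percent",
    age_range="15 and over",
    reference_period="monthly survey reference week",
    source_survey="Economically Active Population Survey",
)

if __name__ == "__main__":
    print(to_json(UNEMPLOYMENT_RATE))
    # Hypothetical inputs: 900,000 unemployed out of 30,000,000 economically active.
    print(UNEMPLOYMENT_RATE.compute(900_000, 30_000_000))
```

Serializing the record to JSON is one simple way to expose such documentation to other systems; a production system would more likely follow an established exchange standard such as SDMX.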
◇ Data centers to become "policy labs" where AI assists analysis
The first places to feel the effects of AI adoption will be the 16 Statistical Data Centers (SDCs) nationwide. Until now, researchers have had to work out for themselves which datasets to link; once AI is fully introduced, it will recommend combinations, for example, "Look at this dataset together with that one to see the policy effect."
For example, if a Sejong City official wants to check whether support policies for women-led companies are working, AI will suggest linking the corporate statistical register with the enterprise roster to analyze changes in sales. It will then generate the analysis code automatically and present the results as tables and graphs. The system is designed so that even people who cannot use statistical software can readily grasp policy effects.
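The linkage in the Sejong City example can be sketched as a join on a shared business identifier followed by a before-and-after comparison. This is a minimal Python sketch under stated assumptions: the records are made-up toy values, and the field names (`biz_id`, `ceo_is_woman`, `sales`) and functions are hypothetical rather than the real register layouts or SDC tooling. A naive comparison of average sales changes like this one is not a causal estimate of policy impact; a real evaluation would need a proper comparison design.

```python
from statistics import mean

# Hypothetical toy records; field names are assumptions, not real register layouts.
corporate_register = [
    {"biz_id": "A1", "ceo_is_woman": True},
    {"biz_id": "A2", "ceo_is_woman": True},
    {"biz_id": "B1", "ceo_is_woman": False},
    {"biz_id": "B2", "ceo_is_woman": False},
]
enterprise_roster = [
    {"biz_id": "A1", "year": 2022, "sales": 100.0},
    {"biz_id": "A1", "year": 2023, "sales": 130.0},
    {"biz_id": "A2", "year": 2022, "sales": 80.0},
    {"biz_id": "A2", "year": 2023, "sales": 90.0},
    {"biz_id": "B1", "year": 2022, "sales": 120.0},
    {"biz_id": "B1", "year": 2023, "sales": 125.0},
    {"biz_id": "B2", "year": 2022, "sales": 60.0},
    {"biz_id": "B2", "year": 2023, "sales": 55.0},
    {"biz_id": "C9", "year": 2022, "sales": 50.0},  # no match in the register
]


def link(register, roster, key="biz_id"):
    """Inner join: keep roster rows whose key also appears in the register."""
    index = {row[key]: row for row in register}
    return [{**index[row[key]], **row} for row in roster if row[key] in index]


def mean_sales_change(linked, before, after, women_led):
    """Average per-firm change in sales between two years for one group."""
    sales_by_firm = {}
    for row in linked:
        if row["ceo_is_woman"] == women_led:
            sales_by_firm.setdefault(row["biz_id"], {})[row["year"]] = row["sales"]
    changes = [
        years[after] - years[before]
        for years in sales_by_firm.values()
        if before in years and after in years
    ]
    return mean(changes) if changes else float("nan")


if __name__ == "__main__":
    linked = link(corporate_register, enterprise_roster)
    print("women-led:", mean_sales_change(linked, 2022, 2023, True))
    print("others:", mean_sales_change(linked, 2022, 2023, False))
```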
The Statistical Data Centers operate on a closed network to protect personal information. Because external internet access is blocked, services like OpenAI's ChatGPT cannot be used as is. Statistics Korea is reviewing a method to introduce a standalone generative AI that supports coding and visualization even in a closed-network environment.
AI adoption is not a challenge unique to Korea. Last year, the United Nations (UN) Statistical Commission recommended that countries strengthen digital transformation, metadata standardization, and the FAIR principles (findability, accessibility, interoperability, and reusability). The Organisation for Economic Co-operation and Development (OECD) has likewise called AI "an opportunity for statistical innovation," stressing the need to strengthen data structuring, standardization, and metadata.
In particular, the OECD has been running the Generative AI for Official Statistics project since last year, studying strategically how generative AI technologies such as large language models (LLMs) will affect official statistics.
Eurostat, the statistical office of the European Union, has introduced an "ontology," a concept map that provides a standardized search environment in which machines can understand how statistics relate to one another. Put simply, an ontology is a "map of statistics."
For example, the ontology traces how the concept of the unemployment rate connects to variables and formulas, while metadata explains its definition, units, and sources. AI needs both to find statistics and to interpret them correctly. The United States and the United Kingdom are also building metadata systems designed for AI use to improve access for researchers.
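The division of labor between an ontology and metadata can be illustrated with a small Python sketch: the ontology is stored as subject-predicate-object triples that a search can traverse (here, breadth-first search), while metadata supplies the definition, unit, and source. The triples, node names, and functions are hypothetical and are not Eurostat's actual model; real systems typically use W3C standards such as RDF.

```python
from collections import deque

# Hypothetical ontology stored as (subject, predicate, object) triples.
TRIPLES = [
    ("unemployment rate", "has_numerator", "unemployed persons"),
    ("unemployment rate", "has_denominator", "economically active population"),
    ("unemployment rate", "measured_by", "Economically Active Population Survey"),
    ("economically active population", "includes", "employed persons"),
    ("economically active population", "includes", "unemployed persons"),
]

# Hypothetical metadata: what a concept means, not how it connects.
METADATA = {
    "unemployment rate": {
        "definition": "unemployed persons / economically active population * 100",
        "unit": "percent",
        "source": "Economically Active Population Survey",
    },
}


def neighbors(node):
    """Outgoing (predicate, object) edges of a node in the ontology."""
    return [(pred, obj) for subj, pred, obj in TRIPLES if subj == node]


def find_path(start, goal):
    """Breadth-first search; returns the chain of triples from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, obj in neighbors(node):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, pred, obj)]))
    return None


def describe(concept):
    """Combine the ontology (where a concept links) with metadata (what it means)."""
    return {"links": neighbors(concept), "metadata": METADATA.get(concept)}


if __name__ == "__main__":
    for step in find_path("unemployment rate", "employed persons"):
        print(step)
    print(describe("unemployment rate")["metadata"])
```

The search answers "where does this concept lead?", while the metadata lookup answers "what does this number mean?"; an AI assistant needs both to retrieve the right figure and explain it.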
◇ "Securing reliability is key"… AI grand transition task force launched
The National Data Agency is running an "AI grand transition strategy task force" and preparing a mid- to long-term roadmap for introducing AI across the entire process, from compiling statistics to dissemination and use. This year, about 1 billion won will go toward drawing up an information strategy plan (ISP) and producing synthetic data for research use, with the bulk of the budget to be concentrated from next year onward.
The task force consists of a strategy planning subcommittee and an execution tasks subcommittee. The former draws the big picture, while the latter handles detailed tasks needed immediately in the field. Specific tasks include building AI-friendly metadata, making statistical production more efficient, introducing a closed-network generative AI lab, establishing a framework for the safe use of statistics, training experts, and setting ethics and security standards.
An official at the National Data Agency explained, "We plan to present an overall roadmap early next year," adding, "AI is already producing results in the coding stage of statistical production, and its use will expand to services going forward."
Ultimately, for AI to take root as a "policy brain," a system is needed that goes beyond simple search to combine the most suitable data at the policy design stage. Addressing complex issues such as youth employment, suicide prevention, and regional industrial policy requires integrating data from the economic, social, and health fields. AI will handle recommendations and interpretation, while people will design policy based on the results.
An official at the National Data Agency emphasized, "As AI use is an irreversible trend, building a trustworthy data foundation is more important than anything else."