As artificial intelligence (AI) technology advances, attempts to introduce AI into new drug development are growing rapidly. In early-stage research such as candidate screening, molecular design and toxicity prediction, AI use is already commonplace. Still, few cases implement in the lab the approach in which AI decides which candidate to test next and feeds the results back into its own training.
Recently, attempts to put such experiment-driven AI at the core of research systems have begun in earnest among global big pharma. A representative case is Eli Lilly and Company.
On the 12th (local time), Lilly said it would build a joint lab with Nvidia and introduce a "lab-in-the-loop" structure in which robots immediately synthesize and test AI-proposed molecules and feed the results back into AI training. The plan is to move away from the existing approach of having AI conduct post-hoc analysis of data generated by researchers, and instead have AI lead the flow of iterative experiments.
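The iterative cycle described above — AI proposes molecules, robots synthesize and test them, and the results retrain the model — can be sketched as a simple loop. This is an illustration of the feedback structure only, not Lilly's actual system; every name here (`Model`, `propose`, `run_assay`) is hypothetical.

```python
import random

class Model:
    """Toy surrogate model: proposes candidates, retrains on assay results."""
    def __init__(self):
        self.history = []  # accumulated (molecule, measured_activity) pairs

    def propose(self, n=3):
        # A real system would use a generative or predictive model;
        # here we emit placeholder candidate IDs.
        return [f"mol-{random.randint(0, 999):03d}" for _ in range(n)]

    def retrain(self, results):
        # Feed experimental results back into the training data.
        self.history.extend(results)

def run_assay(molecules):
    # Stand-in for robotic synthesis and testing: returns mock activities.
    return [(m, random.random()) for m in molecules]

model = Model()
for cycle in range(3):  # each cycle: propose -> test -> feed back
    candidates = model.propose()
    results = run_assay(candidates)
    model.retrain(results)

print(len(model.history))  # 3 cycles x 3 molecules = 9 data points
```

The key contrast with post-hoc analysis is that the experimental results flow directly back into the model before the next round of proposals, rather than being analyzed after the fact.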
By contrast, many say AI remains a supporting tool for most Korean pharmaceutical companies, confined to organizing vast literature or aiding candidate discovery. A research lead at a Korean pharmaceutical company said, "For now, AI is closer to a tool that reduces repetitive, time-consuming tasks."
Experts say that behind this gap are not only differences in technological prowess but also overlapping structural factors such as regulation and the data-use environment.
◇ Korea's research volume has grown, but the stages have stalled
It is hard to say that Korea's AI new drug development capabilities are broadly lagging. An analysis by the Korea Research Institute of Bioscience and Biotechnology (KRIBB) of 33,956 AI drug discovery papers published worldwide over the past 10 years from 2015 to 2024 found that Korea published 1,016 papers in the same period, ranking ninth. In the last three years, Korea published 637 papers, pushing its rank up to sixth. Participation in research itself is expanding rapidly.
Influence is also improving. By the Relative Citation Ratio (RCR), which indicates the qualitative level of papers, Korea ranked seventh with a 10-year average of 2.20 and fifth with a three-year average of 2.35.
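For context, the RCR (developed for NIH's iCite tool) divides a paper's citations per year by a field- and year-normalized expected citation rate, benchmarked so that a typical NIH-funded paper scores about 1.0. A minimal sketch of the ratio, with illustrative numbers:

```python
def relative_citation_ratio(citations_per_year, expected_citations_per_year):
    """RCR: a paper's citation rate divided by a field- and
    year-normalized benchmark rate (values above 1.0 indicate
    above-benchmark influence). Illustrative simplification."""
    return citations_per_year / expected_citations_per_year

# E.g., a paper cited 11 times/year in a field whose benchmark is 5/year:
print(round(relative_citation_ratio(11, 5), 2))  # 2.2
```

By this reading, Korea's three-year average of 2.35 means its papers are cited at roughly 2.35 times the benchmark rate for their fields.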
However, the picture changes when looking at the "stages" where research concentrates. In U.S. papers, the keyword "preclinical research" appeared 702 times and "clinical research" 780 times. China also maintained high levels, at 615 and 640, respectively. By contrast, in Korea during the same period, the "preclinical research" keyword was virtually absent, and "clinical research" appeared only 79 times, about one-tenth the U.S. and China levels.
Of course, the fact that Korea publishes fewer AI drug development papers than the U.S. or China is also a factor. Still, while early-exploration keywords such as protein analysis, drug–target interaction elucidation and candidate discovery appear at a meaningful level, the near absence of keywords related to preclinical and clinical stages is hard to explain by paper counts alone, analysts say.
◇ "If data aren't consolidated, there are no experiments"
So why is Korea's AI drug development stuck at the early exploration stage? The industry points to "data fragmentation" as the reason.
For AI to become a "tool that decides the next experiment," it needs data in which past experiment results are accumulated and consolidated. But in Korea, because such data are not structurally accumulated, AI does not move beyond a "tool that spits out many candidates," the explanation goes.
Most clinical and genomic data available for pharmaceutical research and development in Korea are fragmented. They are scattered across specific projects or research units, only successful outcomes are retained, and data on why studies failed do not carry over to subsequent research. As a result, even when an AI model designs a new candidate molecule, it is difficult to retrain the model on how those results fared in actual experiments.
The institutional environment also reinforces this disconnect. The Personal Information Protection Act (PIPA) allows processing of pseudonymized information without the explicit consent of data subjects only for the purposes of compiling statistics, scientific research or preserving records for the public interest. However, information processed through pseudonymization is still classified as "personal information," bringing strict limits on use for other purposes or provision to third parties.
Combining different pseudonymized medical and clinical datasets also must go through designated specialized government institutions and restricted environments such as "Safe Zones."
In the U.S., the principle for using medical information is also patient consent, but there are institutional exceptions for research purposes. With approval from bodies such as an Institutional Review Board (IRB), even protected health information (PHI) with potential identifiability can be combined and analyzed directly inside the institution by research institutions or corporations, provided privacy safeguards and internal controls are in place.
◇ Will "building bio big data" catalyze change?
In this context, the industry also voices disappointment with the government's ongoing National Integrated Bio Big Data Project (BIKO). Based on public consent, BIKO is a collaboration among the Ministry of Health and Welfare, the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, and the Korea Disease Control and Prevention Agency to build genomic and clinical data for 772,000 people from 2024 to 2028. Securing data alone has limits, the industry says; a usable ecosystem must be built in parallel.
Park Bong-hyun, a senior researcher at the Korea Biotechnology Industry Organization's Bioeconomy Research Center, said, "We need institutional flexibility premised on the continuous use of data," and added, "One solution could be expanding regulatory sandboxes so researchers can combine and analyze data more freely within Safe Zones."
Park added, "In addition, corporations can expand AI use beyond candidate discovery into later stages only if guidelines for Good Machine Learning Practice (GMLP), which cover development, validation and operation of machine learning models used in AI drug development, and standards for data submission and evaluation are established together."