As artificial intelligence (AI) technology advances, attempts to adopt AI in new drug development are rising quickly. In early research stages such as candidate screening, molecule design, and toxicity prediction, AI use is already becoming commonplace. Still, cases in which AI decides which candidate to test next and feeds the results back into its own training in real-world laboratories remain limited.

Recently, efforts to place this kind of experiment-driven AI at the center of research systems have gained momentum among global big pharma. A representative case is Eli Lilly and Company.

On the 12th (local time), Lilly said it would build a joint lab with Nvidia and adopt a "lab-in-the-loop" structure in which robots immediately synthesize and test molecules proposed by AI and feed the results back into AI training. The plan is to move beyond the conventional method of having AI analyze data generated by researchers after the fact, and instead have AI lead the flow of iterative experiments.
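Conceptually, this kind of loop resembles active learning: a model proposes candidates, an automated experiment measures them, and the measurements are fed back into training. The sketch below is a minimal, hypothetical illustration of that cycle in Python; the surrogate model, the greedy selection rule, and the simulated run_assay function are assumptions made for illustration, not details of the Lilly-Nvidia system.

```python
# A minimal, generic sketch of a "lab-in-the-loop" cycle, for illustration only.
# It is NOT a description of Lilly's or Nvidia's system; the surrogate model,
# the selection rule, and the simulated assay below are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def run_assay(candidates):
    """Stand-in for a robotic synthesis-and-test step.
    In a real loop this would be a wet-lab measurement, not a formula."""
    return np.sin(3 * candidates[:, 0]) + 0.1 * rng.normal(size=len(candidates))

# Candidate pool: toy one-dimensional "molecular descriptors" on [0, 1].
pool = rng.uniform(0, 1, size=(500, 1))

# Seed data: a few candidates measured up front.
measured_X = pool[:10]
measured_y = run_assay(measured_X)

for cycle in range(5):
    # 1. Retrain the surrogate model on everything measured so far.
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(measured_X, measured_y)

    # 2. Let the model decide what to test next: here, the candidates it
    #    predicts to be most active (a greedy rule; a real loop would also
    #    avoid re-testing candidates it has already measured).
    preds = model.predict(pool)
    next_batch = pool[np.argsort(preds)[-5:]]

    # 3. "Run the experiment" and feed the results back into the training data.
    new_y = run_assay(next_batch)
    measured_X = np.vstack([measured_X, next_batch])
    measured_y = np.concatenate([measured_y, new_y])

    print(f"cycle {cycle}: best measured activity so far = {measured_y.max():.3f}")
```

The point of the sketch is only the data flow: each cycle the model, not the researcher, picks the next batch, and every experimental result, successful or not, returns to the training set.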

In contrast, at many domestic pharmaceutical companies AI reportedly remains a supporting tool, limited to organizing vast bodies of literature or assisting with candidate discovery. A research head at a domestic pharmaceutical company said, "For now, AI is closer to a tool that reduces repetitive, time-consuming tasks."

Experts say that behind this gap are not only differences in technological capability but also structural factors such as regulations and the data-use environment.

Jensen Huang, Nvidia CEO, and Dave Ricks, Eli Lilly CEO, shake hands before a talk at the JPMorgan Healthcare Conference in San Francisco, United States, on the 12th (local time)./Courtesy of Nvidia

◇Korea's research volume grew, but its stages stalled

It is hard to say that Korea's AI drug development technology is broadly lagging. An analysis by the Korea Research Institute of Bioscience and Biotechnology (KRIBB) of 33,956 AI drug development papers published worldwide over the 10 years from 2015 to 2024 found that Korea published a total of 1,016 papers in that period, ranking ninth in the world. Over the past three years alone, Korea published 637 papers, lifting its ranking to sixth. Participation in research itself is expanding rapidly.

Its influence is also improving. By Relative Citation Ratio (RCR), a measure of the qualitative impact of papers, Korea ranked seventh with a 10-year average of 2.20 and fifth with a three-year average of 2.35.

However, the picture changes when looking at the stages where research is concentrated. Among U.S. papers, the keyword "preclinical research" appeared 702 times and "clinical research" 780 times. China was also high, at 615 and 640, respectively. In Korean papers from the same period, by contrast, the "preclinical research" keyword was virtually absent, and the "clinical research" keyword appeared only 79 times, about one-tenth the levels of the United States and China.

Of course, the smaller overall number of Korean AI drug development papers compared with the United States or China also plays a part. Still, keywords related to early discovery stages, such as protein analysis, drug–target interaction identification, and candidate discovery, appear at a meaningful level, so the near absence of keywords related to preclinical and clinical stages is difficult to explain by the difference in paper counts alone.

Comparison of keyword frequencies by new drug development stage in papers from the United States, China, and Korea (2015–2024)./Courtesy of Korea Research Institute of Bioscience and Biotechnology (KRIBB)

◇"If data don't connect, there are no experiments"

So why does Korea's AI drug development remain stuck in the early discovery stage? The industry points to "data fragmentation" as the reason.

For AI to become a "tool that decides the next experiment," data from previous experiments must accumulate and stay connected. But in Korea, because those data do not build up in a structured way, AI cannot rise above being "a tool that selects many candidates."

Most clinical and genomic data available for use in domestic pharmaceutical research and development are fragmented. They are scattered across specific projects or research units, only successful results remain, and data on why failures occurred do not carry over to the next study. As a result, even if an AI model designs new candidate molecules, it is difficult to retrain on how those results were validated in actual experiments.

The institutional environment also reinforces this break. The Personal Information Protection Act (PIPA) allows the processing of pseudonymous information without the explicit consent of data subjects only for statistics, scientific research, and public-interest record preservation. However, information that has undergone pseudonymization is still classified as "personal information," and strict restrictions apply to use beyond the stated purpose or provision to third parties.

Combining different pseudonymized healthcare and clinical datasets must also take place within restricted environments, such as government-designated specialized agencies and "Safe Zones."

In the United States, patient consent is likewise the default principle for using health information, but exceptions for research purposes are institutionally recognized. With approval from an Institutional Review Board (IRB), research institutions or corporations can combine and analyze even protected health information (PHI) that carries reidentification risk in-house, provided privacy protections and internal controls are in place.

◇Will "building bio big data" catalyze change?

In this context, the industry also voices disappointment with the government's ongoing National Integrated Bio Big Data Project (BIKO), arguing that data collection alone has limits and that a usable ecosystem must be built in parallel. Based on public consent, BIKO envisions the Ministry of Health and Welfare, the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, and the Korea Disease Control and Prevention Agency working together from 2024 to 2028 to build genomic and clinical data on 772,000 people.

Park Bong-hyeon, a senior researcher at the Korea Biotechnology Industry Organization's Bioeconomy Research Center, said, "Institutional flexibility premised on the continuous use of data is needed," and noted, "One solution could be to expand regulatory sandboxes so researchers can more freely combine and analyze data within Safe Zones."

Park added, "In addition, corporations can expand AI use beyond the candidate discovery stage into later stages only if Good Machine Learning Practice (GMLP) guidelines, which cover the development, validation, and operation of the machine learning models used in AI drug development, are established together with standards for data submission and evaluation."
