A research team from Seoul National University Hospital and Harvard Medical School unveiled what it says is the world's first system to test medical artificial intelligence (AI) by building a "virtual hospital" that operates like a real one. Without directly involving patients, the approach allows researchers to preview how AI decisions affect patient conditions and hospital operations.

Kim Seong-eun, research professor at the Specialized Research Institute, Biomedical Research Institute, Seoul National University Hospital. /Courtesy of Seoul National University Hospital

Seoul National University Hospital said on the 14th that research professor Kim Seong-eun of its Specialized Research Institute and a joint team at Harvard Medical School published a paper in the latest online edition of the international journal Nature Medicine (impact factor: 50) introducing the Clinical Environment Simulator (CES), which dynamically evaluates large language model (LLM)-based medical AI.

Until now, medical AI evaluations mostly looked at how accurately a system could make diagnoses based on past patient data. But in real hospitals, circumstances keep changing. A patient's condition can worsen over time, and resources such as beds, medical staff, and testing equipment are limited. These factors were not adequately reflected in existing evaluation methods.

The Clinical Environment Simulator (CES) developed by the team is a "digital hospital" that mirrors this reality. The system consists of a "patient engine" that recreates changes in patient conditions and a "hospital engine" that manages the status of beds, staff, and equipment. Because the two engines run in tandem, the effects of each AI decision carry over into the situations that follow.

For example, if the AI orders a test late, a stable chest pain patient could progress to acute myocardial infarction. Conversely, prioritizing a CT scan for one emergency patient can lengthen test wait times for others. A single AI decision can change not only a particular patient's outcome but also the overall flow of care across the hospital.

The system evaluates AI on two criteria: how well patients are actually treated and how efficiently the hospital is run. Even if it treats a particular patient well, the score drops if it burdens the hospital as a whole.
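The article does not describe how CES is implemented, so the following is only a toy sketch of the idea, not the team's method. It assumes a hypothetical severity scale from 0 to 1, a single scanner, and an untreated patient whose condition worsens in proportion to current severity. All names here (Patient, patient_engine, hospital_engine, simulate) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    name: str
    severity: float      # hypothetical scale: 0.0 (stable) to 1.0 (critical)
    waited: int = 0      # time steps spent waiting for a scan
    treated: bool = False


def patient_engine(patients, worsen_rate=0.5):
    """Toy 'patient engine': untreated patients deteriorate every step."""
    for p in patients:
        if not p.treated:
            # Assumption: sicker patients worsen faster; severity is capped at 1.0.
            p.severity = min(1.0, p.severity * (1.0 + worsen_rate))
            p.waited += 1


def hospital_engine(patients, policy, scanners=1):
    """Toy 'hospital engine': only `scanners` patients can be scanned per step."""
    waiting = [p for p in patients if not p.treated]
    for p in policy(waiting)[:scanners]:
        p.treated = True  # severity stops changing once the patient is treated


def simulate(policy, steps=5):
    """Run both engines in tandem and return (patient_score, mean_wait)."""
    patients = [Patient("A", 0.2), Patient("B", 0.6), Patient("C", 0.4)]
    for _ in range(steps):
        hospital_engine(patients, policy)   # the policy under test picks who is scanned
        patient_engine(patients)            # everyone still waiting gets worse
    patient_score = 1.0 - sum(p.severity for p in patients) / len(patients)
    mean_wait = sum(p.waited for p in patients) / len(patients)
    return patient_score, mean_wait


def sickest_first(waiting):
    """Policy: scan the most severe waiting patient first."""
    return sorted(waiting, key=lambda p: p.severity, reverse=True)


def arrival_order(waiting):
    """Policy: scan patients in the order they arrived."""
    return list(waiting)


if __name__ == "__main__":
    for policy in (sickest_first, arrival_order):
        score, wait = simulate(policy)
        print(f"{policy.__name__}: patient score {score:.2f}, mean wait {wait:.1f} steps")
```

In this toy setup both policies produce the same average wait, but scanning the sickest patient first leaves patients in better condition overall, which shows why a patient-outcome measure and an operations measure are reported separately.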

Clinical Environment Simulator (CES) operating paradigm. /Courtesy of Seoul National University Hospital

The team also tested how the AI responds under simulated extreme conditions, such as network outages or surges in emergency patients. The aim is to verify in advance how AI performs in crisis situations that can arise in real hospitals.

The study's main significance is that it established an environment for validating medical AI in advance without exposing patients to risk. If this approach spreads, it could become a required "testing stage" before medical AI is introduced in hospitals.

Research professor Kim Seong-eun said, "A virtual hospital cannot perfectly predict the body's complex physiological responses," but added, "This study will be the most valuable next step toward verifying that medical AI can move beyond tools that solve piecemeal problems, become fully integrated into dynamic health care systems, and provide real help."

References

Nature Medicine (2026), https://doi.org/10.1038/s41591-026-04252-6
