An artist's rendering shows an asteroid striking Earth 66 million years ago./Courtesy of Florida Atlantic University

A research team led by Oh Tae-hyun, a professor in the School of Computing at KAIST, said on the 26th that it had developed "PAVAS (Physics-Aware Video-to-Audio Synthesis)," an AI technology that generates sound reflecting the physical situation in a video, in collaboration with a joint team from Pohang University of Science and Technology (POSTECH) and Sony AI.

When a massive dinosaur walks onto the screen in a movie, audiences naturally imagine heavy footsteps and low-frequency rumbles that shake the ground. People anticipate sounds by considering not only the shapes of objects on the screen but also their size, weight, and speed of movement.

The technology is designed so that the AI infers physical information that is not directly shown in the video, such as an object's mass and speed. The research team enabled the AI to estimate this information by analyzing the surrounding environment, object movements, and collisions, and to reflect it in the sound generation process.

Testing found that PAVAS produced sounds close to those heard in real environments in scenes with physical interactions such as collisions or impacts. In particular, when an object's mass or speed changed, the loudness and timbre of the generated sound changed accordingly, improving realism over existing methods.
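As a rough illustration of why mass and speed matter for loudness, the toy calculation below compares two impacts by their kinetic energy. It is not the PAVAS method: the function names are hypothetical, and it assumes, purely for illustration, that the sound energy radiated by an impact is proportional to the object's kinetic energy.

```python
import math


def kinetic_energy(mass_kg: float, speed_m_s: float) -> float:
    """Kinetic energy in joules: E = 1/2 * m * v^2."""
    return 0.5 * mass_kg * speed_m_s ** 2


def relative_level_db(
    mass_kg: float,
    speed_m_s: float,
    ref_mass_kg: float = 1.0,
    ref_speed_m_s: float = 1.0,
) -> float:
    """Level difference in dB between an impact and a reference impact.

    Assumption (illustrative only): radiated sound energy scales
    linearly with kinetic energy, so the level difference is
    10 * log10(E / E_ref).
    """
    energy = kinetic_energy(mass_kg, speed_m_s)
    ref_energy = kinetic_energy(ref_mass_kg, ref_speed_m_s)
    return 10.0 * math.log10(energy / ref_energy)


if __name__ == "__main__":
    # Same mass, double the speed: energy grows fourfold.
    print(round(relative_level_db(1.0, 2.0), 2))
    # Ten times the mass at the same speed: energy grows tenfold.
    print(round(relative_level_db(10.0, 1.0), 2))
```

Under this simplified assumption, doubling the speed raises the level by about 6 dB and a tenfold increase in mass raises it by 10 dB. Real impact sounds also depend on material, contact surface, and geometry, which this sketch ignores.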

Professor Oh Tae-hyun said, "This study is meaningful in that it is designed so that AI understands physical quantities and causality," and added, "It could be expanded into next-generation multimodal AI that handles multiple inputs, including text, video, and audio."

The results were accepted as an oral presentation paper at the Computer Vision and Pattern Recognition Conference (CVPR) 2026, an academic conference on computer vision, and are scheduled to be presented on June 6.

References

arXiv (2025), DOI: https://arxiv.org/abs/2512.08282
