KAIST, MIT and Microsoft boost AI vision with training-free upsampling tech

A Korean research team, together with researchers from MIT and Microsoft, has developed a technology that improves AI visual recognition even in environments with limited GPU memory. (Courtesy of KAIST)

Facial recognition on smartphones, environmental perception in self-driving cars, and object identification in humanoid robots all rely on a common technology: computer vision. Computer vision lets artificial intelligence (AI) understand its surroundings by analyzing images and video.

A research team led by Professor Kim Chang-ik of the School of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST) said on the 17th that, together with researchers at the Massachusetts Institute of Technology (MIT) and Microsoft, it had developed "Upsample Anything," a technology that can improve AI's visual recognition performance even in environments with limited graphics processing unit (GPU) memory.

Recent AI systems compress input images into low-resolution feature information to speed up computation and reduce memory usage. Feature information refers to the key clues an AI model extracts from an image, such as the shape, boundary, and location of an object.

In this process, however, important details such as small objects, thin structures, and fine defects can be lost. Conversely, processing every image at high resolution from the start consumes large amounts of GPU memory and computing resources, which makes real-time processing difficult. This trade-off has been seen as a particular limitation on devices with tight size and power constraints, such as smartphones and robots.
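
The loss of thin structures can be illustrated with a small sketch. It is not taken from the research: the `average_pool` helper and the 16×16 patch size are assumptions chosen only to show how averaging a region into a single value dilutes a one-pixel-wide line.

```python
import numpy as np

def average_pool(image, patch):
    """Downsample a 2-D array by averaging non-overlapping patch x patch blocks."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    # Split each axis into (number of blocks, patch) and average inside each block.
    return image.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))

# A blank 224x224 image containing a one-pixel-wide vertical line.
image = np.zeros((224, 224), dtype=np.float32)
image[:, 100] = 1.0

# Hypothetical 16x16 patch size, giving a 14x14 low-resolution grid.
features = average_pool(image, patch=16)

print(features.shape)                 # (14, 14)
print(image.max(), features.max())    # the line's peak value drops after pooling
```

In this toy case the line keeps only one sixteenth of its original strength in the low-resolution grid, and its exact column can no longer be read from the grid alone.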

The researchers addressed this by restoring the compressed, low-resolution feature information to high resolution. The method uses the boundary and structural information contained in the input image to recover visual information close to that of the original image.
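
The general idea of using an image's edges to guide the restoration of coarse features can be sketched with joint bilateral upsampling, a classical image-guided technique. This is a stand-in for illustration and is not presented as the researchers' algorithm; the function name, the parameters `sigma_s`, `sigma_r`, and `radius`, and the toy data are all assumptions.

```python
import numpy as np

def joint_bilateral_upsample(feat_lr, guide, scale, sigma_s=1.0, sigma_r=0.1, radius=2):
    """Upsample feat_lr (h, w, c) to the resolution of guide (h*scale, w*scale).

    Each output pixel is a weighted average of nearby low-resolution features.
    Weights fall off with spatial distance (sigma_s) and with the difference
    between the pixel's guide value and the neighbor's guide value (sigma_r).
    """
    h, w, c = feat_lr.shape
    H, W = guide.shape
    assert (H, W) == (h * scale, w * scale), "guide must be scale times larger"
    # Guide values at low resolution: the mean of each scale x scale block.
    guide_lr = guide.reshape(h, scale, w, scale).mean(axis=(1, 3))
    # Position of every high-resolution pixel in low-resolution grid units.
    ys = (np.arange(H) + 0.5) / scale - 0.5
    xs = (np.arange(W) + 0.5) / scale - 0.5
    cy = np.clip(np.round(ys).astype(int), 0, h - 1)
    cx = np.clip(np.round(xs).astype(int), 0, w - 1)
    out = np.zeros((H, W, c))
    weight_sum = np.zeros((H, W))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor indices, clamped at the border (border cells may repeat).
            ny = np.clip(cy + dy, 0, h - 1)[:, None]
            nx = np.clip(cx + dx, 0, w - 1)[None, :]
            dist2 = (ny - ys[:, None]) ** 2 + (nx - xs[None, :]) ** 2
            w_spatial = np.exp(-dist2 / (2 * sigma_s ** 2))
            w_range = np.exp(-(guide - guide_lr[ny, nx]) ** 2 / (2 * sigma_r ** 2))
            weight = w_spatial * w_range
            out += weight[..., None] * feat_lr[ny, nx]
            weight_sum += weight
    return out / (weight_sum[..., None] + 1e-12)

# Toy data: a 16x16 guide image with a sharp edge at column 6, and a 4x4
# single-channel feature map too coarse to place that edge exactly.
guide = np.zeros((16, 16))
guide[:, 6:] = 1.0
feat_lr = np.tile(np.array([0.0, 0.5, 1.0, 1.0]), (4, 1))[..., None]

feat_hr = joint_bilateral_upsample(feat_lr, guide, scale=4)
print(feat_hr.shape)   # (16, 16, 1)
```

In this toy example the upsampled features switch from near 0 to near 1 at column 6, the edge position in the guide image, even though the 4×4 grid blurs that boundary across a whole cell.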

A key feature of this technology is that it requires no additional training. Existing methods often needed separate retraining or complex optimization before they could be applied to new environments or data. In contrast, Upsample Anything is designed to work out a restoration method from a single input image and to be applied immediately in diverse situations.

According to the researchers, on 224×224 images, a size widely used in AI research, the technology restored visual information close to the original in about 0.4 seconds of computation. It also improved GPU memory efficiency by up to 16 times by compressing and using only the necessary information instead of storing everything at high resolution.
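
The article does not say how the 16-fold figure was measured. As a purely illustrative calculation, the sketch below shows how strongly resolution affects memory: storing a feature map at one quarter of the width and height takes one sixteenth of the space. The `feature_map_bytes` helper, the channel count, and the 4-byte value size are assumptions, not figures from the research.

```python
def feature_map_bytes(height, width, channels, bytes_per_value=4):
    """Memory needed to store one dense feature map, in bytes."""
    return height * width * channels * bytes_per_value

channels = 384  # hypothetical channel count, chosen only for illustration

full = feature_map_bytes(224, 224, channels)              # full resolution
reduced = feature_map_bytes(224 // 4, 224 // 4, channels) # 1/4 width and height

print(full, reduced)
print(full / reduced)   # 16.0
```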

Professor Kim Chang-ik said, "This technology is an algorithm that can increase AI's visual precision with few resources," adding, "It is expected to contribute to the practical use of on-device AI, which runs AI within devices such as humanoid robots and smartphones."

The research was accepted to CVPR 2026, a conference on AI and computer vision, and received the "CVPR Compute Gold Star" in recognition of its efficient use of computing resources. It was also selected as a "Transparency Champion," a designation that recognizes transparency and reproducibility in research.

References

arXiv (2025), DOI: https://doi.org/10.48550/arXiv.2511.16301
