Yao Qi (姚骐), the visual effects (VFX) director for Netflix's original series 'The Three-Body Problem', spoke on the afternoon of the 21st (local time) at Baidu's '2025 AI Conference', held at Baidu Science and Technology Park in Haidian District, in northwestern Beijing.

While sharing his experience of making a short film with AI, he noted, "Live-action filming has many constraints, and there are always cost and safety issues. Finding locations is also difficult, and the appearance fee of a famous actor can reach 1 million yuan per day, not to mention the substantial costs of the crew, equipment, and accommodation," and added, "The production cycle of a film is also too long, taking anywhere from 1 to 2 years, or as long as 5 to 6 years. In Hollywood, post-production work can cost hundreds of thousands to millions of dollars for a single scene."

He emphasized, "But the short film we recently made with Baidu's AI took only a week to complete and cost just 300 yuan (about 60,000 won)."

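A video-generation sample from Baidu's MuseSteamer. Given the photo on the left, it can produce a natural-looking video from a short prompt, such as a rough description of the background and a few lines of dialogue. /Courtesy of Baidu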

Baidu, China's largest search engine and AI company, unveiled its multimodal video generation model 'MuseSteamer' (百度蒸汽机·MuseSteamer) at the event. Baidu bills MuseSteamer as the world's first image-to-video (I2V) model that generates Chinese-language speech and video together, producing a video from just a single image and a short prompt.

Unlike earlier approaches, in which a single character delivered lines against silence, MuseSteamer generates video together with the voices of multiple characters. It can layer realistic ambient sound naturally under the dialogue of several characters. It also synchronizes lip movements with Chinese speech and handles camera work from a variety of angles.

In many of the AI videos currently trending on social media (SNS), the characters' dialogue plays on monotonously with no natural background sound, such as chirping insects or city noise. MuseSteamer, by contrast, generates ambient sound that fits the scene and blends it seamlessly with the dialogue, so scene transitions feel far less abrupt.

Liu Lin, head of Baidu's commercial R&D, unveils the video-generation AI 'MuseSteamer' on the 21st. /Beijing, Lee Eun-young, correspondent

During the presentation, Liu Lin (刘林), head of Baidu's commercial research and development (R&D), said, "With precise synchronization of video and sound, we can convey the characters' acting, emotions, voices, and expressions in a highly three-dimensional way." He explained, "MuseSteamer builds the identities and emotions of the various characters, and the logic of their interactions, into a larger framework, and uses that framework to keep the story consistent and realistic."

He continued, "Third, we have introduced ultra-realistic sound quality: voices are not restricted by gender or age but are tuned to the mood and emotions of the scene, providing a natural, harmonious audiovisual experience." He added, "Finally, through optimization for Chinese, we align the rhythm and context of pronunciation with lip shapes, expressions, and gestures to improve performance in Chinese-language settings."

According to Liu, MuseSteamer aims to fundamentally change the cost structure of video production. In filmmaking, for example, it seeks to replace actor fees, location and equipment costs, dubbing, and post-production special effects with AI. Yao, who has worked on the VFX for titles including the Netflix series 'The Three-Body Problem' and the Hollywood films 'The Matrix 3' and 'Transformers', presented the sci-fi short film 'Return (归途)', made with MuseSteamer. The film comprises more than 40 scenes built from over 120 AI-generated video clips, and it cost only 330.6 yuan (about 60,000 won) to produce.

The sci-fi short film 'Return', made by director Yao Qi with Baidu's 'MuseSteamer'. /Courtesy of Baidu

As for the characters, their facial expressions and eye movements gave away that they were AI-generated, but their seamless acting and tone of voice were impressive. Ambient sound ran continuously without lapsing into silence, and the varied camera work felt familiar from many films. The dinosaur characters and assorted backgrounds looked as natural as they would in a movie or a photorealistic video game. It may be difficult to make a full-length feature film entirely with AI, but the tool appears to have ample potential as an aid to production in computer graphics (CG).

Liu remarked, "AI has freed our hands, allowing creators to focus solely on ideas and creativity," calling it "an innovation that truly opens up broader, farther-reaching paths for filmmaking."

Users can try it by searching '百度蒸汽机 (Baidu MuseSteamer)' in the Baidu search bar or by visiting the '绘想 (Huixiang)' platform. Corporate clients can access high-performance services through the Qianfan (千帆) platform. Under a tiered membership system, prices have been set at about 70% of the industry average. At 720p resolution, a 5-second silent video costs 1 yuan (industry average: 2 yuan), while a 5-second video with sound and dialogue costs 2.5 yuan (industry average: 3.5 yuan). The first five videos are free, and members receive monthly credits for 15 free 5-second videos.

※ This article has been translated by AI.