ESTsoft said on the 17th that its paper on artificial intelligence (AI) automatic dubbing technology had been accepted at the world-renowned Natural Language Processing (NLP) conference "EMNLP 2025" and that it presented the results in Suzhou, China.
The paper by ESTsoft's researchers proposes a framework that uses a large language model (LLM) to implement multilingual automatic dubbing that matches the original video's utterance timing. It focuses on solving the unnatural sync issues that occur in conventional automatic dubbing when the lengths of the translated and original audio differ.
The framework consists of STT (Speech-to-Text), NMT (Neural Machine Translation), and TTS (Text-to-Speech) modules. In the NMT stage, the researchers introduced "duration-based translation (DT)" and "pause integration" to reflect the original audio's duration and silence information in the translation process. This enables the generation of dubbed videos that naturally maintain speaking speed and rhythm.
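The article does not include implementation details, but the idea of reflecting the original audio's duration and silence information in translation can be illustrated with a minimal sketch. Everything below (the `Segment` class, the prompt wording, and the character-based rate estimate) is a hypothetical illustration, not ESTsoft's actual method:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """A transcribed utterance from the STT stage, with timing."""
    text: str
    start: float  # seconds
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start

def pauses_between(segments):
    """Silence gaps between consecutive segments -- the 'pause'
    information that a dubbing system could carry into the target audio."""
    return [b.start - a.end for a, b in zip(segments, segments[1:])]

def duration_prompt(segment: Segment, target_lang: str = "English") -> str:
    """Hypothetical duration-based translation prompt: tell the LLM the
    original utterance length so the translation can be spoken in the
    same time slot."""
    return (f"Translate into {target_lang} so it can be spoken in about "
            f"{segment.duration:.1f} seconds: {segment.text}")

def speaking_rate(translated_text: str, target_duration: float,
                  chars_per_sec: float = 15.0) -> float:
    """Crude TTS rate factor: estimate spoken length from character count
    and scale speed so the line fills the original slot (a real system
    would use phoneme-level duration models). >1.0 means speak faster."""
    estimated = len(translated_text) / chars_per_sec
    return estimated / target_duration

segments = [Segment("안녕하세요", 0.0, 1.2), Segment("반갑습니다", 1.8, 3.0)]
print(pauses_between(segments))       # the 0.6 s gap to preserve in the dub
print(duration_prompt(segments[0]))
print(speaking_rate("Hello there", segments[0].duration))
```

The point of the sketch is the interface, not the numbers: the translator sees the timing constraint up front, and the TTS stage gets both the translated text and the original slot it must fit.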
In experiments, the proposed method showed a 24% improvement in video-audio sync accuracy and a 12% improvement in multilingual listening satisfaction compared with existing commercial AI dubbing systems. In peer review, the paper was also assessed as a meaningful achievement for its potential to solve time synchronization, the core issue in automatic dubbing, and for its multilingual scalability.
This study is a proof of concept conducted in the course of advancing ESTsoft's Perso AI Dubbing service. The researchers said, "We were able to engage in technical discussions with overseas researchers," and added, "It was meaningful to have the completeness of the technology recognized on the global stage."
Jeong Sang-won, CEO of ESTsoft, said, "Perso AI has advanced automatic dubbing technology by improving issues identified in actual service," and added, "We will continue to strengthen our competitiveness in the global AI dubbing market based on our research achievements."