SK Telecom announced on the 29th that it has released, as open source, document-interpretation technology used for training the vision-language models (VLMs) and large language models (LLMs) behind its artificial intelligence (AI) model adot.
VLM refers to an artificial intelligence model that processes visual and linguistic information in an integrated manner.
The newly released mid-sized model 'A.X 4.0 VL Light' is a VLM trained on a large-scale multimodal Korean dataset, capable of understanding data required in industrial settings, such as tables, graphs, and manufacturing drawings.
According to the company, the model achieved an average score of 79.4 on Korean visual benchmarks, outperforming China's Qwen 2.5-VL 32B.
SK Telecom also developed the 'A.X Encoder', applied across the entire data processing pipeline for the A.X models, achieving up to three times faster inference and twice the training speed of existing models.
The document-interpretation technology known as an 'encoder' is a natural language processing component that transforms an input sentence into a contextual representation: it captures meaning and context by modeling the interrelationships among all the words in the sentence, and various natural language processing tasks are then performed on top of that representation.
The model has 149 million parameters and achieved an average score of 85.47 on natural language understanding benchmarks, state-of-the-art (SOTA) performance.
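The encoder mechanism described above, in which each word's representation is informed by its relationships to every other word in the sentence, can be illustrated with a single self-attention step. The sketch below is a generic, toy-scale illustration with random weights, not SK Telecom's A.X Encoder:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project token embeddings into query/key/value spaces
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Scores measure how strongly each word attends to every other word
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Each output is a context-aware mixture of all value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
d = 8                            # toy embedding dimension
X = rng.normal(size=(5, d))      # embeddings for 5 "tokens" of a sentence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
ctx, attn = self_attention(X, Wq, Wk, Wv)
print(ctx.shape)  # each token now carries information from the whole sentence
```

Real encoder models stack many such layers with learned weights; the point here is only that every output vector mixes information from all input words, which is what lets downstream tasks use sentence-level context.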
With these two additional releases aimed at broader industrial adoption of LLMs, SK Telecom has now unveiled a total of six models.
Kim Tae-yoon, head of foundation models at SK Telecom, said, "Securing independent technological capabilities is the key to sovereign AI. We will strengthen our own capabilities and accelerate collaboration with consortium partners to secure global-level AI competitiveness."