AI development company Twelve Labs announced on the 30th that it has released a Model Context Protocol (MCP) server that provides video intelligence to AI agents. /Courtesy of Twelve Labs

Twelve Labs said on the 30th that it has released a "Model Context Protocol (MCP) server" that provides video intelligence to AI agents.

Twelve Labs explained that with this launch, AI assistants can, for the first time, understand videos, find related scenes, and generate summaries.

MCP is an open standard protocol developed by Anthropic that lets AI assistants connect to external tools and data sources.

The Twelve Labs MCP server connects the company's video understanding models to key AI tools used by developers, including Claude Desktop, Cursor, and Goose.

Based on Twelve Labs' in-house multimodal video understanding model "Marengo" and video-language generation model "Pegasus," the server supports a range of features, including ▲ natural-language video search ▲ automatic summarization and Q&A for video content ▲ building multi-step video workflows ▲ real-time video exploration support.
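In practical terms, an MCP server exposes capabilities like these as named tools that a client such as Claude Desktop or Cursor can discover and invoke. The sketch below is illustrative only: it uses the open-source MCP Python SDK, and the tool name, parameters, and the video-search logic inside it are hypothetical placeholders, not Twelve Labs' actual server code.

```python
# Illustrative sketch: exposing a natural-language video-search tool over MCP.
# The tool name, its parameters, and the search logic are hypothetical
# placeholders; a real server would call a video understanding backend
# (e.g., one built on the Marengo model) inside the tool function.
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("twelve-labs-video")  # server name presented to the MCP client


@mcp.tool()
def search_videos(query: str, index_id: str) -> str:
    """Find scenes in an indexed video library that match a natural-language query."""
    # Placeholder result; a real implementation would query a video search API
    # and return matching clips with timestamps.
    return f"Top scenes in index {index_id} matching: {query!r}"


if __name__ == "__main__":
    # Desktop MCP clients typically launch servers over the stdio transport.
    mcp.run()
```

Once such a server is registered with a client, the AI assistant can call the tool on the user's behalf, for example when asked to find a specific moment in a recorded meeting.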

This enables the development of a wide variety of multimodal applications, from smart assistants that understand recorded meeting videos to creative content-generation AI that leverages the context of diverse videos.

Lee Jae-sung, CEO of Twelve Labs, said, "With this MCP release, video has become a core function in every AI workflow," and noted, "What Twelve Labs has pursued since its founding is 'integrated multimodality.'"

He added, "Our conviction was not a multi-model approach that uses separate models for text, images, and audio, but a true multimodal implementation that comprehensively understands all elements of video through a single interface. This MCP server is precisely the result of that philosophy."
