Courtesy of the Wikimedia Foundation

Wikipedia, the free online encyclopedia, has urged artificial intelligence (AI) companies to stop scraping its content without authorization.

The Wikimedia Foundation, which operates Wikipedia, asked AI developers on the 10th (local time) to use its content "responsibly," which includes crediting Wikipedia as the source, and to access that content through Wikimedia Enterprise, its paid platform.

Major AI companies are known to scrape Wikipedia's content in bulk to train generative AI models built on large language models (LLMs). Advancing these models requires high-quality training data, and Wikipedia's content is regarded as vast, objective, and reliable.

AI bots have recently been posing as human visitors to scrape Wikipedia, the foundation said. According to the foundation, Wikipedia's visitor numbers were abnormally high in May and June this year, which it attributed to a surge in AI bots collecting data without permission. The foundation said, "While visits by human users recently fell 8% year over year, visits presumed to be from AI bots increased."

The foundation added that some of these bots posed as humans specifically to evade "bot detection."

It explained that using its paid product allows large-scale access to content without placing a severe burden on Wikipedia's servers.

It also urged AI platforms to clearly indicate the source whenever they cite Wikipedia in their answers. The foundation emphasized, "For people to trust information shared on the internet, platforms must clearly disclose the sources of information and provide an opportunity to visit those sources."

It added, "If visits to Wikipedia (by human users) decrease, the number of volunteers who improve the quality of content will decline, and individual donors who support them may also decrease."
