AI model for detecting harmful expressions in LLM training data. /Courtesy of TTA

The Telecommunications Technology Association (TTA) announced on the 3rd that it has developed an AI model capable of detecting harmful expressions in large language model (LLM) training data. The model was developed as part of the National Information Society Agency (NIA)'s 2024 quality verification project for large-scale AI training data.

The harmful expression detection AI model (hereafter, the model) first determines whether a sentence in the corpus contains a harmful expression and, if so, classifies it into one of the harmful categories, thereby gauging the corpus's harmfulness. The categories were established based on the definition of hate speech in the National Human Rights Commission's "Hate Speech Counter Guide" and comprise a total of 11 types across three categories. Because the model analyzes harmfulness in context, it can detect harmful expressions even when they contain no profanity.
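The two-stage decision described above (a binary harmfulness check followed by category classification) can be sketched as follows. This is an illustrative sketch only: the article gives the counts (11 types across three categories) but not the category labels themselves, so the label names used here and in the test stubs are placeholders, and the two classifier callables stand in for the actual fine-tuned models.

```python
# Sketch of the article's two-stage pipeline: Stage 1 decides whether a
# sentence is harmful at all; Stage 2 assigns one of the harmful
# categories. Category names are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    harmful: bool
    category: Optional[str] = None  # one of the 11 types, if harmful


def analyze(sentence: str,
            is_harmful: Callable[[str], bool],
            categorize: Callable[[str], str]) -> Verdict:
    """Run the binary check first; only harmful sentences are categorized."""
    if not is_harmful(sentence):
        return Verdict(harmful=False)
    return Verdict(harmful=True, category=categorize(sentence))
```

In practice both callables would be backed by the fine-tuned model; structuring the check this way means clean sentences skip the category step entirely.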

The harmful expression detection AI model and its training data have been uploaded to Hugging Face, an open-source library and platform for distributing AI models and datasets. Using the Hugging Face API, anyone can run the model to analyze the harmfulness of corpus text and refine harmful expressions.
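A corpus-screening workflow built on such a model might look like the sketch below. The article does not give the model's Hugging Face repository name, so the code takes any classifier callable with the shape of a `transformers` text-classification pipeline result (`{"label": ..., "score": ...}`); the `not_harmful` label and the threshold are assumptions for illustration.

```python
# Sketch: filter a corpus down to sentences flagged as harmful, using any
# classifier that returns {"label": <category>, "score": <confidence>}
# per sentence, as a transformers text-classification pipeline does.
from typing import Callable, Iterable


def screen_corpus(sentences: Iterable[str],
                  classify: Callable[[str], dict],
                  threshold: float = 0.5) -> list[dict]:
    """Return the sentences flagged as harmful, with category and score."""
    flagged = []
    for text in sentences:
        result = classify(text)
        # "not_harmful" is a placeholder for the model's clean label.
        if result["label"] != "not_harmful" and result["score"] >= threshold:
            flagged.append({"text": text,
                            "category": result["label"],
                            "score": result["score"]})
    return flagged


# With the transformers library the classifier would be created roughly
# like this (the model repository ID is hypothetical, not from the article):
#
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="<TTA-harmful-expr-model>")
#   flagged = screen_corpus(corpus, lambda t: clf(t)[0])
```

Keeping the classifier as an injected callable also makes it easy to swap between the two released fine-tuned versions without changing the screening code.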

Two fine-tuned versions considered during model selection, one based on KcELECTRA and one based on KoBERT, have both been uploaded, so users can choose between them. Detailed information and test results are also available on the model and data cards, allowing users to select and use the model that suits their purposes.

Harmful expressions scattered through training corpora have raised reliability and safety concerns about generative AI services, and refining them has not been easy. The expectation is that widespread use of this newly released open-source harmful expression detection AI model will help establish a safer and more reliable foundation for AI use.

Son Seung-hyeon, president of TTA, emphasized, "As generative AI technology is applied across a growing range of fields, demand for LLM training text data is skyrocketing, and with it social interest in the ethical aspects of LLM technology. In particular, because a model generates text based on what it has learned, refining the harmful expressions contained in the training text is an absolute necessity."


※ This article has been translated by AI.