A South Korean research team has proposed a method that overcomes a structural limitation of collaborative learning in artificial intelligence (AI).
A research team led by Professor Park Chan-young in the Department of Industrial and Systems Engineering at the Korea Advanced Institute of Science and Technology (KAIST) announced on the 15th that it has developed a training method that resolves the chronic performance degradation seen in federated learning and greatly improves the generalization performance of AI models.
Federated learning is a method that lets multiple institutions jointly train an AI model without directly exchanging data. It was devised to get around the difficulty of gathering personal data, such as patient medical records or financial data, in one place.
However, when each institution then optimizes the jointly trained AI for its own environment, the model tends to overfit to that institution's data and becomes vulnerable to new data. For example, suppose several banks build a "joint loan screening AI" together. If one bank then trains it mainly on data from large corporate customers, that bank's AI becomes strong at corporate screening but underperforms when screening individuals or startups.
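The joint training and the later local optimization described above can be illustrated with a toy example. The sketch below is not the research team's code. It assumes plain federated averaging (FedAvg), a one-variable linear model, and two hypothetical banks (`bank_a`, `bank_b`) whose customers cover different ranges of a single feature `x`. Only model parameters leave each bank, never the records themselves.

```python
import random

def local_update(model, data, lr=0.1, steps=5):
    """Gradient descent on one institution's private records for y = w*x + b."""
    w, b = model
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def federated_average(models, sizes):
    """Server step: average client parameters, weighted by local sample count."""
    total = sum(sizes)
    w = sum(m[0] * n for m, n in zip(models, sizes)) / total
    b = sum(m[1] * n for m, n in zip(models, sizes)) / total
    return w, b

def mse(model, data):
    """Mean squared error of the linear model on a list of (x, y) pairs."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def sample(rng, lo, hi, n=200):
    """Assumed toy data: y = x^2 plus small noise, with x drawn from [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 0.02)) for x in xs]

rng = random.Random(0)
bank_a = sample(rng, -1.0, 0.0)   # hypothetical bank with one customer segment
bank_b = sample(rng, 0.0, 1.0)    # hypothetical bank with a different segment
banks = [bank_a, bank_b]
everyone = bank_a + bank_b        # pooled for evaluation only, never for training

# Joint training: each round, banks train locally and the server averages.
global_model = (0.0, 0.0)
for _ in range(100):
    updates = [local_update(global_model, data) for data in banks]
    global_model = federated_average(updates, [len(d) for d in banks])

# Local optimization: bank B keeps training the joint model on its own data.
personalized = local_update(global_model, bank_b, steps=1000)

print("joint model:        own-data error", mse(global_model, bank_b),
      "all-customer error", mse(global_model, everyone))
print("personalized model: own-data error", mse(personalized, bank_b),
      "all-customer error", mse(personalized, everyone))
```

In this toy setup the personalized model fits bank B's own customers better than the joint model, but its error over both banks' customers is much larger, which is the kind of overfitting described above.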
To solve this, the research team introduced a "synthetic data" approach. It extracts only the core, representative features from each institution's data to generate virtual data that contains no personal information, and then uses that data during AI training. This lets each institution's AI strengthen its expertise on its own data without sharing personal information, while retaining the broad perspective (generalization performance) gained through joint training.
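The idea can be sketched on toy data. The article does not say how the team generates its synthetic data, so everything below is an assumption made for illustration: the hypothetical `summarize` function stands in for the generator by replacing each bank's records with a few averaged points, and the hypothetical parameter `lam` sets how much weight those points get during local training. Averaging records this way carries no formal privacy guarantee and is not the paper's method.

```python
import random

def fit(data, weights=None, lr=0.1, steps=2000):
    """Weighted least-squares fit of y = w*x + b by gradient descent."""
    if weights is None:
        weights = [1.0] * len(data)
    total = sum(weights)
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = sum(2 * c * (w * x + b - y) * x for (x, y), c in zip(data, weights)) / total
        gb = sum(2 * c * (w * x + b - y) for (x, y), c in zip(data, weights)) / total
        w, b = w - lr * gw, b - lr * gb
    return w, b

def mse(model, data):
    """Mean squared error of the linear model on a list of (x, y) pairs."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def sample(rng, lo, hi, n=200):
    """Assumed toy data: y = x^2 plus small noise, with x drawn from [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 0.02)) for x in xs]

def summarize(data, chunks=4):
    """Hypothetical generator: sort records by x and average each chunk."""
    ordered = sorted(data)
    size = len(ordered) // chunks
    points = []
    for i in range(chunks):
        part = ordered[i * size:(i + 1) * size]
        points.append((sum(x for x, _ in part) / len(part),
                       sum(y for _, y in part) / len(part)))
    return points

rng = random.Random(0)
bank_a = sample(rng, -1.0, 0.0)   # hypothetical bank with one customer segment
bank_b = sample(rng, 0.0, 1.0)    # hypothetical bank with a different segment
everyone = bank_a + bank_b        # pooled for evaluation only, never for training

# Each bank shares only its averaged points, not its individual records.
synthetic = summarize(bank_a) + summarize(bank_b)

# Baseline: bank B trains on its own records only.
local_only = fit(bank_b)

# Variant: bank B trains on its own records plus the shared synthetic points.
lam = 0.5  # assumed total weight of the synthetic set relative to the local set
per_point = lam * len(bank_b) / len(synthetic)
with_synthetic = fit(bank_b + synthetic,
                     [1.0] * len(bank_b) + [per_point] * len(synthetic))

print("local only:     own-data error", mse(local_only, bank_b),
      "all-customer error", mse(local_only, everyone))
print("with synthetic: own-data error", mse(with_synthetic, bank_b),
      "all-customer error", mse(with_synthetic, everyone))
```

In this toy, the model trained with synthetic points has a much lower error across all customers than the local-only model, at some cost in accuracy on bank B's own data. The simple linear model forces that trade-off here, so the sketch shows only the direction of the effect and does not reproduce the results reported for the team's method.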
The method proved particularly effective in fields where data security is important, such as health care and finance, and also delivered stable performance in environments like social media or e-commerce, where new users and products are continuously added.
Professor Park Chan-young said, "This study opens a new path that protects data privacy while ensuring both specialization and versatility for each institution's AI," and added, "It will be a great help in fields where data collaboration is essential but security is important, such as medical AI and financial fraud detection AI."
The study was selected for an oral presentation, a slot reserved for the top 1.8% of papers, at the International Conference on Learning Representations (ICLR) 2025, one of the most prestigious conferences in the field of artificial intelligence, held in Singapore in April.
References
arXiv (2025), DOI: https://doi.org/10.48550/arXiv.2503.03995