Diseases such as diabetes, cancer, and asthma do not arise simply from abnormalities in a single gene. They result from the complex interplay of multiple genes. Determining how genes work in combination to cause diseases has long remained a challenge for scientists due to the vast number of possible combinations.
A research team from Northwestern University in the United States has developed a new artificial intelligence (AI)-based analysis tool that uses gene expression data to identify the gene combinations that cause complex diseases. The findings were published in the Proceedings of the National Academy of Sciences (PNAS) on the 10th.
Previous studies have mostly focused on identifying individual genes associated with specific diseases. In contrast, the Northwestern University research team concentrated on the combinations of genes that contribute to diseases.
Adilson Motter, a professor at Northwestern University, said, "Diseases like cancer can be likened to airplane crashes. Just as airplane crashes typically occur after multiple failures, diseases are also determined by combinations of multiple genes, making it important to examine the gene network."
The research team developed an AI model called "TWAVE". This model analyzes differences between healthy and disease states based on limited gene expression data and can predict which genes contribute to diseases.
A gene consists of a sequence of four types of nucleotides. The order of these nucleotides determines which proteins are synthesized, and those proteins govern biological processes. Decoding a gene means reading this nucleotide sequence. The newly developed AI differs in that it not only analyzes gene sequences but also examines how actively genes are functioning by studying "gene expression."
When the team tested the AI across various diseases, it identified disease-causing genes that traditional methods had missed. The team also confirmed that the same disease can involve different combinations of genes in different individuals. This finding could enable "precision medicine" tailored to individual genetic characteristics.
Notably, gene expression data raises fewer personal data protection concerns than genetic sequences. A genetic sequence is unique personal information, whereas gene expression varies with environmental changes and lifestyle. At the same time, gene expression data makes it possible to examine the influence of environmental factors indirectly.
Professor Motter added, "The same disease can manifest similarly in two individuals, but due to genetic, environmental, and lifestyle differences, different genes may be involved for each person. This information can suggest directions for personalized treatments."
References
PNAS (2025), DOI: https://doi.org/10.1073/pnas.2415071122