Revolutionising Disease Classification and Identifying Hidden Disease Patterns

9th July 2024 Moriah Aharon

Researchers have developed a machine learning approach to identify diseases that are likely to have subtypes, significantly enhancing disease classification and treatment strategies. The model, which achieved an 89.4% ROC AUC, flagged 515 diseases predicted to have previously unannotated subtypes, demonstrating the potential for more precise and personalised medical treatments.

Researchers from the Hebrew University of Jerusalem have developed a machine learning approach to identify diseases that are likely to have subtypes, advancing both disease classification and treatment strategies. The study, led by PhD student Dan Ofer and Professor Michal Linial from the Department of Biological Chemistry at The Life Science Institute at Hebrew University, marks a significant advance in the use of artificial intelligence in medical research.

Professor Michal Linial, photo by Nati Shohat/Flash90

Dividing diseases into distinct subtypes is pivotal for accurate research and effective treatment. The Open Targets Platform integrates biomedical, genetic, and biochemical datasets to support disease ontologies, classifications, and the identification of potential gene targets. However, many disease annotations remain incomplete, and completing them often requires extensive input from medical experts. This challenge is especially acute for rare and orphan diseases, where resources are limited.

The research introduces a novel machine learning approach to identify diseases with potential subtypes. Using the extensive database of approximately 23,000 diseases documented in the Open Targets Platform, the researchers derived new features for predicting, from direct evidence, which diseases have subtypes. They then applied machine learning models to analyse feature importance and evaluate predictive performance, recovering known disease subtypes and uncovering novel candidates.

The model achieved an impressive area under the receiver operating characteristic curve (ROC AUC) of 89.4% in identifying known disease subtypes. Incorporating pre-trained deep-learning language models further improved its performance. Notably, the research identified 515 disease candidates predicted to have previously unannotated subtypes, paving the way for new insights into disease classification.
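For readers unfamiliar with the metric: ROC AUC can be read as the probability that a model gives a randomly chosen positive example (here, a disease with subtypes) a higher score than a randomly chosen negative one, with ties counted as half. A value of 89.4% therefore means the model ranks roughly 89 out of 100 such positive/negative pairs correctly. The minimal Python sketch below is an illustrative aside, not the authors' code: it computes ROC AUC by comparing every positive/negative pair, and the labels and scores are made-up toy values rather than data from the study.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for a positive example, 0 for a negative one.
    scores: model scores; higher means "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p, n in product(pos, neg):  # every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = disease annotated with subtypes, 0 = not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly, so about 0.889
```

This pairwise form is quadratic in the number of examples and is meant only to show what the number measures; for real evaluations, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.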

“This project demonstrates the incredible potential of machine learning in expanding our understanding of complex diseases,” said Dan Ofer. “By leveraging advanced models, we can uncover patterns and subtypes that were previously hidden, ultimately contributing to more precise and personalised treatments.”

This methodology offers a robust and scalable way to improve knowledge-based annotations and provides a comprehensive assessment of disease ontology tiers. “We are excited about the potential of our machine learning approach to revolutionise disease classification,” said Prof Michal Linial. “Our findings can significantly contribute to personalised medicine, offering new avenues for therapeutic development.”

The research paper, titled “Automated annotation of disease subtypes”, is now available in the Journal of Biomedical Informatics and can be accessed at https://doi.org/10.1016/j.jbi.2024.104650.

Researchers:
Dan Ofer, Michal Linial

Institution:
Department of Biological Chemistry, The Life Science Institute, The Hebrew University of Jerusalem, Israel
The Hebrew University of Jerusalem is Israel’s premier academic and research institution. Serving over 23,000 students from 80 countries, the University produces nearly 40% of Israel’s civilian scientific research and has received over 11,000 patents. Faculty and alumni of the Hebrew University have won eight Nobel Prizes and a Fields Medal. For more information about the Hebrew University, please visit http://new.huji.ac.il/en.