Improving T2D machine learning-based prediction accuracy with SNPs and younger age

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Background: This study aimed to evaluate whether integrating clinical and genomic data improves the performance of machine learning (ML) models for predicting Type 2 Diabetes (T2D) risk.

Methods: Six models (Random Forest, Support Vector Machine, Linear Discriminant Analysis, Logistic Regression, Gradient Boosting Machine, and Decision Tree) were trained and tested on a discovery dataset (N=3,546) and validated in the UK Biobank (N=31,620). Model performance was assessed using clinical data alone, combined clinical and genomic data, and in age-specific groups (>55 and ≤55 years).

Results: The inclusion of genomic data modestly improved model performance across all algorithms in the discovery dataset. Clinical features such as family history of T2D and hypertension consistently ranked as top features. When SNPs were added, T2D-associated variants, including rs2943641 (IRS1), rs7903146 (TCF7L2), and rs7756992 (CDKAL1), emerged among the most important features, particularly in younger individuals. These findings demonstrate the translational potential of incorporating genomics for early risk identification. In the UK Biobank, all models achieved AUCs exceeding 91% with combined clinical and genomic data. Performance was notably better among younger individuals (≤55 years), emphasizing the models’ potential for early detection. Integration of a polygenic risk score (PRS) further supported risk prediction, particularly in younger individuals, though incremental gains were modest.

Conclusions: While traditional clinical factors remained the strongest predictors of T2D risk, integration of genomic data produced a modest improvement in model performance, especially among younger adults. Validation across independent datasets confirmed the generalizability of these findings, underscoring the value of multi-dimensional risk-prediction models to refine T2D risk assessment.
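
The comparison described in the Methods (the same classifiers scored by AUC on clinical features alone and on clinical plus SNP features) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the data are simulated with arbitrary effect sizes, only two of the six model families are shown, and the helper names `simulate_cohort` and `compare_feature_sets` are hypothetical. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def simulate_cohort(n=2000, n_snps=3, seed=0):
    """Simulate stand-in data (ASSUMPTION: not the study's cohort or effect sizes)."""
    rng = np.random.default_rng(seed)
    # Two binary clinical flags, e.g. family history of T2D and hypertension.
    clinical = rng.integers(0, 2, size=(n, 2)).astype(float)
    # SNP genotypes coded as risk-allele counts (0, 1, or 2).
    snps = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
    # Arbitrary logistic model: strong clinical effects, weaker SNP effects.
    logit = -1.5 + 1.2 * clinical[:, 0] + 1.0 * clinical[:, 1] + 0.4 * snps.sum(axis=1)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return clinical, snps, y


def compare_feature_sets(clinical, snps, y, seed=0):
    """Return test-set AUC for each (model, feature set) pair."""
    models = {
        "logistic_regression": lambda: LogisticRegression(max_iter=1000),
        "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    feature_sets = {
        "clinical": clinical,
        "clinical+snps": np.hstack([clinical, snps]),
    }
    # One shared stratified split so every model sees the same test individuals.
    idx_train, idx_test = train_test_split(
        np.arange(len(y)), test_size=0.3, stratify=y, random_state=seed
    )
    results = {}
    for model_name, make_model in models.items():
        for set_name, X in feature_sets.items():
            model = make_model()
            model.fit(X[idx_train], y[idx_train])
            scores = model.predict_proba(X[idx_test])[:, 1]
            results[(model_name, set_name)] = roc_auc_score(y[idx_test], scores)
    return results


if __name__ == "__main__":
    clinical, snps, y = simulate_cohort()
    for key, auc in compare_feature_sets(clinical, snps, y).items():
        print(key, round(auc, 3))
```

Using a single train/test split for both feature sets keeps the AUC difference attributable to the added SNP columns rather than to a different sample of test individuals. An age-stratified comparison would repeat the same call on the ≤55 and >55 subsets.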

Original language: British English
Pages (from-to): 2772-2781
Number of pages: 10
Journal: Computational and Structural Biotechnology Journal
Volume: 27
State: Published - Jan 2025

Keywords

  • AI
  • Machine Learning
  • Predictive models
  • T2D
