A Machine Learning Model for Type 2 Diabetes Diagnosis Integrating Genomics, Clinical, and Lifestyle Information

  • Siniya Nedunkulathil

Student thesis: Master's Thesis

Abstract

Genotype-phenotype prediction has become a central focus of precision medicine. Genetic mutations can alter drug metabolism and heighten susceptibility to diseases such as Type 2 Diabetes (T2D), which is characterized by impaired insulin utilization and hyperglycaemia and which elevates the risk of other medical complications. Because contemporary risk diagnosis models for T2D are often constructed from a restricted set of factors, their generalizability to diverse patient cohorts may be suboptimal. This study evaluated risk factors for T2D in the prospective UK Biobank cohort using data-driven approaches that combine machine learning (ML) and genome-wide association studies (GWAS), drawing on a wide range of clinical, genetic, and lifestyle features. The approach is innovative in that it uses a dataset and a wide range of latent features that have not been commonly used for T2D diagnosis. The study also achieved better accuracy in T2D diagnosis than existing studies that combined genetic and clinical datasets for this purpose. The diagnostic model was implemented separately for different age groups of males and females, incorporating clinical, lifestyle, and genetic data. During validation it achieved an AUC of 0.918 and an accuracy of 88.95% among males up to 50 years, and an AUC of 0.8486 and an accuracy of 87.44% among females up to 50 years. The results also emphasize the significance of the extracted latent features in the diagnosis of T2D risk. In addition, we investigated the potential of using genotype data alone for T2D diagnosis by applying the LASSO regularization method to select relevant SNPs. The results indicated modest discriminative power, with AUCs of 0.65 for males and 0.64 for females. Furthermore, the study highlights that the LASSO regularization method is a robust tool for identifying SNPs relevant to disease and has the potential to enhance our comprehension of the genetic basis of complex diseases.
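
As an illustration of the genotype-only analysis described above, the following is a minimal sketch of LASSO-style SNP selection followed by an AUC check. It is not the thesis pipeline: the data are synthetic, the helper names (simulate, fit_l1_logistic, auc), the penalty strength, and the step size are all hypothetical choices, and a plain proximal gradient solver stands in for whatever software the study actually used.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, lr=0.1, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent.

    X: rows of genotype dosages (0, 1, 2); y: 0/1 labels.
    Returns (weights, intercept); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err / n
            for j, xj in enumerate(row):
                grad_w[j] += err * xj / n
        b -= lr * grad_b
        # Gradient step, then shrinkage; small weights become exactly zero.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def auc(labels, scores):
    """AUC as the probability that a random case outranks a random control."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


def simulate(n_samples=300, n_snps=15, causal=(0, 1), seed=0):
    """Synthetic genotypes (assumption): only the `causal` SNPs affect the outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.choice([0, 1, 2]) for _ in range(n_snps)]
        logit = -1.5 + sum(0.8 * row[j] for j in causal)
        X.append(row)
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    train_X, train_y, test_X, test_y = X[:200], y[:200], X[200:], y[200:]
    w, b = fit_l1_logistic(train_X, train_y, lam=0.05)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in test_X]
    print("SNPs with non-zero weight:", selected)
    print("held-out AUC:", round(auc(test_y, scores), 3))
```

The L1 penalty is what makes the method a feature selector: SNPs whose weights are shrunk to exactly zero drop out of the model, and the AUC is then computed on samples held out from fitting. In practice a library solver and cross-validated penalty strength would likely replace the hand-written loop shown here.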
Date of Award: Aug 2023
Original language: American English
Supervisor: Andreas Henschel

Keywords

  • Machine Learning
  • GWAS
  • UK Biobank
