TY - GEN
T1 - Heritability, genetic variation, and the number of risk SNPs effect on deep learning and polygenic risk scores AUC
AU - Muneeb, Muhammad
AU - Feng, Samuel
AU - Henschel, Andreas
N1 - Funding Information:
This publication is based upon work supported by the Khalifa University of Science and Technology under Award No. CIRA-2019-050 to SFF.
Publisher Copyright:
© 2022 ACM.
PY - 2022/5/27
Y1 - 2022/5/27
N2 - Several methods are used for genotype-phenotype classification, including polygenic risk scores (PRS) and deep learning, each based on a different computational technique. The performance of each method varies with the genetic variation in the data and is measured by accuracy or area under the curve (AUC). This article investigates, through extensive computation, how the performance of deep learning classifiers and polygenic risk scores on genotype-phenotype classification changes with heritability, genetic variation, and the number of risk SNPs (400 different datasets of 5000 people). These variations help identify an optimal classifier for a dataset with a specific heritability and give an expected score for a specific case/control classification. The AUC of the deep learning classifier decreases as heritability increases, whereas the AUC of polygenic risk scores improves. The machine-learning algorithm has a low AUC for high genetic variation and a high AUC for low genetic variation. PRS tools show the opposite behavior: their AUC is higher on datasets with high genetic variation than on datasets with low genetic variation. The article gives a basic template showing whether deep learning or PRS tools should be used, depending on the heritability and genetic variation of the dataset. All code segments are publicly available for generating datasets with different parameters and exploring such patterns.
KW - applied deep learning
KW - genetic variation
KW - heritability
KW - polygenic risk scores
KW - risk SNPs
UR - http://www.scopus.com/inward/record.url?scp=85142868800&partnerID=8YFLogxK
U2 - 10.1145/3543377.3543387
DO - 10.1145/3543377.3543387
M3 - Conference contribution
AN - SCOPUS:85142868800
T3 - ACM International Conference Proceeding Series
SP - 65
EP - 71
BT - ICBBT 2022 - Proceedings of 2022 14th International Conference on Bioinformatics and Biomedical Technology
T2 - 14th International Conference on Bioinformatics and Biomedical Technology, ICBBT 2022
Y2 - 27 May 2022 through 29 May 2022
ER -