TY - JOUR
T1 - A Population-Specific Major Allele Reference Genome From The United Arab Emirates Population
AU - Daw Elbait, Gihan
AU - Henschel, Andreas
AU - Tay, Guan K.
AU - Al Safar, Habiba S.
N1 - Funding Information:
Genome sequencing was funded from internal research awards from the Khalifa University awarded to HA.
Publisher Copyright:
© Copyright © 2021 Daw Elbait, Henschel, Tay and Al Safar.
PY - 2021/4/23
Y1 - 2021/4/23
N2 - The ethnic composition of the population of a country contributes to the uniqueness of each national DNA sequencing project and, ideally, individual reference genomes are required to reduce the confounding nature of ethnic bias. This work represents a representative Whole Genome Sequencing effort of an understudied population. Specifically, high coverage consensus sequences from 120 whole genomes and 33 whole exomes were used to construct the first ever population specific major allele reference genome for the United Arab Emirates (UAE). When this was applied and compared to the archetype hg19 reference, assembly of local Emirati genomes was reduced by ∼19% (i.e., some 1 million fewer calls). In compiling the United Arab Emirates Reference Genome (UAERG), sets of annotated 23,038,090 short (novel: 1,790,171) and 137,713 structural (novel: 8,462) variants; their allele frequencies (AFs) and distribution across the genome were identified. Population-specific genetic characteristics including loss-of-function variants, admixture, and ancestral haplogroup distribution were identified and reported here. We also detect a strong correlation between FST and admixture components in the UAE. This baseline study was conceived to establish a high-quality reference genome and a genetic variations resource to enable the development of regional population specific initiatives and thus inform the application of population studies and precision medicine in the UAE.
AB - The ethnic composition of the population of a country contributes to the uniqueness of each national DNA sequencing project and, ideally, individual reference genomes are required to reduce the confounding nature of ethnic bias. This work represents a representative Whole Genome Sequencing effort of an understudied population. Specifically, high coverage consensus sequences from 120 whole genomes and 33 whole exomes were used to construct the first ever population specific major allele reference genome for the United Arab Emirates (UAE). When this was applied and compared to the archetype hg19 reference, assembly of local Emirati genomes was reduced by ∼19% (i.e., some 1 million fewer calls). In compiling the United Arab Emirates Reference Genome (UAERG), sets of annotated 23,038,090 short (novel: 1,790,171) and 137,713 structural (novel: 8,462) variants; their allele frequencies (AFs) and distribution across the genome were identified. Population-specific genetic characteristics including loss-of-function variants, admixture, and ancestral haplogroup distribution were identified and reported here. We also detect a strong correlation between FST and admixture components in the UAE. This baseline study was conceived to establish a high-quality reference genome and a genetic variations resource to enable the development of regional population specific initiatives and thus inform the application of population studies and precision medicine in the UAE.
KW - Arab genome
KW - next generation sequencing
KW - population genetics
KW - population representative sampling
KW - reference genome
KW - structural variants
KW - UAE reference genome
UR - http://www.scopus.com/inward/record.url?scp=85105512759&partnerID=8YFLogxK
U2 - 10.3389/fgene.2021.660428
DO - 10.3389/fgene.2021.660428
M3 - Article
AN - SCOPUS:85105512759
SN - 1664-8021
VL - 12
JO - Frontiers in Genetics
JF - Frontiers in Genetics
M1 - 660428
ER -