Genome Architecture Of Arab Subpopulations Of The United Arab Emirates

  • Mariam Al Ali

Student thesis: Master's Thesis

Abstract

The human genome is one of the most fascinating and complex structures. Numerous international efforts have been launched to study the genetic makeup of diverse populations. However, despite the varied demographic composition and ethnic diversity of the Arab-speaking world, genome data on ethnic groups from the Middle East are still lacking in human genome catalogues. This project is one of the efforts initiated in the Arabian region to conduct a genome-based study. The study first reviews the history and evolution of genome sequencing technology. A number of different techniques enable reading DNA. Genotyping is one of the more affordable techniques; it samples specific, targeted positions within the genome, most of which are known to be disease-associated. This method was used in a pilot study that attempted to replicate the association of four widely studied genetic variants, rs7903146 (TCF7L2), rs5219 (KCNJ11), rs10946398 (CDKAL1), and rs9939609 (FTO), with Type 2 Diabetes Mellitus (T2DM) in the UAE population. The study aimed to determine whether these variants are associated with T2DM susceptibility in a sample of 264 unrelated diabetic Emirati patients. The samples were genotyped using TaqMan® real-time PCR assays, and the results were analyzed statistically using the t-test, the chi-square test, and a multiple logistic regression model with age, gender, body mass index (BMI), and hypertension as covariates. The results were only partly consistent with the findings of numerous international studies that have confirmed a direct link between the four SNPs and the incidence of T2DM. Only one SNP, rs7903146 (TCF7L2), was identified as a risk factor for T2DM susceptibility in the United Arab Emirates population; the other variants were not directly related to T2DM incidence but to some of its risk factors and related traits.
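The chi-square step mentioned above can be illustrated with a minimal sketch of a generic allelic case-control test. The abstract does not say whether alleles or genotypes were tested, so the 2×2 allele-count layout, the function name `allelic_test`, and its inputs are hypothetical. The sketch is unadjusted and does not reproduce the covariate-adjusted logistic regression (age, gender, BMI, hypertension) used in the study.

```python
import math
from typing import Tuple


def allelic_test(case_risk: int, case_other: int,
                 ctrl_risk: int, ctrl_other: int) -> Tuple[float, float, float]:
    """Hypothetical helper: Pearson chi-square (1 df, no continuity correction),
    its p-value, and the allelic odds ratio for a 2x2 table of allele counts.

    Rows are cases/controls; columns are risk allele/other allele.
    All four counts are assumed to be positive. No covariate adjustment.
    """
    if min(case_risk, case_other, ctrl_risk, ctrl_other) <= 0:
        raise ValueError("all allele counts must be positive in this sketch")
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    n = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the risk allele in cases relative to controls
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return chi2, p_value, odds_ratio
```

With made-up counts of 30/70 risk/other alleles in cases and 10/90 in controls, the statistic is 12.5 with an odds ratio of about 3.86. A covariate-adjusted analysis like the one described would instead fit a logistic regression with the SNP genotype and the covariates as predictors.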
These results indicate the need for more effort directed toward understanding the genomic architecture of the United Arab Emirates population. Whereas genotyping can target only certain regions of the genome, sequencing reveals the exact order of the bases and provides a comprehensive representation of the data. Understanding the available techniques, their potential, and the opportunities they offer aided in designing a project initiative, a 1000 Arab genome project for the UAE population, intended to provide a platform that represents the genomic architecture of Emirati citizens. A study was therefore initiated as a prototype of this national project. In this study, two randomly selected individuals from the UAE population (one male and one female) were sequenced on a Next Generation Sequencing (NGS) platform, generating high-throughput data at a coverage greater than 27X. This large volume of data was processed through an in-house pipeline tailored from open-source tools. The analysis of the raw data followed the general workflow of checking read quality, mapping the reads to a reference genome (hg19), performing base quality score recalibration, calling variants, and annotating the variants against widely used databases such as ClinVar, dbSNP, and gnomAD. These databases, together with the annotation tool SnpEff, were used to classify the identified variants by genomic location, functional class, clinical significance, and other attributes. To understand the possible ethnic background of the two individuals, genetic ancestry was estimated using principal component analysis (PCA), supervised admixture analysis, phylogenetic analysis, and Y-chromosome and mtDNA haplogroup identification.
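The processing steps listed above (read quality check, mapping to hg19, base recalibration, variant calling, annotation) can be sketched as a dry-run command builder. Apart from SnpEff and hg19, the abstract does not name the tools, so FastQC, BWA-MEM, samtools, GATK4 and SnpSift, all file names, and the function `build_pipeline` are assumptions for illustration; this is not the thesis pipeline.

```python
import shlex
from typing import List


def build_pipeline(sample: str, r1: str, r2: str, ref: str = "hg19.fa",
                   dbsnp: str = "dbsnp.vcf.gz", threads: int = 8) -> List[str]:
    """Hypothetical helper: return one sample's shell commands in order (dry run).

    Tool choices and file names are illustrative assumptions. The reference is
    assumed to be BWA-indexed with .fai/.dict files, and the dbSNP VCF to be
    bgzipped and tabix-indexed.
    """
    q = shlex.quote
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"  # bwa expands \t
    bam = q(f"{sample}.sorted.bam")
    table = q(f"{sample}.recal.table")
    recal_bam = q(f"{sample}.recal.bam")
    vcf = q(f"{sample}.vcf")
    eff_vcf = q(f"{sample}.snpeff.vcf")
    ann_vcf = q(f"{sample}.annotated.vcf")
    return [
        # 1. Read quality check (FastQC needs the output directory to exist)
        f"mkdir -p qc && fastqc -o qc {q(r1)} {q(r2)}",
        # 2. Map paired reads to hg19, coordinate-sort, and index the BAM
        f"bwa mem -t {threads} -R {q(read_group)} {q(ref)} {q(r1)} {q(r2)}"
        f" | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # 3. Base quality score recalibration: build the error model, then apply it
        f"gatk BaseRecalibrator -R {q(ref)} -I {bam} --known-sites {q(dbsnp)} -O {table}",
        f"gatk ApplyBQSR -R {q(ref)} -I {bam} --bqsr-recal-file {table} -O {recal_bam}",
        # 4. Call SNVs and small indels
        f"gatk HaplotypeCaller -R {q(ref)} -I {recal_bam} -O {vcf}",
        # 5. Functional annotation with the SnpEff hg19 database, then dbSNP rsIDs
        f"java -jar snpEff.jar hg19 {vcf} > {eff_vcf}",
        f"java -jar SnpSift.jar annotate {q(dbsnp)} {eff_vcf} > {ann_vcf}",
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz"):
        print(cmd)
```

Returning strings instead of running them keeps the sketch safe to inspect; ClinVar and gnomAD annotation could be added as further `SnpSift annotate` steps with the corresponding VCF files.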
The findings of three of these analyses (PCA, the phylogenetic tree, and haplogroup identification) were concordant and suggested a possible Central/South Asian lineage relative to the reference database used (Human Genome Diversity Project) and the available haplogroup data. The called variants were also validated by a concordance analysis against genotyping data. Each DNA-related technology has its own benefits, challenges, and drawbacks, and understanding the opportunities and potential of each will help determine which technique is most suitable for answering a given research question. Nonetheless, an in-depth representation of the genomic structure of each ethnically diverse population is always desirable for exploring the mysteries of DNA.
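The concordance check between sequencing calls and genotyping data can be sketched as below. The input layout (dictionaries keyed by chromosome and position), the no-call symbols, and the names `normalize` and `concordance` are hypothetical; the sketch also assumes both data sets report alleles on the forward strand of the same reference build, which array data do not always guarantee.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]          # (chromosome, position) on the same build in both sets
MISSING = {"", ".", "-"}        # allele symbols treated as no-calls (assumption)


def normalize(gt: str) -> Optional[Tuple[str, ...]]:
    """Turn 'A/G', 'G|A' or 'AG' into a sorted allele tuple; None for no-calls.

    Without a separator, each character is treated as one single-base allele.
    """
    gt = gt.strip().upper().replace("|", "/")
    alleles = gt.split("/") if "/" in gt else list(gt)
    if not alleles or any(a in MISSING for a in alleles):
        return None
    return tuple(sorted(alleles))


def concordance(ngs: Dict[Site, str],
                array: Dict[Site, str]) -> Tuple[int, int, float]:
    """Return (sites compared, matching sites, concordance rate).

    Only sites present and called in both data sets are compared.
    """
    compared = matches = 0
    for site in ngs.keys() & array.keys():
        a, b = normalize(ngs[site]), normalize(array[site])
        if a is None or b is None:
            continue                      # skip no-calls in either data set
        compared += 1
        matches += int(a == b)            # unphased comparison: allele order ignored
    rate = matches / compared if compared else float("nan")
    return compared, matches, rate
```

Comparing sorted allele tuples makes "A/G", "G|A" and "GA" equivalent, since genotyping arrays usually report unphased calls.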
Date of Award: Jul 2018
Original language: American English
Supervisor: Habiba Alsafar

Keywords

  • Genomic Architecture; DNA; Genotyping; United Arab Emirates; Emirati Patients; T2DM.
