Whole Exome Sequencing identifies de novo genetic variants in Type 1 Diabetes Family Trios in the United Arab Emirates

  • Sara Omar Albarguthi

Student thesis: Master's Thesis


Type 1 diabetes (T1D) is a complex disease that can occur at any age and dramatically affects the lifestyle of the affected individual. Its pathogenesis is still unknown but is thought to involve genetic mutations and/or environmental influences. Although the prevalence of T1D in the United Arab Emirates is lower than that of type 2 diabetes, T1D is increasing rapidly worldwide, including in the UAE. Genetic influences on the disease have been researched in many ethnic communities, predominantly European and North American (i.e., Caucasian) populations, as well as Japanese and Chinese populations. Studies have also been performed in the Middle East, advancing our knowledge of the association of genetic variants with T1D and of how ethnic and environmental differences influence these variants. This project is the first effort to study the genetic variants found in T1D patients in the United Arab Emirates. First, an introduction to genome-wide association studies of T1D was presented, along with a comprehensive background on next-generation sequencing (NGS), the rationale for choosing whole exome sequencing for this research, and a guide to the bioinformatics tools used to analyze the data. Second, an extensive review of the genetic polymorphisms and environmental influences associated with T1D in the Middle East was performed following the PRISMA guidelines for systematic reviews and meta-analyses. Of the 782 studies initially identified, 44 were included in the meta-analysis of genetic variants and 8 in the meta-analysis of environmental factors. The AG genotype of the CTLA-4 rs536149078 SNP and the CT genotype of the PTPN22 rs2476601 SNP were significantly associated with T1D incidence in the Middle East.
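The genotype meta-analysis combines per-study case-control counts into a single effect estimate. The thesis abstract does not state which pooling model was used, so the following is a minimal sketch assuming a fixed-effect, inverse-variance model on log odds ratios (Woolf variance, with a 0.5 correction for zero cells). The `Study` record and the counts in the usage line are hypothetical and illustrative only, not data from the review.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class Study(NamedTuple):
    """Hypothetical 2x2 genotype counts for one case-control study."""
    case_exposed: int      # T1D cases carrying the genotype of interest (e.g. CT)
    case_unexposed: int    # T1D cases without that genotype
    ctrl_exposed: int      # controls carrying the genotype
    ctrl_unexposed: int    # controls without the genotype


def log_or_and_var(s: Study) -> Tuple[float, float]:
    """Log odds ratio and its Woolf variance for a single study."""
    cells = [s.case_exposed, s.case_unexposed, s.ctrl_exposed, s.ctrl_unexposed]
    if 0 in cells:
        # Haldane-Anscombe correction so the log OR stays finite.
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, var


def pooled_or(studies: Iterable[Study]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance pooled OR with a 95% confidence interval."""
    weighted_sum = 0.0
    total_weight = 0.0
    for s in studies:
        lor, var = log_or_and_var(s)
        w = 1.0 / var               # more precise studies get more weight
        weighted_sum += w * lor
        total_weight += w
    pooled = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))


# Illustrative counts only.
print(pooled_or([Study(20, 80, 10, 90), Study(35, 65, 22, 78)]))
```

A random-effects model (for example DerSimonian-Laird) would widen the interval when studies disagree; which model the review actually used should be taken from the full thesis.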
As for environmental factors, exposure to cow's milk in the first year of life, a positive family history of diabetes, viral infections and vitamin D deficiency are risk factors in T1D etiology, whereas breastfeeding, consanguinity and vitamin D supplementation play a protective role against the progression of T1D. After developing a strong understanding of the findings from the Middle East and the regions surrounding the UAE, we formed expectations about the kinds of findings whole exome sequencing was likely to yield. Our initial interest in the environmental association with T1D led us to examine de novo variants in T1D patients. The purpose of studying these variants is to determine whether any of them play a role in disease pathways and could provide more information about environmentally influenced genetic mutations that contribute to pathogenesis. We chose to study families with unaffected parents and siblings and an affected child so that a mutation present in the affected child, but absent from healthy family members who were exposed to a similar environment, could be attributed specifically to T1D. To confirm the value of this approach, an initial study was performed on a single trio family. Saliva samples were collected, and whole exome sequencing (WES) was performed using the TruSeq Exome Library kit. Data analysis was performed using the Genome Analysis Toolkit (GATK). The next-generation sequencing (NGS) reads were subjected to quality control to remove low-quality reads, and the raw reads for each sample, in FastQ format, were checked for quality using FastQC version 0.11.5. The analysis focused on SNVs, SNPs and INDELs with a minor allele frequency < 0.01 in gnomAD v2.1.1. Finally, the biological functions of the variants were analyzed for any relation to T1D or its pathways, to autoimmune disease or related pathways, or to environmental influences.
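The trio design rests on a simple rule: a candidate de novo variant is one the affected child carries while both parents are homozygous for the reference allele, and the rarity criterion keeps only variants with gnomAD minor allele frequency below 0.01. The sketch below applies both rules to already-genotyped calls. The `TrioCall` record, its field names and the gene labels are hypothetical (a real pipeline would read them from a joint-called VCF), and the read-depth and genotype-quality filters a production pipeline would apply are deliberately omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TrioCall:
    """Hypothetical per-variant record after joint genotyping of one trio."""
    variant_id: str
    gene: str
    child_gt: str                # e.g. "0/1" or phased "0|1"
    mother_gt: str
    father_gt: str
    gnomad_af: Optional[float]   # alternate-allele frequency; None if absent from gnomAD


def _alleles(gt: str) -> List[str]:
    return gt.replace("|", "/").split("/")


def carries_alt(gt: str) -> bool:
    """True if the genotype contains at least one called non-reference allele."""
    return any(a not in ("0", ".") for a in _alleles(gt))


def is_hom_ref(gt: str) -> bool:
    """True only for a fully called homozygous-reference genotype (missing calls fail)."""
    return all(a == "0" for a in _alleles(gt))


def minor_allele_frequency(af: float) -> float:
    return min(af, 1.0 - af)


def candidate_de_novo(calls: Iterable[TrioCall], max_maf: float = 0.01) -> List[TrioCall]:
    """Keep variants present in the child, absent from both parents, and rare in gnomAD."""
    kept = []
    for c in calls:
        if not carries_alt(c.child_gt):
            continue
        if not (is_hom_ref(c.mother_gt) and is_hom_ref(c.father_gt)):
            continue  # inherited, or a parental genotype could not be confirmed
        if c.gnomad_af is not None and minor_allele_frequency(c.gnomad_af) >= max_maf:
            continue  # too common in the population to be a rare candidate
        kept.append(c)
    return kept
```

Variants absent from gnomAD are kept, because they are the likeliest novel candidates; treating a missing parental call as "not homozygous reference" errs on the side of not claiming de novo status.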
Eight variants in six different genes were found to be related to T1D: TDG, TYRO3, GPLD1, LIG1, KIR2DS4 and KIR3DS1. Three variants in three genes were related to autoimmune or immune-system pathways (MST1L, CACNA1B and IPO7), and three were related to environmental factors (ZNF141, ZNF717 and PABPC1). Nineteen novel variants were found in genes unrelated to any of these three categories, suggesting that they are novel and possibly unique to the Emirati ethnicity or environment. The findings of the trio study motivated us to pursue a larger cohort study of 13 families. The second study was conducted in the same manner as the first, using the Illumina TruSeq Exome Library kit and following a similar bioinformatics and data analysis protocol. Far more de novo variants were found in the 13 families, which required an additional filtering step to support their association with disease pathogenesis: only variants found in two or more diabetic children were analyzed further. Genes and variants identified in the trio study (TDG, TYRO3 and MST1) appeared again in multiple families, suggesting a strong association of those variants with T1D. Some variants related to autoimmune pathways (MST1L, CACNA1B and PABPC1) were also shared by other families. We identified novel, previously unreported variants in the T1D probands that require further analysis and genotyping before they can be labeled disease-causing. This study is, so far, the only one of its kind and can serve as a guideline for future researchers pursuing similar studies.
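The cohort-level filter keeps only de novo variants that recur in two or more diabetic children. A minimal sketch of that recurrence count follows, assuming one affected child per family (so counting families is the same as counting probands) and assuming variants are matched by an exact identifier; the `recurrent_variants` function and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def recurrent_variants(
    de_novo_by_family: Dict[str, Iterable[str]],
    min_probands: int = 2,
) -> Dict[str, List[str]]:
    """Map each variant seen in >= min_probands families to the sorted list of those families.

    Assumes one affected child per family, so a family count equals a proband count.
    """
    families_per_variant = defaultdict(set)
    for family, variants in de_novo_by_family.items():
        for v in set(variants):          # count each family at most once per variant
            families_per_variant[v].add(family)
    return {
        v: sorted(fams)
        for v, fams in families_per_variant.items()
        if len(fams) >= min_probands
    }
```

Matching at the gene level instead (any de novo variant in the same gene) would be a looser alternative; the abstract describes the filter at the variant level.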
Date of Award: Dec 2020
Original language: American English


  • T1D
  • type 1 diabetes
  • WES
  • whole exome sequencing
  • de novo
  • United Arab Emirates (UAE)
  • Family study
  • trios
