TY - GEN
T1 - Tutorial on 8 Genotype Files Conversion
AU - Muneeb, Muhammad
AU - Feng, Samuel
AU - Henschel, Andreas
N1 - Funding Information:
This publication is based upon work supported by the Khalifa University of Science and Technology under Award No. CIRA-2019-050 to SFF.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - This article documents file format conversion procedures for eight genotype file formats using existing tools such as Plink, Samtools, and Gtools, with custom scripts where necessary. It provides documentation and the corresponding code segment for each conversion, offering ready-made conversion procedures to beginners and a base on which researchers can build faster, enhanced conversion procedures. The code is written in Python and GNU commands, enabling deployment on systems ranging from general-purpose computers to high-performance computing setups. The documentation is written as a tutorial that explains the reason for each step in a conversion procedure and its effect on the intermediate genotype data, helping readers who struggle with file conversion when developing their analysis pipelines. The first version of the documentation covers eight file formats: VCF, BED-BIM-FAM, PED-MAP, GEN-SAMPLE, RAW, HAPS-LEGEND-SAMPLE, 23andme, and AncestryDNA.
AB - This article documents file format conversion procedures for eight genotype file formats using existing tools such as Plink, Samtools, and Gtools, with custom scripts where necessary. It provides documentation and the corresponding code segment for each conversion, offering ready-made conversion procedures to beginners and a base on which researchers can build faster, enhanced conversion procedures. The code is written in Python and GNU commands, enabling deployment on systems ranging from general-purpose computers to high-performance computing setups. The documentation is written as a tutorial that explains the reason for each step in a conversion procedure and its effect on the intermediate genotype data, helping readers who struggle with file conversion when developing their analysis pipelines. The first version of the documentation covers eight file formats: VCF, BED-BIM-FAM, PED-MAP, GEN-SAMPLE, RAW, HAPS-LEGEND-SAMPLE, 23andme, and AncestryDNA.
KW - bioinformatics
KW - computational biology
KW - genetic file conversion
KW - genetic tools
KW - genetics
UR - https://www.scopus.com/pages/publications/85134344046
U2 - 10.1109/ICBCB55259.2022.9802470
DO - 10.1109/ICBCB55259.2022.9802470
M3 - Conference contribution
AN - SCOPUS:85134344046
T3 - 2022 10th International Conference on Bioinformatics and Computational Biology, ICBCB 2022
SP - 13
EP - 17
BT - 2022 10th International Conference on Bioinformatics and Computational Biology, ICBCB 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Conference on Bioinformatics and Computational Biology, ICBCB 2022
Y2 - 13 May 2022 through 15 May 2022
ER -