An African Genome Variation Database and its applications in human diversity and health

dc.contributor.advisor: Mulder, Nicola
dc.contributor.author: Todt, Davis
dc.date.accessioned: 2022-03-22T09:36:13Z
dc.date.available: 2022-03-22T09:36:13Z
dc.date.issued: 2021
dc.date.updated: 2022-03-22T06:07:05Z
dc.description.abstract: African genomes exhibit the highest levels of sequence and haplotype diversity of all extant human populations. A combination of historical and geographical factors has contributed to the high level of genetic diversity in ancestral populations in Africa. Additionally, a series of concomitant migration events out of Africa, with founder populations harbouring only a subset of this genetic variation, has contributed to the relatively lower genetic diversity observed in non-Africans. Population genetic studies have refined our understanding of human evolutionary history, and clinical genomic studies have resulted in improved patient outcomes. However, despite the increased throughput and decreased cost afforded by next-generation sequencing (NGS), and despite the relatively higher genetic variation in Africans, relatively little of the genomic data currently available is representative of diverse African populations. This may result in adverse outcomes for minority populations with little representation in clinical databases. Given the under-representation of African genetic variation and the importance of highlighting and further characterizing it, the objectives of this project were to design, develop and deploy a proof-of-concept database and web application for the storage, analysis and visualization of African genetic variant data: the African Genome Variation Database (AGVD). The AGVD was developed according to software industry design standards. The project also explored available genomic tools and databases in order to leverage existing software solutions where suitable. Additionally, relevant data sets were identified for use during testing and validation of the pilot phase of the project. To this end, the open-access 1000 Genomes Project phase 3 dataset was selected and the genotypes for several chromosomes were loaded into the AGVD.
The AGVD leverages the scalable, performant, and open-source genomics engine OpenCGA for data storage and analysis. A custom front-end web application was developed by applying a novel approach to render and serve static Vue.js assets from the Python Flask microframework. The web application supports rich search and filtering operations on loaded variants and allows end-users to interactively visualize annotations of genomic loci and allele change, variant type, associated gene and transcript consequences, clinical significance, and allele frequency information for all annotated cohorts. A bespoke REST API also supports future analytical functionality. The AGVD has demonstrated proof of concept in the secure and scalable storage and visualization of African genomic data, providing a viable solution for H3ABioNet to extend in future iterations of the project and a valuable resource for researchers to explore African genetic variation.
dc.identifier.apacitation: Todt, D. (2021). An African Genome Variation Database and its applications in human diversity and health. Faculty of Health Sciences, Department of Integrative Biomedical Sciences (IBMS). Retrieved from http://hdl.handle.net/11427/36188
dc.identifier.chicagocitation: Todt, Davis. "An African Genome Variation Database and its applications in human diversity and health." Faculty of Health Sciences, Department of Integrative Biomedical Sciences (IBMS), 2021. http://hdl.handle.net/11427/36188
dc.identifier.citation: Todt, D. 2021. An African Genome Variation Database and its applications in human diversity and health. Faculty of Health Sciences, Department of Integrative Biomedical Sciences (IBMS). http://hdl.handle.net/11427/36188
dc.identifier.ris:
TY - Master Thesis
AU - Todt, Davis
DA - 2021
DB - OpenUCT
DP - University of Cape Town
KW - Bioinformatics
LK - https://open.uct.ac.za
PY - 2021
T1 - An African Genome Variation Database and its applications in human diversity and health
TI - An African Genome Variation Database and its applications in human diversity and health
UR - http://hdl.handle.net/11427/36188
ER -
dc.identifier.uri: http://hdl.handle.net/11427/36188
dc.identifier.vancouvercitation: Todt D. An African Genome Variation Database and its applications in human diversity and health. Faculty of Health Sciences, Department of Integrative Biomedical Sciences (IBMS), 2021 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/36188
dc.language.rfc3066: eng
dc.publisher.department: Department of Integrative Biomedical Sciences (IBMS)
dc.publisher.faculty: Faculty of Health Sciences
dc.subject: Bioinformatics
dc.title: An African Genome Variation Database and its applications in human diversity and health
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files
Original bundle
Name: thesis_hsf_2021_todt davis.pdf
Size: 7.63 MB
Format: Adobe Portable Document Format