Leveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations

dc.contributor.advisorChimusa, Emile R
dc.contributor.authorAlosaimi, Shatha Mobarak
dc.date.accessioned2020-09-09T15:07:27Z
dc.date.available2020-09-09T15:07:27Z
dc.date.issued2020
dc.date.updated2020-09-09T11:05:41Z
dc.description.abstractWhole-Genome Sequencing (WGS) is ushering a new era in healthcare and research in identifying genetic variation in all populations. However, the African populations are still under-represented. Since African populations are being the most genetically diverse with high heterogeneity rate, we need to benchmark the Whole Genome Sequence (WGS) analysis pipeline to ensure reliable mutation detection. Therefore, it is essential to ensure that all steps of WGS downstream analysis are accurate, mainly the variant calling (VC). Current VC tools may produce falsepositive/negative results; such result may produce misleading conclusions in prioritisation of mutation, clinical relevancy and actionability of genes. With such many VC tools, two questions have arisen. Firstly, which tool has a high rate of sensitivity and precision in low either high coverage African sequences, given they have high genetic diversity and heterogeneity? Secondly, does the improvement of the VC result will advance the accuracy of detecting mutation and incidental finding (actionable genes) in African populations? In this project, a total of 100 DNA sequence samples was simulated (of which every 50 samples mimicked the genetics background of African and European, respectively) at different coverage (high and low). In particular, the sensitivity to discover polymorphisms was done by nine different VC tools. These tools were assessed in term of false positive/negative call rate given the simulated golden variants. Combining our result on sensitivity and positive predictive value (PPV). Lofreq performs best in African population data (sens=0.85, PPV=0.983, F-score=0.91) on high/low coverage data; as a result, we chose Lofreq to perform variant calling, and Gene-based annotation is performed to conduct in-sillico predication of mutation on publicly available data (the African Genome Variation and 1000 Genome Project). In doing so, we have leveraged WGS to examine and validate four of burden diseases in the African content, such as communicable diseases: HIV/AIDS, Malaria, Tuberculosis (TB), and Non-communicable diseases: such as Sickle cell disease, these diseases have uniquely shaped ethnic-specific and continental genomics variation and therefore provides unprecedented opportunities to map disease genes across the African continent. Moreover, the current actionable gene recommended by The American College of Medical Genetics and Genomics (ACMG) in the African population and update on additional African-specific actionable genes. Our result suggests African and African diaspora ethnic groups, particularly Bantu and Khoesan ethnics have gene diversity, high proportion of derived allele at low minor allele frequency (0.0 − 01) and the highest proportion of pathogenic variants within HIV, TB, Malaria, Sickle-Cell disease, while non-African ethnic groups including Latin America, Afro-Asiatic European related ethnic groups have high proportion of pathogenic variants within current actionable gene list. Overall, given the observed highest genetic diversity found in African ethnics and African diaspora related ethnics at these four Africa burden diseases and current actionable gene associated, our results support (1) the use of personalised medicine as beneficial to both African continent and worldwide; (2) a recommendation for African-specific actionable list of genes to further improve African and diaspora healthcare.
dc.identifier.apacitationAlosaimi, S. M. (2020). <i>Leveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations</i>. (). ,Faculty of Health Sciences ,Department of Pathology. Retrieved from http://hdl.handle.net/11427/32191en_ZA
dc.identifier.chicagocitationAlosaimi, Shatha Mobarak. <i>"Leveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations."</i> ., ,Faculty of Health Sciences ,Department of Pathology, 2020. http://hdl.handle.net/11427/32191en_ZA
dc.identifier.citationAlosaimi, S.M. 2020. Leveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations. . ,Faculty of Health Sciences ,Department of Pathology. http://hdl.handle.net/11427/32191en_ZA
dc.identifier.risTY - Master Thesis AU - Alosaimi, Shatha Mobarak AB - Whole-Genome Sequencing (WGS) is ushering a new era in healthcare and research in identifying genetic variation in all populations. However, the African populations are still under-represented. Since African populations are being the most genetically diverse with high heterogeneity rate, we need to benchmark the Whole Genome Sequence (WGS) analysis pipeline to ensure reliable mutation detection. Therefore, it is essential to ensure that all steps of WGS downstream analysis are accurate, mainly the variant calling (VC). Current VC tools may produce falsepositive/negative results; such result may produce misleading conclusions in prioritisation of mutation, clinical relevancy and actionability of genes. With such many VC tools, two questions have arisen. Firstly, which tool has a high rate of sensitivity and precision in low either high coverage African sequences, given they have high genetic diversity and heterogeneity? Secondly, does the improvement of the VC result will advance the accuracy of detecting mutation and incidental finding (actionable genes) in African populations? In this project, a total of 100 DNA sequence samples was simulated (of which every 50 samples mimicked the genetics background of African and European, respectively) at different coverage (high and low). In particular, the sensitivity to discover polymorphisms was done by nine different VC tools. These tools were assessed in term of false positive/negative call rate given the simulated golden variants. Combining our result on sensitivity and positive predictive value (PPV). Lofreq performs best in African population data (sens=0.85, PPV=0.983, F-score=0.91) on high/low coverage data; as a result, we chose Lofreq to perform variant calling, and Gene-based annotation is performed to conduct in-sillico predication of mutation on publicly available data (the African Genome Variation and 1000 Genome Project). In doing so, we have leveraged WGS to examine and validate four of burden diseases in the African content, such as communicable diseases: HIV/AIDS, Malaria, Tuberculosis (TB), and Non-communicable diseases: such as Sickle cell disease, these diseases have uniquely shaped ethnic-specific and continental genomics variation and therefore provides unprecedented opportunities to map disease genes across the African continent. Moreover, the current actionable gene recommended by The American College of Medical Genetics and Genomics (ACMG) in the African population and update on additional African-specific actionable genes. Our result suggests African and African diaspora ethnic groups, particularly Bantu and Khoesan ethnics have gene diversity, high proportion of derived allele at low minor allele frequency (0.0 − 01) and the highest proportion of pathogenic variants within HIV, TB, Malaria, Sickle-Cell disease, while non-African ethnic groups including Latin America, Afro-Asiatic European related ethnic groups have high proportion of pathogenic variants within current actionable gene list. Overall, given the observed highest genetic diversity found in African ethnics and African diaspora related ethnics at these four Africa burden diseases and current actionable gene associated, our results support (1) the use of personalised medicine as beneficial to both African continent and worldwide; (2) a recommendation for African-specific actionable list of genes to further improve African and diaspora healthcare. DA - 2020_ DB - OpenUCT DP - University of Cape Town KW - Human Genetics LK - https://open.uct.ac.za PY - 2020 T1 - Leveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations TI - Leveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations UR - http://hdl.handle.net/11427/32191 ER -en_ZA
dc.identifier.urihttp://hdl.handle.net/11427/32191
dc.identifier.vancouvercitationAlosaimi SM. Leveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations. []. ,Faculty of Health Sciences ,Department of Pathology, 2020 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/32191en_ZA
dc.language.rfc3066eng
dc.publisher.departmentDepartment of Pathology
dc.publisher.facultyFaculty of Health Sciences
dc.subjectHuman Genetics
dc.titleLeveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations
dc.typeMaster Thesis
dc.type.qualificationlevelMasters
dc.type.qualificationlevelMSc
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
thesis_hsf_2020_alosaimi shatha mobarak.pdf
Size:
6.87 MB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
0 B
Format:
Item-specific license agreed upon to submission
Description:
Collections