Leveraging Whole Genome Sequences to Compare Mutational Mechanism and Identify Medically Relevant Variation in African versus Non-African Descend Populations

Master Thesis

2020

Abstract
Whole-genome sequencing (WGS) is ushering in a new era in healthcare and research by identifying genetic variation in all populations. However, African populations are still under-represented. Because African populations are the most genetically diverse, with a high rate of heterogeneity, the WGS analysis pipeline needs to be benchmarked to ensure reliable mutation detection. It is therefore essential that all steps of downstream WGS analysis are accurate, particularly variant calling (VC). Current VC tools may produce false-positive or false-negative results, which can lead to misleading conclusions about the prioritisation of mutations and the clinical relevance and actionability of genes. With so many VC tools available, two questions arise. Firstly, which tool achieves high sensitivity and precision on low- or high-coverage African sequences, given their high genetic diversity and heterogeneity? Secondly, does improving the VC result increase the accuracy of detecting mutations and incidental findings (actionable genes) in African populations? In this project, a total of 100 DNA sequence samples were simulated (50 mimicking an African genetic background and 50 a European one) at different coverages (high and low). Nine different VC tools were then used to discover polymorphisms, and each tool was assessed in terms of its false-positive and false-negative call rates against the simulated golden variants. Combining our results on sensitivity and positive predictive value (PPV), Lofreq performed best on African population data (sens=0.85, PPV=0.983, F-score=0.91) on high/low coverage data. As a result, we chose Lofreq to perform variant calling, followed by gene-based annotation to conduct in silico prediction of mutations on publicly available data (the African Genome Variation Project and the 1000 Genomes Project).
In doing so, we have leveraged WGS to examine and validate four high-burden diseases in the African context: the communicable diseases HIV/AIDS, malaria and tuberculosis (TB), and the non-communicable sickle cell disease. These diseases have uniquely shaped ethnic-specific and continental genomic variation and therefore provide unprecedented opportunities to map disease genes across the African continent. Moreover, we examine the current actionable genes recommended by the American College of Medical Genetics and Genomics (ACMG) in African populations and provide an update on additional African-specific actionable genes. Our results suggest that African and African diaspora ethnic groups, particularly the Bantu and Khoesan, have high gene diversity, a high proportion of derived alleles at low minor allele frequency (0.0 − 01) and the highest proportion of pathogenic variants in genes associated with HIV, TB, malaria and sickle cell disease, while non-African ethnic groups, including Latin American, Afro-Asiatic and European-related ethnic groups, have a high proportion of pathogenic variants within the current actionable gene list. Overall, given that the highest genetic diversity was observed in African and African diaspora-related ethnic groups for these four high-burden African diseases and for the current actionable genes, our results support (1) the use of personalised medicine as beneficial to both the African continent and the rest of the world; and (2) a recommendation for an African-specific list of actionable genes to further improve healthcare for African and diaspora populations.
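The benchmarking metrics named in the abstract (sensitivity, PPV and F-score against a set of simulated golden variants) can be illustrated with a minimal Python sketch. It is not the thesis's pipeline: the function names `benchmark_calls` and `f_score`, the representation of a variant as a `(chrom, pos, ref, alt)` tuple, and the toy variants below are all assumptions made for illustration, and the F-score is assumed to be the usual harmonic mean of sensitivity and PPV.

```python
def f_score(sensitivity, ppv):
    """Harmonic mean of sensitivity and PPV (assumed definition of the F-score)."""
    total = sensitivity + ppv
    return 2 * sensitivity * ppv / total if total else 0.0


def benchmark_calls(called, golden):
    """Compare a caller's variants with the simulated golden variants.

    Both arguments are iterables of hypothetical (chrom, pos, ref, alt) tuples.
    """
    called, golden = set(called), set(golden)
    tp = len(called & golden)   # true positives: called and in the golden set
    fp = len(called - golden)   # false positives: called but not in the golden set
    fn = len(golden - called)   # false negatives: golden variants that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f_score": f_score(sensitivity, ppv),
    }


# Toy data, invented purely for illustration.
golden = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
          ("chr2", 40, "G", "A"), ("chr2", 90, "T", "C")]
called = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
          ("chr2", 40, "G", "A"), ("chr3", 7, "A", "T")]

metrics = benchmark_calls(called, golden)
print(metrics)
```

Under this assumed definition, the sensitivity and PPV reported for Lofreq are consistent with the reported F-score: `f_score(0.85, 0.983)` rounds to 0.91.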