Disease population genetic simulation framework: towards application in modelling disease risk prediction and heritability rate

Thesis / Dissertation

2024

Publisher

University of Cape Town

Abstract
It has now been over a decade since the first genome-wide association study (GWAS) was conducted. The field has since made monumental strides, as evidenced by the more than half a million associations now submitted to the GWAS Catalog, advances in sequencing technologies, and the development of robust methodologies for genetic data analysis. Our understanding of the genetic underpinnings of disease has grown dramatically. It is thus unfortunate that a substantial proportion of these studies continue to focus on European populations, while GWAS in Africans and other diverse populations lags behind. Concerns have been raised about the consequences of this disparity once GWAS findings are translated into biological function and clinical interventions. The field also continues to be haunted by the fact that the identified variants still explain only a very small proportion of the heritability of several common complex diseases, which makes their translation challenging. Efforts are under way to extend GWAS to diverse populations. Unfortunately, most of the tools currently developed and commonly used in GWAS have been benchmarked on and applied to populations of European descent, whereas the complex genetic make-up of Africans and other diverse populations implies that different GWAS approaches are imperative. As our world becomes increasingly connected through technology, the interbreeding of previously genetically isolated populations is inevitable and on the rise. Consequently, as GWAS extends to diverse populations, the majority will be mixed-ancestry (admixed) populations that result from such interbreeding. Association studies of admixed populations have mainly capitalized on admixture mapping, or admixture association, which associates a trait with the overrepresentation of a given ancestry in cases compared with controls at a given location of the genome.
However, the application of GWAS to these populations is now being explored, especially as genotyping costs continue to decrease. Research has shown that combining standard GWAS with admixture association has the potential to improve the power of disease scoring statistics. Currently, most joint methods are targeted at admixed populations that result from the interbreeding of two genetically isolated populations (2-way admixed populations). The two joint methods proposed for admixed populations with more than two ancestral populations (multi-way admixed populations) are not optimized for such populations, which excludes a large portion of multi-way admixed populations. We therefore set out to develop a joint ancestry and SNP association tool tailored to multi-way admixed populations. Firstly, we developed FractalSIM, a simulation tool that generates realistic homogeneous and/or complex admixed genetic data under various population genetic scenarios that incorporate recombination, mutation, random mating, disease models, admixture, and natural selection. The tool requires a reference population as input, implements a resampling approach, and retains the minor allele frequencies and linkage disequilibrium (LD) patterns of the reference population in the simulated dataset. We assessed FractalSIM output using commonly used genetic tools. Using simulated African, European, and multi-way admixed datasets from FractalSIM, we evaluated commonly used GWAS tools and leveraged the results to discuss an optimized framework for GWAS in diverse populations, such as Africans and admixed populations. By implementing linear mixed models in a fully Bayesian context, we developed JasMAP, a joint ancestry and SNP association approach for multi-way admixed populations that leverages genotype and ancestry signals to improve GWAS power in these populations.
We evaluated JasMAP using simulated data generated with FractalSIM and benchmarked its output against results from other tools. We also applied JasMAP to a South African Coloured (SAC) population, a uniquely 5-way admixed population with a high prevalence of tuberculosis (TB), to identify genetic variants underlying ethnic differences in TB. We established that JasMAP performs better than other commonly used tools in leveraging genotype and ancestry risk to improve power in GWAS. In the application of JasMAP to a GWAS of the SAC population, we obtained 13 significant SNPs using the joint association, 12 of which were detected at marginal or substantial thresholds in the genotype-only and ancestry-only associations. Gene-mapping analysis placed these SNPs near 8 genes, 4 of which were associated with TB based on their functionality, via pathway analysis, and through links to social behavior that leads to an increased risk of TB. In particular, one of the significant SNPs on chromosome 4 was linked to the SLC7A11 gene, which has previously been linked to TB in a GWAS of a Chinese population.
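
As an illustration of the admixture association idea summarized in the abstract (comparing how often a given ancestry appears in cases versus controls at one genomic location), the following is a minimal sketch only. It is not the JasMAP method and is not taken from the thesis. The function name `ancestry_association_z`, the toy dosage values, and the choice of a simple two-proportion z-test that treats chromosomes as independent are all assumptions made for illustration.

```python
import math


def ancestry_association_z(case_dosages, control_dosages):
    """Two-proportion z-test on local ancestry at a single locus.

    Each dosage is the number of chromosome copies (0, 1 or 2) that an
    individual carries from the ancestry of interest at the locus.
    Hypothetical helper: chromosomes are treated as independent draws,
    which is a simplifying assumption.
    """
    n_case = 2 * len(case_dosages)        # chromosomes observed in cases
    n_ctrl = 2 * len(control_dosages)     # chromosomes observed in controls
    p_case = sum(case_dosages) / n_case   # ancestry proportion in cases
    p_ctrl = sum(control_dosages) / n_ctrl
    pooled = (sum(case_dosages) + sum(control_dosages)) / (n_case + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_case + 1 / n_ctrl))
    if se == 0:
        # No variation in ancestry at this locus: nothing to test.
        return 0.0, 1.0
    z = (p_case - p_ctrl) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


# Toy data (invented): ancestry dosages at one locus.
cases = [2, 2, 1, 2, 1, 2, 2, 1]
controls = [0, 1, 0, 1, 1, 0, 0, 1]
z, p = ancestry_association_z(cases, controls)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z indicates that the ancestry is overrepresented in cases at the locus. A real analysis would additionally need inferred local ancestry, adjustment for genome-wide ancestry proportions, and multiple-testing correction, none of which this sketch attempts.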