Investigating local ancestry inference models in mixed ancestry individual genomes

dc.contributor.advisorMazandu, Gaston K
dc.contributor.advisorChimusa, Emile R
dc.contributor.advisorMulder, Nicola J
dc.contributor.authorGeza, Ephifania
dc.date.accessioned2023-03-06T13:24:57Z
dc.date.available2023-03-06T13:24:57Z
dc.date.issued2022
dc.date.updated2023-02-20T12:48:20Z
dc.description.abstractOwing to historical events including the slave trade, agricultural interests, colonialism, and political and/or economic instability, the genomes of most modern humans are mosaics of segments originating from different populations. Such genomes result from admixture, the interbreeding of two or more previously isolated populations. Known admixed populations include the mixed-ancestry population of South Africa, Latin Americans and African Americans. Admixed individuals play important roles in understanding population history, disease aetiology, and personal genomics. Accordingly, efforts have been made to understand the genetic composition of such individuals, yielding several models that infer the ancestry of every chromosomal segment in an admixed individual (local ancestry). However, new research questions have emerged concerning the statistical and biological parameters of these models, as well as their performance across admixed datasets. This created the need to examine existing local ancestry inference models in order to identify and tackle their critical issues, which is the main goal of this thesis. We achieve this in four steps, constituting the main contributions of this PhD project: (1) qualitative assessment of existing models through a systematic review; (2) building a unified framework that integrates existing models for inferring and assessing local ancestry estimates; (3) quantitative assessment of existing methods within the same framework; and (4) proposing a model extension that accounts for natural selection and the origin of modern humans to improve the accuracy of local ancestry estimates. Firstly, we assess models using published results on different datasets and performance measures, to orient modellers and software developers on future trends in local ancestry inference.
Secondly, to address the challenges identified in (1), including model complexity reflected in the distinct inputs each model requires and the distinct output formats it produces, we design a unified framework, referred to as FRANC, that manipulates tool-specific inputs, deconvolves ancestry and standardises outputs, easing the inference process and paving the way for model assessment. Thirdly, using FRANC, we assess the performance of eight state-of-the-art models on simulated admixed population datasets involving three and five ancestral populations. LAMP-LD and LOTER performed better than the other six tested models on admixed populations involving five ancestral populations, while RFMIX, WINPOP, ELAI and LAMP-LD were comparable on admixed datasets involving three populations. Performance was evaluated using measures borrowed from the machine learning confusion matrix. Finally, we noted that it may be more practical to extend existing models to incorporate more realistic biological assumptions. Hence, we propose a nonparametric hidden Markov model that adjusts an existing model, mSPECTRUM, to account for natural selection and state persistence when deconvolving local ancestry, which should improve the accuracy of estimates. Like mSPECTRUM, the proposed model acknowledges the two common hypotheses on the origin of modern humans, making it comparable to mSPECTRUM, which has been shown to be competitive with HAPMIX, a benchmark for two-way admixtures. Together, these four contributions advance the admixture analysis of populations.
dc.identifier.apacitationGeza, E. (2022). Investigating local ancestry inference models in mixed ancestry individual genomes. Faculty of Health Sciences, Department of Integrative Biomedical Sciences (IBMS). Retrieved from http://hdl.handle.net/11427/37276 en_ZA
dc.identifier.chicagocitationGeza, Ephifania. "Investigating local ancestry inference models in mixed ancestry individual genomes." Faculty of Health Sciences, Department of Integrative Biomedical Sciences (IBMS), 2022. http://hdl.handle.net/11427/37276 en_ZA
dc.identifier.citationGeza, E. 2022. Investigating local ancestry inference models in mixed ancestry individual genomes. Faculty of Health Sciences, Department of Integrative Biomedical Sciences (IBMS). http://hdl.handle.net/11427/37276 en_ZA
dc.identifier.ris TY - Doctoral Thesis AU - Geza, Ephifania DA - 2022_ DB - OpenUCT DP - University of Cape Town KW - Integrative Biomedical Sciences LK - https://open.uct.ac.za PY - 2022 T1 - Investigating local ancestry inference models in mixed ancestry individual genomes TI - Investigating local ancestry inference models in mixed ancestry individual genomes UR - http://hdl.handle.net/11427/37276 ER - en_ZA
dc.identifier.urihttp://hdl.handle.net/11427/37276
dc.identifier.vancouvercitationGeza E. Investigating local ancestry inference models in mixed ancestry individual genomes. Faculty of Health Sciences, Department of Integrative Biomedical Sciences (IBMS), 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/37276 en_ZA
dc.language.rfc3066eng
dc.publisher.departmentDepartment of Integrative Biomedical Sciences (IBMS)
dc.publisher.facultyFaculty of Health Sciences
dc.subjectIntegrative Biomedical Sciences
dc.titleInvestigating local ancestry inference models in mixed ancestry individual genomes
dc.typeDoctoral Thesis
dc.type.qualificationlevelDoctoral
dc.type.qualificationlevelPhD
Files
Original bundle
Name: thesis_hsf_2022_geza ephifania.pdf
Size: 7.4 MB
Format: Adobe Portable Document Format