Browsing by Author "Mulder, Nicola J"
- [Restricted] ancGWAS: a post genome-wide association study method for interaction, pathway and ancestry analysis in homogeneous and admixed populations (Oxford University Press, 27). Chimusa, Emile R; Mbiyavanga, Mamana; Mazandu, Gaston K; Mulder, Nicola J.
  Despite numerous successful genome-wide association studies (GWAS), detecting variants that confer low disease risk remains a challenge. Because GWAS commonly test one marker at a time, they may miss disease genes with weak genetic effects or strong epistatic effects, and may therefore generate false-negative or inconclusive results. This suggests the need for novel methods that combine the effects of single nucleotide polymorphisms within a gene to increase the likelihood of fully characterizing the susceptibility gene. Results: We developed ancGWAS, an algebraic graph-based centrality measure that accounts for linkage disequilibrium in identifying significant disease sub-networks by integrating the association signal from GWAS data sets into the human protein–protein interaction (PPI) network. We validated ancGWAS using association results from a breast cancer data set, simulated interacting disease loci in a complex admixed population, and pathway-based GWAS simulations. This new approach holds promise for deconvoluting the interactions between genes underlying the pathogenesis of complex diseases. The results reveal a novel central breast cancer sub-network of the human interactome implicated in the proteoglycan syndecan-mediated signaling events pathway, which is known to play a major role in mesenchymal tumor cell proliferation, thus providing further insights into breast cancer pathogenesis.
- [Open Access] Determining ancestry proportions in complex admixture scenarios in South Africa using a novel proxy ancestry selection method (Public Library of Science, 2013). Chimusa, Emile R; Daya, Michelle; Möller, Marlo; Ramesar, Raj; Henn, Brenna M; van Helden, Paul D; Mulder, Nicola J; Hoal, Eileen G.
  Admixed populations can make an important contribution to the discovery of disease susceptibility genes if the parental populations exhibit substantial variation in susceptibility. Admixture mapping has been used successfully, but it is not designed to cope with populations that have more than two or three ancestral populations. Inferring admixture proportions and local ancestry, and imputing missing genotypes, in admixed populations are crucial both for understanding variation in disease and for identifying novel disease loci. These inferences rely on reference populations, and their accuracy depends on the choice of ancestral populations. Using an insufficient or inaccurate ancestral panel can result in erroneously inferred ancestry and can affect the detection power of GWAS and of meta-analyses that use imputation. Current algorithms are inadequate for multi-way admixed populations. To address these challenges, we developed PROXYANC, an approach to select the best proxy ancestral populations. Using simulations of a multi-way admixed population, we demonstrate the capability and accuracy of PROXYANC and illustrate the importance of the choice of ancestry both in estimating admixture proportions and in imputing missing genotypes.
- [Open Access] The development of computational biology in South Africa: successes achieved and lessons learnt (Public Library of Science, 2016). Mulder, Nicola J; Christoffels, Alan; De Oliveira, Tulio; Gamieldien, Junaid; Hazelhurst, Scott; Joubert, Fourie; Kumuthini, Judit; Pillay, Ché S; Snoep, Jacky L; Bishop, Özlem Tastan; Tiffin, Nicki.
  Bioinformatics is now a critical skill in many research and commercial environments, as biological data are increasing in both size and complexity. South African researchers recognized this need in the mid-1990s and responded by working with the government and international bodies to develop initiatives to build bioinformatics capacity in the country. Significant injections of support from these bodies provided a springboard for establishing computational biology units at multiple universities throughout the country, which took on teaching, basic research and support roles. Several challenges were encountered, for example unreliable funding, a lack of skills and a lack of infrastructure. However, the bioinformatics community worked together to overcome these, and South Africa is now arguably the leading country in bioinformatics on the African continent. Here we discuss how the discipline developed in the country, highlighting the challenges, successes and lessons learnt.
- [Open Access] Exploring new methodologies to identify disease-associated variants in African populations through the integration of patient genotype data and clinical phenotypes derived from routine health data: A case study for Type 2 Diabetes Mellitus in patients in the Western Cape Province, South Africa (2023). Tamuhla, Tsaone; Tiffin, Nicola; Mulder, Nicola J.
  Introduction: Knowledge of the genetic drivers of disease in African populations is poor, largely because of the limited human genome data from sub-Saharan Africa. Although the costs of generating human genomic data have fallen significantly, they remain a barrier to generating large-scale African genomic data. This project is therefore a proof-of-concept pilot study demonstrating the implementation of a cost-effective, scalable, genotyped virtual cohort that can address population-level genomic questions. Methods: We optimised a tiered informed consent process suited to the cohort study design and adapted it for human genomic research in the African context. We used an existing dataset to explore statistical methods for modelling longitudinal routine health data into a standardised phenotype for genome-wide association studies (GWAS). We then conducted a feasibility study and piloted the tiered informed consent process, DNA collection by buccal swab, and DNA extraction from buccal swabs and peripheral blood samples. DNA samples were genotyped for approximately 2.2 million variants on the Infinium™ H3Africa Consortium Array V2. Genotyping quality control (QC) was performed in PLINK 1.9, and genome-wide imputation on the Sanger Imputation Service. We demonstrated successful variant calling, provide aggregate statistics for known aetiological variants for type 2 diabetes and severe COVID-19, and demonstrate the feasibility of running nested case-control GWAS with these data. Results: We demonstrate the use of routine health data to provide complex phenotypes that can be linked to genotype data for both non-communicable diseases (diabetes) and infectious diseases (tuberculosis, HIV and COVID-19). In total, 459 participants consented to provide a DNA sample and access to their routine health data and were included in the feasibility study. A total of 343 DNA samples and 1,782,023 genotyped variants passed quality control and were available for further analysis. While most of the cohort clustered with the 1000 Genomes African population, principal component analysis showed extensive population admixture. Using the routine health data of cohort participants, we identified 63 cases of severe COVID-19 and 280 controls for the COVID-19 analysis, and 93 cases and 250 controls for the type 2 diabetes analysis. Although the sample sizes were insufficient for a GWAS, we were able to evaluate known type 2 diabetes mellitus and COVID-19 variants in the study population. Conclusion: We have described how we conceptualised and implemented a genotyped virtual population cohort in a resource-constrained environment, and we are confident that this design and implementation are appropriate for scaling up the cohort to a size at which novel health discoveries can be made through nested case-control studies. In the interim, we demonstrate the analysis and validation of aetiological variants identified in other studies and populations.
- [Open Access] Information content-based gene ontology functional similarity measures: which one to use for a given biological data type? (Public Library of Science, 2014). Mazandu, Gaston K; Mulder, Nicola J.
  The current increase in Gene Ontology (GO) annotations of proteins in existing genome databases, and their use in different analyses, has fostered the improvement of several biomedical and biological applications. To integrate these functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. For topology-based measures, each approach was paired with a specific functional similarity measure, depending on how it was conceived and the applications for which it was designed. However, it is not clear whether the functional similarity measure associated with a given approach is the most appropriate for a given biological data set or application, i.e., whether it achieves the best performance compared with other functional similarity measures for the application under consideration. We show that, in general, the functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We evaluated the performance of a number of functional similarity measures on different types of biological data in order to infer the best functional similarity measure for each term IC and semantic similarity approach. These comparisons of protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.
- [Open Access] Integration of multi-omic data and neuroimaging characteristics in studying brain related diseases (2020). Elsheikh, Samar Salah Mohamedahmed; Mulder, Nicola J; Crimi, Alessandro; Chimusa, Emile R.
  Approaches to identifying genetic variants associated with complex brain diseases have evolved in recent decades. This evolution was supported by advances in medical imaging and genotyping technologies that produce rich data in the fields of imaging genetics and radiogenomics. Studies in these fields have taken different designs and directions, from genome-wide associations to studies of the complex interplay between genetics and structural connectivity across a wide range of brain-related diseases. Nevertheless, combining such heterogeneous, high-dimensional and inter-related data has introduced new challenges that cannot be handled with traditional statistical methods. In this thesis, we propose analysis pipelines and methodologies to study the causal relationships between neuroimaging features (including tumour characteristics and connectomics), genetics and clinical factors in brain-related diseases. In doing so, we adopted two longitudinal study designs and modelled the association between Alzheimer's disease progression and genetic factors, using local and global brain connectivity networks. In addition, we performed a multi-stage radiogenomic analysis in glioblastoma using non-parametric statistical methods. To address some limitations of these methods, we adopted structural equation modelling and developed a mathematical model to examine the inter-correlation between the neuroimaging and multi-omic characteristics of brain-related diseases. Our analyses identified risk genes previously reported in the Alzheimer's disease and glioblastoma literature, and discovered potential risk variants associated with disease progression. More specifically, we found loci in the genes CDH18, ANTXR2 and IGF1, located on chromosomes 5, 4 and 12 respectively, that affect brain connectivity over time in Alzheimer's disease. We also found that the expression of APP, HFE, PLAU and BLMH has significant effects on the structural connectivity of local brain areas, namely the left Heschl gyrus, right anterior cingulate gyrus, left fusiform gyrus and left Heschl gyrus, respectively. These potential association patterns could be useful for early disease diagnosis, treatment and prediction of neurodegeneration. More importantly, we identified gaps in imaging genetics methodologies, proposed a mathematical model that accounts for these limitations, and evaluated it, obtaining promising results. Our proposed flexible model, BiGen, addresses the gaps in existing tools by combining neuroimaging, genetic, environmental and phenotype information into a single complex analysis, accounting for the heterogeneity, inter-correlation and non-linearity of the variables. Moreover, BiGen adopts an important assumption that is rarely made in the imaging genetics literature: all four variables are treated as latent constructs, meaning that they cannot be observed directly from the data and are instead measured through observed indicators. This assumption is important in neuroimaging, behavioural and genetic studies alike, and it is one reason why BiGen is flexible and can easily be extended to include more indicators and latent constructs in the context of brain-related diseases.
- [Open Access] Investigating local ancestry inference models in mixed ancestry individual genomes (2022). Geza, Ephifania; Mazandu, Gaston K; Chimusa, Emile R; Mulder, Nicola J.
  Owing to historical events including the slave trade, agricultural interests, colonialism, and political and/or economic instability, most modern humans are a mosaic of segments originating from different populations. These mosaics result from interbreeding between two or more previously isolated populations, leading to admixture. Known admixed populations include the South African mixed-ancestry population, Latin Americans and African Americans. Admixed individuals play important roles in understanding population history, disease aetiology and personal genomics. Accordingly, efforts to understand their genetic composition have yielded several models that infer the ancestry of every chromosomal segment in an admixed individual (local ancestry). However, new research questions have emerged concerning the statistical and biological parameters of these models, as well as their performance across admixed datasets. This created the need to examine existing local ancestry inference models in order to identify and tackle their critical issues, which is the main goal of this thesis. We achieve this in four steps, which constitute the main contributions of this PhD project: (1) a qualitative assessment of existing models through a systematic review; (2) a unified framework integrating existing models for inferring and assessing local ancestry estimates; (3) a quantitative assessment of existing methods within this framework; and (4) a model extension that accounts for natural selection and the origin of modern humans to improve the accuracy of local ancestry estimates. Firstly, we assess models using published results on different datasets and performance measures, to orient modellers and software developers towards future trends in local ancestry inference. Secondly, to address the challenges identified in (1), including model complexity reflected in the distinct input and output formats of each model, we design a unified framework, FRANC, that manipulates tool-specific inputs, deconvolves ancestry and standardises outputs, easing the inference process and paving the way for model assessment. Thirdly, using FRANC, we assess the performance of eight state-of-the-art models on simulated admixed population datasets involving three and five ancestral populations. LAMP-LD and LOTER performed better than the other six models on admixed populations involving five ancestral populations, while RFMIX, WINPOP, ELAI and LAMP-LD were comparable on admixed datasets involving three ancestral populations. Performance was evaluated using measures borrowed from the machine learning confusion matrix. Finally, we note that it may be more practical to extend existing models to incorporate more realistic biological assumptions. Hence, we propose a nonparametric hidden Markov model that adjusts an existing model, mSPECTRUM, to account for natural selection and state persistence when deconvolving local ancestry, which should improve the accuracy of estimates. Like mSPECTRUM, this model acknowledges the two common hypotheses on the origin of modern humans, making it comparable to mSPECTRUM, which has been shown to be competitive with HAPMIX, a benchmark for two-way admixture. Taken together, these four contributions advance the admixture analysis of populations.
- [Open Access] Predicting and analyzing interactions between Mycobacterium tuberculosis and its human host (Public Library of Science, 2013). Rapanoel, Holifidy A; Mazandu, Gaston K; Mulder, Nicola J.
  The outcome of infection by Mycobacterium tuberculosis (Mtb) depends greatly on how the host responds to the bacterium and how the bacterium manipulates the host, an interplay mediated by protein-protein interactions. Understanding this process therefore requires elucidating the protein interactions between human and Mtb, which may enable us to characterize specific molecular mechanisms that allow the bacterium to persist and survive under different environmental conditions. In this work, we used the interologs method, based on experimentally verified intra-species and inter-species interactions, to predict human-Mtb functional interactions. These interactions were further filtered using known human-Mtb interactions and genes that are differentially expressed during infection, producing 190 interactions. Further analysis of the subcellular location of the proteins involved in these human-Mtb interactions confirms the feasibility of these interactions. We also conducted a functional analysis of the human and Mtb proteins involved in these interactions, checking whether these proteins play a role in infection and/or disease, and checking the Mtb proteins for enrichment in a previously predicted list of drug targets. We found that the biological processes of the human interacting proteins suggest their involvement in apoptosis and nitric oxide production, whereas those of the Mtb interacting proteins were relevant to the intracellular environment of Mtb in the host. Mapping these proteins onto KEGG pathways highlighted proteins belonging to the tuberculosis pathway and also suggested that Mtb proteins might use the host to acquire nutrients, which is in agreement with the intracellular lifestyle of Mtb. This indicates that these interactions can shed light on the interplay between Mtb and its human host and can thus contribute to the design of novel drugs with new biological mechanisms of action.
- [Open Access] A quick guide for building a successful bioinformatics community (Public Library of Science, 2015). Budd, Aidan; Corpas, Manuel; Brazas, Michelle D; Fuller, Jonathan C; Goecks, Jeremy; Mulder, Nicola J; Michaut, Magali; Ouellette, B F Francis; Pawlik, Aleksandra; Blomberg, Niklas.
  "Scientific community" refers to a group of people collaborating on scientific-research-related activities who also share common goals, interests and values. Such communities play a key role in many bioinformatics activities. Communities may be linked to a specific location or institute, or may involve people working at many different institutions and locations. Education and training are typically important components of these communities, providing a valuable context in which to develop skills and expertise while also strengthening links and relationships within the community. Scientific communities facilitate: (i) the exchange and development of ideas and expertise; (ii) career development; (iii) coordinated funding activities; (iv) interactions and engagement with professionals from other fields; and (v) other activities beneficial to individual participants, communities and the scientific field as a whole. It is thus beneficial at many different levels to understand the general features of successful, high-impact bioinformatics communities, how individual participants can contribute to the success of these communities, and the role of education and training within them. We present here a quick guide to building and maintaining a successful, high-impact bioinformatics community, along with an overview of the general benefits of participating in such communities. This article grew out of contributions made by organizers, presenters, panelists and other participants of the ISMB/ECCB 2013 workshop "The 'How To Guide' for Establishing a Successful Bioinformatics Network" at the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB).
- [Open Access] Scoring protein relationships in functional interaction networks predicted from sequence data (Public Library of Science, 2011). Mazandu, Gaston K; Mulder, Nicola J.
  The abundance of diverse biological data from various sources constitutes a rich source of knowledge with the power to advance our understanding of organisms. Exploiting it requires computational methods that integrate these data effectively and elucidate local and genome-wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine function, although the functional properties of a protein can often be predicted from the domains it contains alone. Protein sequences and domains can therefore be used to predict pairwise functional relationships between proteins and so contribute to predicting the functions of uncharacterized proteins, ensuring that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with high confidence and coverage, and we use the network to predict the functions of uncharacterized proteins. Availability: Protein pairwise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data, and Python scripts to compute these scores, are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes
- [Open Access] The H3ABioNet helpdesk: an online bioinformatics resource, enhancing Africa’s capacity for genomics research (2019-12-30). Kumuthini, Judit; Zass, Lyndon; Panji, Sumir; Salifu, Samson P; Kayondo, Jonathan K; Nembaware, Victoria; Mbiyavanga, Mamana; Olabode, Ajayi; Kishk, Ali; Wells, Gordon; Mulder, Nicola J.
  Background: Formal mechanisms for bioinformatics support are currently limited. The H3Africa Bioinformatics Network has implemented a public, freely available Helpdesk (HD), which provides generic bioinformatics support to researchers through an online ticketing platform. This article reports on the development of the H3ABioNet HD (H3A-HD), outlining its design, management, usage and evaluation framework, as well as the lessons learned through its implementation. Results: The H3A-HD was evaluated using automatically generated usage logs, user feedback and qualitative ticket evaluation. The evaluation revealed that communication methods, ticketing strategies and the technical platforms used are among the primary factors that may influence the effectiveness of the HD. Conclusion: To continuously improve the H3A-HD services, the resource should be regularly monitored and evaluated. The H3A-HD design, implementation and evaluation framework could easily be adapted for use by interested stakeholders within the bioinformatics community and beyond.
- [Open Access] A web-based protein interaction network visualizer (BioMed Central, 2014-05-06). Salazar, Gustavo A; Meintjes, Ayton; Mazandu, Gaston K; Rapanoël, Holifidy A; Akinola, Richard O; Mulder, Nicola J.
  Background: Interaction between proteins is one of the most important mechanisms in the execution of cellular functions, and the study of these interactions has provided insight into how an organism’s processes function. As of October 2013, Homo sapiens had over 170,000 protein-protein interactions (PPIs) registered in the Interologous Interaction Database, which is only one of the many public resources through which protein interactions can be accessed. These numbers exemplify the volume of data that research on the topic has generated. Visualizing large data sets is a well-known strategy for making sense of information, and protein interaction data are no exception. Several tools allow these data to be explored, providing different methods to visualize protein interaction networks. However, there is still no native web tool that allows these data to be explored interactively online. Results: Given recent advances in web technologies, it is time to bring these interactive views to the web to provide an easily accessible forum for visualizing PPIs. We have created a Web-based Protein Interaction Network Visualizer, PINV, an open-source, native web application that facilitates the visualization of protein interactions (http://biosual.cbio.uct.ac.za/pinv.html). We developed PINV as a set of components that follow the protocol defined in BioJS and use the D3 library to create the graphic layouts. We demonstrate the use of PINV with multi-organism interaction networks for a predicted target from Mycobacterium tuberculosis, its interacting partners and its orthologs. Conclusions: The resulting tool provides an attractive view of complex, fully interactive networks, with components that allow querying, filtering and manipulation of the visible subset. Moreover, as a web resource, PINV simplifies sharing and publishing, activities that are vital in today’s collaborative research environments. The source code is freely available for download at https://github.com/4ndr01d3/biosual
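The information content (IC) based functional similarity measures compared in the 2014 Mazandu and Mulder entry above can be illustrated with a minimal sketch. It assumes an annotation-based IC, IC(t) = -log p(t), Resnik's term similarity (the IC of the most informative common ancestor), and best-match averaging (BMA) to combine term scores into a protein-level score. The toy ontology, annotation counts and function names are hypothetical, and this is one common combination of measures, not necessarily the one the entry recommends for any given data type.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parent terms (a DAG rooted at "root").
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}

# Hypothetical counts of proteins annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "C": 4, "D": 1}


def ancestors(term):
    """Return the term itself together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Annotation-based IC: IC(t) = -log p(t), where p(t) is the fraction of
    annotations made to t or to any of its descendants (true-path rule)."""
    cumulative = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik term similarity: IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


def bma(terms1, terms2, ic):
    """Best-match average of term similarities between two proteins' annotation sets."""
    best1 = [max(resnik(a, b, ic) for b in terms2) for a in terms1]
    best2 = [max(resnik(a, b, ic) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))


if __name__ == "__main__":
    ic = information_content()
    # Proteins annotated with {"C"} and {"D"} share "A" as their most informative ancestor.
    print(round(bma({"C"}, {"D"}, ic), 3))
```

Because the term-level measure (`resnik`) and the combination step (`bma`) are separate functions, either can be swapped, for example for a plain average over all term pairs, without touching the IC computation; the entry above reports that which pairing performs best depends on the biological data and application.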