Browsing by Author "Mazandu, Gaston K"
Now showing 1 - 8 of 8
Results Per Page
Sort Options
- ItemOpen AccessA post-gene silencing bioinformatics protocol for plant-defence gene validation and underlying process identification: case study of the Arabidopsis thaliana NPR1(BioMed Central, 2017-11-23) Yocgo, Rosita E; Geza, Ephifania; Chimusa, Emile R; Mazandu, Gaston KBackground: Advances in forward and reverse genetic techniques have enabled the discovery and identification of several plant defence genes based on quantifiable disease phenotypes in mutant populations. Existing models for testing the effect of gene inactivation or genes causing these phenotypes do not take into account eventual uncertainty of these datasets and potential noise inherent in the biological experiment used, which may mask downstream analysis and limit the use of these datasets. Moreover, elucidating biological mechanisms driving the induced disease resistance and influencing these observable disease phenotypes has never been systematically tackled, eliciting the need for an efficient model to characterize completely the gene target under consideration. Results: We developed a post-gene silencing bioinformatics (post-GSB) protocol which accounts for potential biases related to the disease phenotype datasets in assessing the contribution of the gene target to the plant defence response. The post-GSB protocol uses Gene Ontology semantic similarity and pathway dataset to generate enriched process regulatory network based on the functional degeneracy of the plant proteome to help understand the induced plant defence response. We applied this protocol to investigate the effect of the NPR1 gene silencing to changes in Arabidopsis thaliana plants following Pseudomonas syringae pathovar tomato strain DC3000 infection. Results indicated that the presence of a functionally active NPR1 reduced the plant’s susceptibility to the infection, with about 99% of variability in Pseudomonas spore growth between npr1 mutant and wild-type samples. Moreover, the post-GSB protocol has revealed the coordinate action of target-associated genes and pathways through an enriched process regulatory network, summarizing the potential target-based induced disease resistance mechanism. Conclusions: This protocol can improve the characterization of the gene target and, potentially, elucidate induced defence response by more effectively utilizing available phenotype information and plant proteome functional knowledge.
- ItemRestrictedancGWAS: a post genome-wide association study method for interaction, pathway and ancestry analysis in homogeneous and admixed populations(Oxford University Press, 27) Chimusa, Emile R; Mbiyavanga, Mamana; Mazandu, Gaston K; Mulder, Nicola JDespite numerous successful Genome-wide Association Studies (GWAS), detecting variants that have low disease risk still poses a challenge. GWAS may miss disease genes with weak genetic effects or strong epistatic effects due to the single-marker testing approach commonly used. GWAS may thus generate false negative or inconclusive results, suggesting the need for novel methods to combine effects of single nucleotide polymorphisms within a gene to increase the likelihood of fully characterizing the susceptibility gene. Results: We developed ancGWAS, an algebraic graph-based centrality measure that accounts for linkage disequilibrium in identifying significant disease sub-networks by integrating the association signal from GWAS data sets into the human protein–protein interaction (PPI) network. We validated ancGWAS using an association study result from a breast cancer data set and the simulation of interactive disease loci in the simulation of a complex admixed population, as well as pathway-based GWAS simulation. This new approach holds promise for deconvoluting the interactions between genes underlying the pathogenesis of complex diseases. Results obtained yield a novel central breast cancer sub-network of the human interactome implicated in the proteoglycan syndecan-mediated signaling events pathway which is known to play a major role in mesenchymal tumor cell proliferation, thus providing further insights into breast cancer pathogenesis.
- ItemOpen AccessIdentifying genetic variants and pathways associated with extreme levels of fetal hemoglobin in sickle cell disease in Tanzania(2020-06-05) Nkya, Siana; Mwita, Liberata; Mgaya, Josephine; Kumburu, Happiness; van Zwetselaar, Marco; Menzel, Stephan; Mazandu, Gaston K; Sangeda, Raphael; Chimusa, Emile; Makani, JulieBackground Sickle cell disease (SCD) is a blood disorder caused by a point mutation on the beta globin gene resulting in the synthesis of abnormal hemoglobin. Fetal hemoglobin (HbF) reduces disease severity, but the levels vary from one individual to another. Most research has focused on common genetic variants which differ across populations and hence do not fully account for HbF variation. Methods We investigated rare and common genetic variants that influence HbF levels in 14 SCD patients to elucidate variants and pathways in SCD patients with extreme HbF levels (≥7.7% for high HbF) and (≤2.5% for low HbF) in Tanzania. We performed targeted next generation sequencing (Illumina_Miseq) covering exonic and other significant fetal hemoglobin-associated loci, including BCL11A, MYB, HOXA9, HBB, HBG1, HBG2, CHD4, KLF1, MBD3, ZBTB7A and PGLYRP1. Results Results revealed a range of genetic variants, including bi-allelic and multi-allelic SNPs, frameshift insertions and deletions, some of which have functional importance. Notably, there were significantly more deletions in individuals with high HbF levels (11% vs 0.9%). We identified frameshift deletions in individuals with high HbF levels and frameshift insertions in individuals with low HbF. CHD4 and MBD3 genes, interacting in the same sub-network, were identified to have a significant number of pathogenic or non-synonymous mutations in individuals with low HbF levels, suggesting an important role of epigenetic pathways in the regulation of HbF synthesis. Conclusions This study provides new insights in selecting essential variants and identifying potential biological pathways associated with extreme HbF levels in SCD interrogating multiple genomic variants associated with HbF in SCD.
- ItemOpen AccessInformation content-based gene ontology functional similarity measures: which one to use for a given biological data type?(Public Library of Science, 2014) Mazandu, Gaston K; Mulder, Nicola JThe current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.
- ItemOpen AccessInvestigating local ancestry inference models in mixed ancestry individual genomes(2022) Geza, Ephifania; Mazandu, Gaston K; Chimusa, Emile R; Mulder, Nicola JOwing to historical events including the slave trade, agricultural interests, colonialism, and political and/or economical instability, most modern humans are a mosaic of segments originating from different populations. They result from the interbreeding of two or more previously isolated populations, leading to admixture. Known admixed populations include the mixed ancestry of South Africa, Latin Americans and African Americans. Admixed individuals play important roles in understanding population history, disease aetiology, and personal genomics. Accordingly, efforts have been made to understand the genetic composition of such individuals, yielding several models that infer the ancestry of every chromosomal segment in admixed individuals (local ancestry). However, new research questions emerged concerning model statistical and biological parameters, as well as the performance of these models across admixed datasets. This elicited the need for examining existing local ancestry inference models in order to identify and tackle critical issues of these models, which is the main goal of this thesis. We achieve this in four steps, constituting the main contributions of this PhD project: (1) Qualitative assessment of existing models through a systematic review; (2) Building a unified framework integrating existing models for inferring and assessing local ancestry estimates; (3) Quantitative assessment of existing methods within the same framework; and (4) Proposing a model extension to account for natural selection and the origin of modern humans to improve the accuracy of local ancestry estimates. Firstly, we assess models using published results on different datasets and performance measures, to orient modellers and software developers on the future trends in local ancestry inference. Secondly, to address the challenges identified in (1) including model complexity reflected in the distinct inputs each model requires and outputs formats, we design a unified framework, referred to as FRANC, to manipulate tool-specific inputs, deconvolve ancestry and standardise outputs, to ease the inference process and pave the way for model assessment. Thirdly, using FRANC, we assess the performance of eight state-of-the-art models on simulated admixed population datasets involving three and five ancestral populations. LAMP-LD and LOTER performed better than the other six tested models on admixed populations involving five ancestral populations while RFMIX, WINPOP, ELAI and LAMP-LD were comparable in admixed datasets involving three populations. Performance was evaluated based on performance measures borrowed from the machine learning confusion matrix. Finally, we noted that it may be more practical to extend existing models to incorporate more realistic biological assumptions. Hence, we propose a nonparametric hidden Markov model, that adjusts an existing model mSPECTRUM to account for natural selection and state-persistence when deconvolving local ancestry, which should improve the accuracy of estimates. Similarly to mSPECTRUM, this acknowledges the two common hypotheses on the origin of modern humans, making it comparable to mSPECTRUM which has been shown to be competitive with HAPMIX, a benchmark for two-way admixtures. Therefore, these four are a good contribution to admixture analysis of populations.
- ItemOpen AccessPredicting and analyzing interactions between Mycobacterium tuberculosis and its human host(Public Library of Science, 2013) Rapanoel, Holifidy A; Mazandu, Gaston K; Mulder, Nicola JThe outcome of infection by Mycobacterium tuberculosis (Mtb) depends greatly on how the host responds to the bacteria and how the bacteria manipulates the host, which is facilitated by protein-protein interactions. Thus, to understand this process, there is a need for elucidating protein interactions between human and Mtb, which may enable us to characterize specific molecular mechanisms allowing the bacteria to persist and survive under different environmental conditions. In this work, we used the interologs method based on experimentally verified intra-species and inter-species interactions to predict human-Mtb functional interactions. These interactions were further filtered using known human-Mtb interactions and genes that are differentially expressed during infection, producing 190 interactions. Further analysis of the subcellular location of proteins involved in these human-Mtb interactions confirms feasibility of these interactions. We also conducted functional analysis of human and Mtb proteins involved in these interactions, checking whether these proteins play a role in infection and/or disease, and enriching Mtb proteins in a previously predicted list of drug targets. We found that the biological processes of the human interacting proteins suggested their involvement in apoptosis and production of nitric oxide, whereas those of the Mtb interacting proteins were relevant to the intracellular environment of Mtb in the host. Mapping these proteins onto KEGG pathways highlighted proteins belonging to the tuberculosis pathway and also suggested that Mtb proteins might use the host to acquire nutrients, which is in agreement with the intracellular lifestyle of Mtb. This indicates that these interactions can shed light on the interplay between Mtb and its human host and thus, contribute to the process of designing novel drugs with new biological mechanisms of action.
- ItemOpen AccessScoring protein relationships in functional interaction networks predicted from sequence data(Public Library of Science, 2011) Mazandu, Gaston K; Mulder, Nicola JThe abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins. Availability Protein pair-wise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data and python scripts to compute these scores are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes .
- ItemOpen AccessA web-based protein interaction network visualizer(BioMed Central, 2014-05-06) Salazar, Gustavo A; Meintjes, Ayton; Mazandu, Gaston K; Rapanoël, Holifidy A; Akinola, Richard O; Mulder, Nicola JAbstract Background Interaction between proteins is one of the most important mechanisms in the execution of cellular functions. The study of these interactions has provided insight into the functioning of an organism’s processes. As of October 2013, Homo sapiens had over 170000 Protein-Protein interactions (PPI) registered in the Interologous Interaction Database, which is only one of the many public resources where protein interactions can be accessed. These numbers exemplify the volume of data that research on the topic has generated. Visualization of large data sets is a well known strategy to make sense of information, and protein interaction data is no exception. There are several tools that allow the exploration of this data, providing different methods to visualize protein network interactions. However, there is still no native web tool that allows this data to be explored interactively online. Results Given the advances that web technologies have made recently it is time to bring these interactive views to the web to provide an easily accessible forum to visualize PPI. We have created a Web-based Protein Interaction Network Visualizer: PINV, an open source, native web application that facilitates the visualization of protein interactions ( http://biosual.cbio.uct.ac.za/pinv.html ). We developed PINV as a set of components that follow the protocol defined in BioJS and use the D3 library to create the graphic layouts. We demonstrate the use of PINV with multi-organism interaction networks for a predicted target from Mycobacterium tuberculosis, its interacting partners and its orthologs. Conclusions The resultant tool provides an attractive view of complex, fully interactive networks with components that allow the querying, filtering and manipulation of the visible subset. Moreover, as a web resource, PINV simplifies sharing and publishing, activities which are vital in today’s research collaborative environments. The source code is freely available for download at https://github.com/4ndr01d3/biosual .