Browsing by Author "Mulder, Nicola"
- Item, Open Access: A pan-genome wide association study to identify genes associated with invasive Streptococcus pneumoniae (2023). Iranzadeh, Arash; Mulder, Nicola. Streptococcus pneumoniae (pneumococcus) is one of the leading causes of mortality in Africa. It asymptomatically colonizes the human nasopharynx. Invasive pneumococcal disease occurs when isolates spread to normally sterile sites such as the lungs, blood, and central nervous system. Colonization, though, does not necessarily lead to infection. Some isolates remain in the upper respiratory tract only, without causing any symptoms. This thesis hypothesized that invasive and non-invasive isolates differ genetically. We tested this hypothesis by applying a pan-genome approach using whole-genome sequencing short reads of 1477 samples from Malawi, including those obtained from the nasopharynx of carriers (825 samples) and from the blood and cerebrospinal fluid of patients (652 samples). In-silico serotyping identified 56 serotypes in the cohort, and statistical analysis showed that, despite vaccination, the prevalence of serotypes 1 and 12F increased amongst patients. Genomes were assembled, and a reference pan-genome for all strains was built. Short reads were aligned to the core genome, and core variants were called. The population structure was determined based on the distribution of variants in the pan-genome. Finally, genes with a significant presence in the invasive isolates were identified. Functional enrichment analysis of potential virulence genes was carried out to address how specific genes may contribute to pathogenesis. The findings highlighted the features of the pneumococcus pan-genome in Malawi. The core and accessory genomes were characterized based on the functional analysis of genes. The core components included ribosomal subunits; subunits of F-type ATP synthase; and enzymes that catalyze the attachment of amino acids to tRNA molecules, DNA replication, DNA repair, and homologous recombination. 10.13% of the core and soft-core genes were uncharacterized. In the accessory genome, the study detected genes from Regions of Diversity (RDs), including subunits of V-type ATPases and a sodium/solute symporter from RD8a; enzymes from RD3 catalyzing capsule synthesis; subunits of the PsrP secY2A2 pathogenicity island from RD10; genes from RD6 and RD7 involved in transposing mobile genetic elements; genes from RD2, RD8b, and RD12 participating in communication and competition; and genes from RD4 that assemble pilins into pili and anchor pili to the cell wall. 53.58% of accessory genes were uncharacterized. Most serotypes showed a similar prevalence in carriage and disease groups. However, the significant abundance of serotypes 1, 5, and 12F among patients compared to the carriage group suggested they are highly invasive with a short colonization period. These serotypes exhibited a remarkable genetic distinction from others. Their divergence included the absence and presence of several genes in their genome structure. The lack of genes from a genomic island known as RD8a was the most pronounced difference between serotypes 1, 5, and 12F and the serotypes significantly prevalent in the nasopharynx. Genes in RD8a are involved in binding to epithelial cells and in carrying out aerobic respiration to synthesize ATP through oxidative phosphorylation. The absence of RD8a from serotypes 1, 5, and 12F may be associated with their short duration in the nasopharynx, where they need to bind to epithelial cells and access the free oxygen molecules required for aerobic respiration. Given this, the amount of ATP is likely to decline in serotypes 1, 5, and 12F, causing them to harbour more phosphotransferase systems to transport carbohydrates, since these transporters use phosphoenolpyruvate as the energy source instead of ATP.
In conclusion, serotypes 1, 5, and 12F, the most prevalent and invasive pneumococcal strains in Malawi, showed a considerable genetic distinction from other strains that may be associated with their short colonization period and quickness to infect the blood and cerebrospinal fluid.
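The abstract above reports genes with a "significant presence in the invasive isolates" but does not name the statistical test used. As a minimal sketch, assuming a one-sided Fisher's exact test on a 2x2 presence/absence table (a common choice for pan-genome association, not necessarily the thesis's method), the enrichment of a single gene could be checked as follows. The function name and the gene counts in the usage note are hypothetical; only the cohort sizes (652 invasive, 825 carriage) come from the abstract.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment (illustrative sketch).

    2x2 table layout (an assumption of this example):
                    gene present   gene absent
        invasive         a              b
        carriage         c              d

    Returns P(X >= a) under the hypergeometric null, where X is the number
    of invasive isolates carrying the gene.
    """
    invasive_total = a + b
    present_total = a + c
    n = a + b + c + d
    upper = min(invasive_total, present_total)
    # Sum hypergeometric probabilities for outcomes at least as extreme as a.
    numerator = sum(
        comb(present_total, k) * comb(n - present_total, invasive_total - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n, invasive_total)
```

For example, `fisher_exact_greater(400, 252, 300, 525)` would test a hypothetical gene found in 400 of 652 invasive isolates and 300 of 825 carriage isolates. In a real pan-genome scan, the p-values would also need correction for multiple testing and for population structure, which this sketch does not attempt.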
- Item, Open Access: Accumulation of splice variants and transcripts in response to PI3K inhibition in T cells (Public Library of Science, 2013). Riedel, Alice; Mofolo, Boitumelo; Avota, Elita; Schneider-Schaulies, Sibylle; Meintjes, Ayton; Mulder, Nicola; Kneitz, Susanne. BACKGROUND: Measles virus (MV) causes T cell suppression by interference with phosphatidylinositol-3-kinase (PI3K) activation. We previously found that this interference affected the activity of splice regulatory proteins and that a T cell inhibitory protein isoform was produced from an alternatively spliced pre-mRNA. HYPOTHESIS: Differentially regulated and alternatively spliced variant transcripts accumulating in response to PI3K abrogation in T cells potentially encode proteins involved in T cell silencing. METHODS: To test this hypothesis at the cellular level, we performed a Human Exon 1.0 ST Array on RNAs isolated from T cells stimulated only or stimulated after PI3K inhibition. We developed a simple algorithm based on a splicing index to detect genes that undergo alternative splicing (AS) or are differentially regulated (RG) upon T cell suppression. RESULTS: Applying our algorithm to the data, 9% of the genes were assigned as AS, while only 3% were attributed to RG. Though there are overlaps, AS and RG genes differed with regard to functional regulation, and were found to be enriched in different functional groups. AS genes targeted extracellular matrix (ECM)-receptor interaction and focal adhesion pathways, while RG genes were mainly enriched in cytokine-receptor interaction and Jak-STAT. When combined, AS/RG dependent alterations targeted pathways essential for T cell receptor signaling, cytoskeletal dynamics and cell cycle entry. CONCLUSIONS: PI3K abrogation interferes with key T cell activation processes through both differential expression and alternative splicing, which together actively contribute to T cell suppression.
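The abstract above mentions an algorithm "based on a splicing index" without giving its details. The sketch below uses the commonly cited exon-array definition (the log2 ratio of gene-normalised exon intensities between two conditions) and should be read as an assumption, not as the authors' algorithm. The function names and both cut-off values are hypothetical.

```python
from math import log2


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of gene-normalised exon signal in condition A versus B."""
    return log2((exon_a / gene_a) / (exon_b / gene_b))


def classify_gene(exons_a, exons_b, gene_a, gene_b, si_cutoff=1.0, fc_cutoff=1.0):
    """Label a gene as AS, RG, both, or neither (cut-offs are illustrative)."""
    labels = set()
    # RG: the gene-level signal itself changes between conditions.
    if abs(log2(gene_a / gene_b)) >= fc_cutoff:
        labels.add("RG")
    # AS: at least one exon deviates from the gene-level change.
    if any(
        abs(splicing_index(ea, gene_a, eb, gene_b)) >= si_cutoff
        for ea, eb in zip(exons_a, exons_b)
    ):
        labels.add("AS")
    return labels
```

Returning a set rather than a single label mirrors the abstract's remark that the AS and RG groups overlap.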
- Item, Open Access: African Genomic Medicine Portal: A Web Portal for Biomedical Applications (2022-02-11). Othman, Houcemeddine; Zass, Lyndon; da Rocha, Jorge E B; Radouani, Fouzia; Samtal, Chaimae; Benamri, Ichrak; Kumuthini, Judit; Fakim, Yasmina J; Hamdi, Yosr; Mezzi, Nessrine; Boujemaa, Maroua; Okeke, Chiamaka Jessica; Tendwa, Maureen B; Sanak, Kholoud; Chaouch, Melek; Panji, Sumir; Kefi, Rym; Sallam, Reem M; Ghoorah, Anisah W; Romdhane, Lilia; Kiran, Anmol; Meintjes, Ayton P; Maturure, Perceval; Jmel, Haifa; Ksouri, Ayoub; Azzouzi, Maryame; Farahat, Mohammed A; Ahmed, Samah; Sibira, Rania; Turkson, Michael E E; Ssekagiri, Alfred; Parker, Ziyaad; Fadlelmola, Faisal M; Ghedira, Kais; Mulder, Nicola; Kamal Kassim, Samar. Genomics data are currently being produced at unprecedented rates, resulting in increased knowledge discovery and submission to public data repositories. Despite these advances, genomic information on African-ancestry populations remains significantly low compared with European- and Asian-ancestry populations. This information is typically segmented across several different biomedical data repositories, which often lack sufficient fine-grained structure and annotation to account for the diversity of African populations, leading to many challenges related to the retrieval, representation and findability of such information. To overcome these challenges, we developed the African Genomic Medicine Portal (AGMP), a database that contains metadata on genomic medicine studies conducted on African-ancestry populations. The metadata is curated from two public databases related to genomic medicine, PharmGKB and DisGeNET. The metadata retrieved from these source databases was limited to genomic variants that were associated with disease aetiology or treatment in the context of African-ancestry populations. Over 2000 variants relevant to populations of African ancestry were retrieved.
Subsequently, domain experts curated and annotated additional information associated with the studies that reported the variants, including geographical origin, ethnolinguistic group, level of association significance and other relevant study information, such as study design and sample size, where available. The AGMP functions as a dedicated resource through which to access African-specific information on genomics as applied to health research, through querying variants, genes, diseases and drugs. The portal and its corresponding technical documentation, implementation code and content are publicly available.
- Item, Open Access: An African Genome Variation Database and its applications in human diversity and health (2021). Todt, Davis; Mulder, Nicola. African genomes exhibit the highest levels of sequence and haplotype diversity of all extant human populations. A combination of historical and geographical factors has contributed toward the high level of genetic diversity in ancestral populations in Africa. Additionally, a series of concomitant migration events out of Africa, with founder populations harbouring only a subset of this genetic variation, have contributed to the relatively lower genetic diversity observed in non-Africans. Population genetic studies have refined our understanding of human evolutionary history and clinical genomic studies have resulted in improved patient outcomes. However, despite the increased throughput and decreased cost afforded by next-generation sequencing (NGS) and despite the relatively higher genetic variation in Africans, relatively little of the genomic data currently available is representative of diverse African populations. This may result in adverse outcomes in the context of minority populations with little representation in clinical databases. Given the under-representation of African genetic variation and the importance of highlighting and further characterizing it, the objectives of this project were to design, develop and deploy a proof of concept database and web application for the storage, analysis and visualization of African genetic variant data – the African Genome Variation Database (AGVD). The AGVD was developed according to software industry design standards. The project also explored available genomic tools and databases in order to leverage existing software solutions where suitable. Additionally, relevant data sets were identified for use during testing and validation of the pilot phase of the project.
To this end, the open access 1000 Genomes Project phase 3 dataset was selected and the genotypes for several chromosomes were loaded into the AGVD. The AGVD leverages the scalable, performant, and open source genomics engine OpenCGA for data storage and analysis. A custom front-end web application was developed by applying a novel approach to render and serve static Vue JS assets from the Python Flask microframework. The web application supports rich data search and filtering operations of loaded variants and allows end-users to visualize annotations of genomic loci and allele change, variant type, associated gene and transcript consequences, clinical significance, and allele frequency information for all annotated cohorts in a highly interactive manner. A bespoke REST API also supports future analytical functionality. The AGVD has demonstrated proof of concept in the secure and scalable storage and visualization of African genomic data, providing a viable solution for H3ABioNet to further extend in future iterations of the project and a valuable resource for researchers to explore African genetic variation.
- Item, Open Access: Analysis of within-host evolution of Plasmodium Falciparum during treatment (2018). Okendo, Javan Ochieng; Mulder, Nicola; Andagalu, Ben. Antimalarial drugs impose strong selective pressure on Plasmodium falciparum parasite genomes and leave signatures of selection. The evolutionary basis of drug-resistant malaria in endemic and epidemic settings remains a scientific priority whose solution carries a significant effect on treatment outcomes. To understand the evolutionary changes in P. falciparum during treatment with ACTs, we used various approaches to test the neutral models of evolution using P. falciparum genomic data which were collected from Kombewa and Maseno in Kisumu, Kenya between 2013 and 2015. The non-synonymous/synonymous (dN/dS) ratio was used to predict the effect of selection on protein coding loci of the Pfk13 gene. A logistic regression model was used to test the association between IC50s and the SNPs. mCSM and SDM were used to detect the effects of mutations on the Pfk13 gene, while the PRIMO web server was used to locate the SNPs on the Kelch13 propeller domain. Modeller V9.1 was used to predict the structure of the Kelch 13 propeller domain and the Posview webserver was used to predict ACT/kelch 13 interactions. Population differentiation was assessed using Microsatellite analyzer to calculate FST, and customized R scripts with the relevant population genetics packages were used in the analysis. For samples collected in 2013, Tajima’s D genomic summary statistic was 4.53194, Fu & Li D* 2.13380, and Fu & Li F* 3.62142. However, in 2015 Tajima’s D was -2.42910, Fu and Li’s D* -5.2712, and Fu and Li’s F* -5.0045. The dN/dS in 2013 was 1.0299, while in 2015 dN/dS was 2.6884. Kenyan P. falciparum SNPs occur on the intra or inter blade domains on the PfK13 propeller domain. The FST analysis showed minimal population differentiation of the parasites during treatment.
Overall, there was no significant association between SNPs and IC50 values, although the D547E substitution showed an association with artesunate IC50, and D559E with AQ and MQ IC50. Even though there is an exponential increase in the number of non-synonymous point mutations in the Pfk13 gene, the Kenyan P. falciparum strains remain sensitive to ACT drugs. Further research needs to be done by deep sequencing this location of chromosome 13, as it will provide more power for finding novel SNPs for further validation.
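The abstract above reports Tajima's D shifting from positive (2013) to negative (2015). As a hedged illustration of what that statistic measures, the sketch below computes Tajima's D from a set of aligned sequences using the standard formula (Tajima, 1989). It is not the thesis's pipeline; the function name is hypothetical, and gaps or ambiguous bases are treated as ordinary characters for simplicity.

```python
from collections import Counter
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length aligned sequences (sketch)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero below that)")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0  # S: number of segregating sites
    pi = 0.0       # mean number of pairwise differences
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Pairs of sequences that differ at this site.
            differing = (n * n - sum(c * c for c in counts.values())) / 2
            pi += differing / n_pairs
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    s = seg_sites
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for instance, only singletons) pushes D below zero, while an excess of intermediate-frequency variants pushes it above zero, which is the kind of contrast the 2013 and 2015 values describe.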
- Item, Open Access: Applying, Evaluating and Refining Bioinformatics Core Competencies (An Update from the Curriculum Task Force of ISCB's Education Committee) (Public Library of Science, 2016). Welch, Lonnie; Brooksbank, Cath; Schwartz, Russell; Morgan, Sarah L; Gaeta, Bruno; Kilpatrick, Alastair M; Mietchen, Daniel; Moore, Benjamin L; Mulder, Nicola; Pauley, Mark; Pearson, William; Radivojac, Predrag; Rosenberg, Naomi; Rosenwald, Anne; Rustici, Gabriella; Warnow, Tandy
- Item, Open Access: A bioinformatic study on the feasibility of a cross-species proteomics analyses of mycobacteria (2013). Rajaonarifara, Elinambinina; Blackburn, Jonathan; Mulder, Nicola. Includes abstract. Includes bibliographical references.
- Item, Open Access: Characterisation of the metabolome of Mycobacterium tuberculosis to identify new pathways and pathway holes (2014). Wolfenden, Kristen Marie; Mulder, Nicola. Due to high incidence rates and the development of new drug-resistant or multidrug-resistant strains of TB, the development of new medicines and treatments for tuberculosis is a necessity. In order to develop these drugs, Mycobacterium tuberculosis (Mtb) needs to be studied more completely; this study performs a characterisation of the metabolome of Mtb and comparison across the phylogenetic profile to identify notable pathways.
- Item, Open Access: Creating and analysing an African pan-genome (2022). Bourn, Jessica Jean; Mulder, Nicola. The human reference genome is currently a core resource for understanding the role of genetics in human health, disease, and variation, and has been invaluable in the development of clinical and computational tools for these purposes. However, the limited number of individual genomes used to create the reference has resulted in an underrepresentation of the extensive genetic diversity present in different human populations. Since an important use of the reference genome is to identify genetic variants that may be implicated in disease, this lack of diversity could limit the scientific utility of the reference for ethnic groups that are poorly represented in it. As a result, adaptations to the reference genome structure have been proposed. One such proposal has been the use of multiple reference genomes, each of which represents a different human population. A logical and highly practical method of achieving this is through the use of a pan-genome, which is a curated collection of all the DNA sequences that are found within a population under study. Despite the fact that African populations exhibit the greatest genetic diversity and variation in the world, the many and sometimes ancient ethnolinguistic groups from Africa are among those least represented within the reference genome. Consequently, this study aimed to explore the feasibility of creating and analysing an African pan-genome, and to begin developing tools to achieve this. Several distinct African regional ancestral groups – namely east African Nilo-Saharan, east African Afro-Asiatic, far west Niger-Congo, central west Niger-Congo, Bantu-speaking Niger-Congo, central African rainforest hunter-gatherer, and the Khoe and San – have previously been identified, and this study included and analysed samples from each group in order to assemble a more inclusive and representative pan-genome.
A software pipeline developed by Duan et al. (2019), termed the HUman Pan-genome ANalysis (HUPAN) pipeline, was used here to assemble the African pan-genome. As the HUPAN pipeline was originally designed to analyse only single populations, the inclusion of multiple populations required modifications and improvements, which were implemented following the testing and analysis of the pipeline using a smaller dataset of whole genome sequences. Subsequently, a final dataset of 168 African high- and medium-coverage whole genome sequences representing the seven separate regional ancestral groups was submitted to the adapted HUPAN pipeline. For each group, nucleotide sequences that were absent from the human reference genome were assembled and extracted, which resulted in the identification of 43.37 Mbp of non-redundant non-reference genomic sequence and 31 novel predicted protein-coding genes from African individuals. Alignment to other pan-genome sequences, whole genomes from different human populations, and the complete telomere-to-telomere human genome validated a large portion of the sequences as nonreference and confirmed that the dataset contained sequences specific to African populations. However, the gene presence-absence variation analysis of the pan-genome within all 168 samples revealed patterns of gene presence and absence that were strongly correlated to the sample dataset of origin, rather than to the ancestral group of origin. This hindered the identification of genuine genetic variation specific to the groups analysed. Further, it appears that previous pan-genomic research has not investigated the degree to which the genetic variation identified is dataset-specific or truly population-specific. Consequently, the failure to acknowledge and account for the effects of spurious inter-dataset variation in previous pan-genomic research indicates that those analyses may be incomplete or ambiguous. 
This, therefore, calls into question the methods currently used for pangenomic research, and highlights that robust, standardised methods for human pan-genome research must be agreed on to ensure that comprehensive population-specific pan-genomes are produced in the future. Despite this inherent weakness of pan-genomic research, this study successfully enabled the creation and analysis of a comprehensive and inclusive African pan-genome. Unique sets of non-reference sequences specific to African regional ancestral groups were identified and obtained, enabling the assembly of a non-redundant set of pan-African non-reference sequences. Furthermore, certain complex but previously unconsidered aspects of pan-genome research were identified and explored, and these observations may play a role in the advancement of pan-genome research in future.
- Item, Open Access: DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures (BioMed Central Ltd, 2013). Mazandu, Gaston; Mulder, Nicola. BACKGROUND: The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. RESULTS: We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. CONCLUSIONS: The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches, to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
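To make the idea of annotation-based information content concrete, the sketch below computes IC for terms in a toy ontology and then the Resnik similarity between two terms (the IC of their most informative common ancestor). This is a generic illustration of one well-known IC-based measure, not DaGO-Fun's own code; the tiny `PARENTS` graph, the term names and the protein identifiers are all hypothetical.

```python
from math import log

# Toy directed acyclic graph: each term maps to its parent terms (hypothetical).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "atp_binding": ["binding"],
    "kinase": ["catalysis", "atp_binding"],
}


def ancestors(term):
    """Return the term itself plus every term reachable through its parents."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """IC(t) = -ln(fraction of proteins annotated with t or a descendant)."""
    counts = {term: 0 for term in PARENTS}
    for terms in annotations.values():
        # Propagate each protein's direct annotations up to all ancestors.
        propagated = set().union(*(ancestors(t) for t in terms))
        for term in propagated:
            counts[term] += 1
    total = len(annotations)
    return {t: -log(c / total) for t, c in counts.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[t] for t in common)
```

Precomputing the `ic` dictionary once and reusing it for every query is the same kind of trade-off the abstract describes, where IC values are computed ahead of time so that user queries respond quickly.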
- Item, Open Access: DAS Writeback: A Collaborative Annotation System (BioMed Central Ltd, 2011). Salazar, Gustavo; Jimenez, Rafael; Garcia, Alexander; Hermjakob, Henning; Mulder, Nicola; Blake, Edwin. BACKGROUND: Centralised resources such as GenBank and UniProt are perfect examples of the major international efforts that have been made to integrate and share biological information. However, additional data that adds value to these resources needs a simple and rapid route to public access. The Distributed Annotation System (DAS) provides an adequate environment to integrate genomic and proteomic information from multiple sources, making this information accessible to the community. DAS offers a way to distribute and access information but it does not provide domain experts with the mechanisms to participate in the curation process of the available biological entities and their annotations. RESULTS: We designed and developed a Collaborative Annotation System for proteins called DAS Writeback. DAS Writeback is a protocol extension of DAS to provide the functionalities of adding, editing and deleting annotations. We implemented this new specification as extensions of both a DAS server and a DAS client. The architecture was designed with the involvement of the DAS community and it was improved after performing usability experiments emulating a real annotation task. CONCLUSIONS: We demonstrate that DAS Writeback is effective, usable and will provide the appropriate environment for the creation and evolution of community protein annotation.
- Item, Open Access: Data integration for the analysis of uncharacterized proteins in Mycobacterium tuberculosis (2010). Mazandu, Gaston Kuzamunu; Mulder, Nicola. Mycobacterium tuberculosis is a bacterial pathogen that causes tuberculosis, a leading cause of human death worldwide from infectious diseases, especially in Africa. Despite enormous advances achieved in recent years in controlling the disease, tuberculosis remains a public health challenge. The contribution of existing drugs is of immense value, but the deadly synergy of the disease with Human Immunodeficiency Virus (HIV) or Acquired Immunodeficiency Syndrome (AIDS) and the emergence of drug resistant strains are threatening to compromise gains in tuberculosis control. In fact, the development of active tuberculosis is the outcome of the delicate balance between bacterial virulence and host resistance, which constitute two distinct and independent components. Significant progress has been made in understanding the evolution of the bacterial pathogen and its interaction with the host. The end point of these efforts is the identification of virulence factors and drug targets within the bacterium in order to develop new drugs and vaccines for the eradication of the disease.
- Item, Open Access: Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics (BioMed Central, 2018-11-29). Baichoo, Shakuntala; Souilmi, Yassine; Panji, Sumir; Botha, Gerrit; Meintjes, Ayton; Hazelhurst, Scott; Bendou, Hocine; Beste, Eugene d; Mpangase, Phelelani T; Souiai, Oussema; Alghali, Mustafa; Yi, Long; O’Connor, Brian D; Crusoe, Michael; Armstrong, Don; Aron, Shaun; Joubert, Fourie; Ahmed, Azza E; Mbiyavanga, Mamana; Heusden, Peter v; Magosi, Lerato E; Zermeno, Jennie; Mainzer, Liudmila S; Fadlelmola, Faisal M; Jongeneel, C. V; Mulder, Nicola. Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation.
A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community. Conclusion: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
- Item, Open Access: Development of computational methods for custom protein arrays analysis : a case study on a 100-protein ("CT100") cancer/testis antigen array (2010). Safari Serufuri, Jean-Michel; Blackburn, Jonathan; Mulder, Nicola; Kumuthini, Judit. Custom antigen arrays offer a platform to assay the serological response of cancer patients to a set of selected cancer testis antigens in order to infer a diagnostic value or to assess the patient responses to particular treatments. However, the acquisition of the array data is subject to bias and noise. Therefore, array data processing and analysis is required to remove bias from the data, reduce noise and learn from the data. This study aims to address the issues of normalization and sample qualitative clustering for custom protein arrays.
- Item, Open Access: Disease population genetic simulation framework: towards application in modelling disease risk prediction and heritability rate (2024). Mugo, Jacquiline Wangui; Mulder, Nicola; Chimusa, Emile Rugamika. It has now been over a decade since the first genome-wide association study (GWAS) was conducted. The field has since experienced monumental strides, as evidenced by over half a million associations submitted in the GWAS catalog today, advancements in sequencing technologies, and the development of robust methodologies for genetic data analysis. Our understanding of genetic disease underpinnings has exploded. It is thus unfortunate that a substantial proportion of these studies continue to be of European populations, while GWAS in Africans and other diverse populations lag behind. Concerns about the disparity in genomics studies once these findings are translated into biological functions and clinical interventions have been raised. The field of GWAS also continues to be haunted by the fact that the identified variants still explain a very small proportion of the heritability of several common complex diseases, making their translation challenging. Efforts have been put in place to extend GWAS to diverse populations. Unfortunately, most of the currently developed and commonly used tools in GWAS have been benchmarked and applied to populations of European descent. However, the complex genetic make-up of Africans and other diverse populations implies that different GWAS approaches are imperative. As our world becomes more and more connected through technology, the interbreeding of genetically isolated populations is inevitable and on the rise. Consequently, as GWAS extends to diverse populations, the majority will be mixed-ancestry (admixed) populations that result from such human interbreeding.
Association studies of admixed populations have mainly capitalized on admixture mapping or admixture association, which associates the overrepresentation of a given ancestry in cases compared to controls at a given location of the genome with a given trait. However, the application of GWAS to these populations is now being explored, especially as genotyping costs continue to decrease. Research has shown that combining both standard GWAS and admixture association approaches has the potential to improve power in disease scoring statistics. Currently, most joint methods developed are targeted at admixed populations that result from the interbreeding of two genetically isolated populations (2-way admixed populations). The two proposed joint methods for admixed populations with more than two ancestral populations (multi-way admixed populations) are not optimized for such populations. This excludes a larger portion of multi-way admixed populations. We therefore set forth to develop a joint ancestry and SNP association tool tailored to multi-way admixed populations. Firstly, we developed a simulation tool, called FractalSIM, that generates realistic homogeneous and/or complex admixed genetic data under various population genetic scenarios that incorporate recombination, mutation, random mating, disease models, admixture, and natural selection. The tool requires a reference population as input, implements a resampling approach, and retains the frequency of the minor allele and the linkage disequilibrium (LD) patterns of the reference population in the simulation of the resultant dataset. We assessed FractalSIM output using commonly used genetic tools. By employing simulated African, European, and multi-way admixed datasets from FractalSIM, we evaluated commonly used GWAS tools and leveraged the results to discuss an optimized framework for GWAS in diverse populations, such as Africans and admixed populations.
By implementing linear mixed models in a fully Bayesian context, we developed JasMAP, a joint ancestry and SNP association approach for multi-way admixed populations that leverages genotypes and ancestry signals to improve GWAS power in these populations. We evaluated the tool using simulated data generated from FractalSIM and benchmarked its output against results from other tools. We also applied JasMAP to a South African Coloured (SAC) population, a uniquely 5-way admixed population with a high prevalence of tuberculosis (TB), to identify genetic variants underlying ethnic differences in TB. We have established that JasMAP performs better than other commonly used tools in leveraging genotypes and ancestry risk to improve power in GWAS. In the application of JasMAP to a GWAS of the SAC population, we obtained 13 significant SNPs using the joint association, 12 of which were detected at marginal or substantial thresholds in the genotype-only and ancestry-only associations. Gene-mapping analysis placed these SNPs near 8 genes, of which 4 were associated with TB based on their functionality, via pathway analysis, and through links to social behaviours that lead to an increased risk of TB. In particular, one of the significant SNPs on chromosome 4 was linked to the SLC7A11 gene, which has previously been linked to TB in a GWAS of a Chinese population.
- ItemOpen AccessDisruption of maternal gut microbiota during gestation alters offspring microbiota and immunity (BioMed Central, 2018-07-07) Nyangahu, Donald D; Lennard, Katie S; Brown, Bryan P; Darby, Matthew G; Wendoh, Jerome M; Havyarimana, Enock; Smith, Peter; Butcher, James; Stintzi, Alain; Mulder, Nicola; Horsnell, William; Jaspan, Heather B. Background: Early life microbiota is an important determinant of immune and metabolic development and may have lasting consequences. The maternal gut microbiota during pregnancy or breastfeeding is important for defining infant gut microbiota. We hypothesized that maternal gut microbiota during pregnancy and breastfeeding is a critical determinant of infant immunity. To test this, pregnant BALB/c dams were given vancomycin for 5 days prior to delivery (gestation; Mg), for 14 days postpartum during nursing (Mn), during both gestation and nursing (Mgn), or not at all (Mc). We analyzed adaptive immunity and gut microbiota in dams and pups at various times after delivery. Results: In addition to direct alterations to maternal gut microbial composition, pup gut microbiota displayed lower α-diversity and distinct community clusters according to the timing of maternal vancomycin. Vancomycin was undetectable in maternal and offspring sera; therefore, the observed changes in the microbiota of stomach contents (as a proxy for breastmilk) and pup gut signify an indirect mechanism through which maternal intestinal microbiota influences extra-intestinal and neonatal commensal colonization. These effects on microbiota influenced both maternal and offspring immunity. Maternal immunity was altered, as demonstrated by significantly higher levels of both total IgG and IgM in Mgn and Mn breastmilk when compared to Mc. In pups, lymphocyte numbers in the spleens of Pg and Pn were significantly increased compared to Pc.
This increase in cellularity was in part attributable to elevated numbers of both CD4+ T cells and B cells, most notably follicular B cells. Conclusion: Our results indicate that perturbations to maternal gut microbiota dictate neonatal adaptive immunity.
- ItemOpen AccessGenetic characteristics of Plasmodium vivax from Northern Mali (2018) Djimde, Moussa; Mulder, Nicola; Djimde, Abdoulaye; Dara, Antoine. Introduction: The surprising presence of P. vivax in West Africa and its ability to infect a Duffy-negative population pose one more threat to public health. In order to contribute to malaria elimination efforts, there is a need to investigate the origin and characteristics of P. vivax isolates in Northern Mali. Next-generation sequence analysis can help us understand parasite genetic characteristics, although low parasite density is a challenge for whole genome sequencing (WGS). In the present work, we investigated whether selective whole genome amplification (sWGA) can enrich P. vivax DNA extracted from Rapid Diagnostic Tests (RDTs) for WGS. We also investigated the origin and the susceptibility to antimalarial drugs of the strains isolated in Northern Mali. Methods: Parasite DNA was extracted from 267 RDTs using the QIAamp DNA mini kit and screened by nested PCR; 7 samples were positive for P. vivax. After sWGA, the whole genomes were sequenced using the Illumina platform. Next-generation sequence analysis was followed by population differentiation analyses. Twenty-two additional P. vivax whole genomes from other parts of the world were downloaded from the European Nucleotide Archive for further Neighbour-Joining analysis. Results: The sequences extracted from RDTs showed high contamination with human DNA (80%). From the parasite DNA, a total of 69529 SNPs were found in the seven P. vivax strains from Northern Mali. The most significant p-values per SNP were found on chromosomes 2, 3, 4, 5, 12, 13 and 14. With regard to variant effects, the transition/transversion ratio was 1.1. The density of variants with a high effect was 1.62%. No mutation associated with antimalarial drug resistance was found in the pvcrt-o or pvmdr-1 genes.
Pairwise differentiation suggests a high degree of relatedness between P. vivax strains isolated in Northern Mali. The Neighbour-Joining analysis shows clearly that strains from Mali cluster together and are genetically distinct from those from Mauritania, which shares a border with Mali. The strains isolated in Northern Mali are genetically closer to those from Madagascar, India and Latin America. Conclusion: We did not identify mutations associated with resistance to antimalarial drugs in the pvcrt-o and pvmdr-1 genes. This study confirms that P. vivax strains genetically distinct from those of Mauritania are circulating in Mali. Finally, we conclude that sWGA is a feasible approach for P. vivax DNA enrichment for WGS despite the high proportion of human contamination.
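The transition/transversion ratio reported above is the count of transitions (A↔G, C↔T) divided by the count of transversions (all other single-base changes). The following is a minimal Python sketch of that calculation, not the thesis pipeline: the function name `ts_tv_ratio` and the input format (a list of reference/alternate base pairs) are assumptions made for illustration.

```python
# Transitions swap a purine for a purine (A<->G) or a pyrimidine for a
# pyrimidine (C<->T); every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def ts_tv_ratio(snps):
    """Return transitions / transversions for (ref, alt) base pairs.

    `snps` is a hypothetical input format: an iterable of two-element
    tuples of single-letter bases. Non-SNP records are skipped.
    """
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # ignore indels, ambiguous bases, and non-variants
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ts / tv


if __name__ == "__main__":
    example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
    print(ts_tv_ratio(example))
```

In practice the pairs would be read from the REF and ALT columns of a variant call file, keeping only biallelic single-nucleotide records.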
- ItemOpen AccessGenetic dating and pattern of admixture in modern human evolution (2017) Defo, Joel; Mulder, Nicola; Rugamika, Emile Chimusa. Genetic variation is shaped by admixture between populations in an evolutionary process. The mixture dynamic between groups of populations results in a mosaic of chromosomal segments inherited from multiple ancestral populations. The distribution of ancestral chromosomal segments and the recombination breakpoints in an admixed genome provide information about the time of admixture. Studying populations with particular ancestries has become a major interest in population genetics because of the medical and evolutionary impacts of the patterns of single nucleotide polymorphisms. It provides a better understanding of the impact of population migrations and helps us uncover interactions between several populations. Most of the research on dating admixture has focused on a single interaction between two populations using various approaches. Some work has extended this to the mixing of three populations, based on assumptions and approaches that differ from one tool to another. However, the inference of distinct ancestral proportions along the genome of an admixed individual, and of plausible dates of admixture, still remains a challenge in the case of multi-way admixed populations. This dissertation consists of three research initiatives. First, we provide a comprehensive review and comparison of current methods for dating admixture events. Second, we assess various admixture dating tools which estimate the time of admixture between two parental populations. We do so by performing various simulations assuming a particular number of generations and using these to evaluate the tools. Third, we apply the top three assessed methods to some admixed populations from the 1000 Genomes Project.
Although MALDER shows improvement over other current methods and produces reasonable date estimates, the results from both simulation and real data suggest that dating ancient admixture events while accounting for the effect of other admixtures remains a challenge. Our results suggest the need for a new approach to dating ancient and complex admixture events in multi-way admixed populations.
- ItemOpen AccessGenetic differences in lung adenocarcinoma cells from patients of African and European ancestry (2024) Diseko, Karabo; Mulder, Nicola; Sinkala, Musalula. In the past two decades, advancements in cancer genetics research have significantly enhanced our molecular comprehension of human cancers. This progress has led to the development of improved clinical tools for the precise diagnosis, prognosis prediction, and tailored treatment of cancers. However, the predominant focus of this research has been on individuals of European ancestry, inadvertently marginalizing the diverse genetic landscapes represented by other ethnic populations. Given minor differences in the genetic makeup across diverse ethnicities, specific cancer genetic variants prevalent in certain ethnic groups may remain overlooked within the current research. Some studies have indeed illuminated nuanced distinctions in the genetic architecture of cancers among patients of varying ethnic backgrounds. Disparities in cancer incidence and outcome between patients of different ethnicities have also been identified. These distinctions stem from a combination of environmental and biological factors, collectively shaping the intricate interplay of cancer genetics and its clinical manifestations. This study endeavours to elucidate clinically significant disparities in lung adenocarcinoma (LUAD) genetics across distinct ethnicities, particularly focusing on African ancestry (AA) and European ancestry (EA) populations. A meticulous comparison of genetic traits within LUAD cells derived from these ethnic groups is conducted to pinpoint genetic variances that hold potential biological relevance. Leveraging data from The Cancer Genome Atlas' lung adenocarcinoma (TCGA-LUAD) study, samples were stratified based on self-reported racial classifications into African ancestry (AA) and European ancestry (EA) groups.
Propensity score matching (PSM) was meticulously employed to mitigate disparities in crucial clinical attributes, ensuring a balanced basis for subsequent genetic comparisons. A total of 147 EA and 49 AA samples were extracted following PSM, forming the basis for comprehensive comparisons of gene expression, copy number alterations, and mutation frequencies between the two ethnic cohorts. Key genetic disparities between the two groups were discerned, including 371 significantly differentially expressed (SDE) genes, a higher incidence of copy number alterations in the AA group compared to the EA group, and 101 genes exhibiting varying mutation frequencies between the two groups. An analysis of the biological functions impacted by these genetic variances revealed involvement in critical processes such as cellular response to xenobiotics, hormone metabolism and regulation, mitochondrial energy production, and epithelial-mesenchymal transition. We posit that clinically relevant biological distinctions in LUAD tumours between AA and EA patients stem from differential expression and mutations in genes encoding pivotal proteins such as UDP glucuronosyltransferases and cytochrome P450s, among others. Variations in the sequence and expression of these genes can significantly influence drug response and hallmark cancer cell characteristics, including energy production and epithelial-mesenchymal transition. Despite the limitation of a relatively small sample size, this study illuminates genetic disparities that underpin clinically significant differences in tumour biology between LUAD patients of African and European ancestry.
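The abstract names propensity score matching but does not describe the matching scheme. As an illustration only, the Python sketch below shows one common variant, greedy k:1 nearest-neighbour matching without replacement, and assumes the propensity scores have already been estimated (for example by logistic regression on clinical covariates). The function name `match_nearest`, the sample identifiers, and the scores are hypothetical.

```python
def match_nearest(minority, majority, k=1):
    """Greedy k:1 nearest-neighbour matching on propensity scores.

    `minority` and `majority` map hypothetical sample IDs to propensity
    scores estimated beforehand. Each minority sample receives the k
    closest still-unmatched majority samples (matching without
    replacement). This is one possible scheme, not necessarily the one
    used in the study.
    """
    if k * len(minority) > len(majority):
        raise ValueError("not enough majority samples for k:1 matching")
    available = dict(majority)
    matches = {}
    # Process minority samples in score order so results are deterministic.
    for sample_id, score in sorted(minority.items(), key=lambda kv: kv[1]):
        ranked = sorted(available, key=lambda m: abs(available[m] - score))
        chosen = ranked[:k]
        for m in chosen:
            del available[m]  # a matched sample cannot be reused
        matches[sample_id] = chosen
    return matches


if __name__ == "__main__":
    minority = {"t1": 0.30, "t2": 0.70}
    majority = {"c1": 0.28, "c2": 0.35, "c3": 0.72, "c4": 0.90, "c5": 0.10}
    print(match_nearest(minority, majority, k=2))
```

Unmatched majority samples are simply dropped, which is how matching balances clinical attributes at the cost of a smaller cohort.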
- ItemOpen AccessGenGraph: a python module for the simple generation and manipulation of genome graphs (2019-10-25) Ambler, Jon M; Mulaudzi, Shandukani; Mulder, Nicola. Abstract Background: As sequencing technology improves, the concept of a single reference genome is becoming increasingly restrictive. In the case of Mycobacterium tuberculosis, one must often choose between using a genome that is closely related to the isolate and one that is annotated in detail. One promising solution to this problem is the graph-based representation of collections of genomes as a single genome graph. Though there are currently a handful of tools that can create genome graphs and have demonstrated the advantages of this new paradigm, there still exists a need for flexible tools that can be used by researchers to overcome challenges in genomics studies. Results: We present GenGraph, a Python toolkit and accompanying modules that use existing multiple sequence alignment tools to create genome graphs. Python is one of the most popular coding languages for the biological sciences, and by providing these tools, GenGraph makes it easier to experiment and develop new tools that utilise genome graphs. The conceptual model used is highly intuitive, and as far as possible the graph structure represents the biological relationship between the genomes. This design means that users will quickly be able to start creating genome graphs and using them in their own projects. We outline the methods used in the generation of the graphs, and give some examples of how the created graphs may be used. GenGraph utilises existing file formats and methods in the generation of these graphs, allowing graphs to be visualised and imported with widely used applications, including Cytoscape, R, and JavaScript.
Conclusions: GenGraph provides a set of tools for generating graph-based representations of sets of sequences with a simple conceptual model, written in the widely used coding language Python, and publicly available on GitHub.
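To make the idea of a genome graph built from an alignment concrete, here is a minimal Python sketch. It is not GenGraph's API or data model: the function names `build_genome_graph` and `extract_sequence`, the per-column node scheme, and the toy aligned sequences are all assumptions made for illustration. Each node is an alignment column plus a base and records which genomes pass through it, so shared sequence collapses into shared nodes and variants appear as branches.

```python
from collections import defaultdict


def build_genome_graph(aligned):
    """Build a simple genome graph from pre-aligned sequences.

    `aligned` maps a genome name to its row of a multiple sequence
    alignment ('-' marks a gap). Nodes are (column, base) pairs; each
    node stores the genomes that contain it, and edges link consecutive
    non-gap positions of each genome.
    """
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    nodes = defaultdict(set)  # (column, base) -> genome names
    edges = defaultdict(set)  # node -> successor nodes
    for name, seq in aligned.items():
        previous = None
        for column, base in enumerate(seq):
            if base == "-":
                continue  # gaps add no node; the path skips this column
            node = (column, base)
            nodes[node].add(name)
            if previous is not None:
                edges[previous].add(node)
            previous = node
    return dict(nodes), dict(edges)


def extract_sequence(nodes, name):
    """Recover one genome's ungapped sequence by walking its nodes in column order."""
    path = sorted(node for node, names in nodes.items() if name in names)
    return "".join(base for _, base in path)


if __name__ == "__main__":
    alignment = {"reference": "ACGT-A", "isolate": "AC-TGA"}
    nodes, edges = build_genome_graph(alignment)
    print(len(nodes), "nodes")
    print(extract_sequence(nodes, "isolate"))
```

A practical implementation would merge runs of columns shared by the same genomes into single multi-base nodes and store coordinates per genome; the per-column form is kept here only because it is easy to verify by hand.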