Browsing by Author "Hazelhurst, Scott"
- Item, Open Access: Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics (BioMed Central, 2018-11-29)
  Baichoo, Shakuntala; Souilmi, Yassine; Panji, Sumir; Botha, Gerrit; Meintjes, Ayton; Hazelhurst, Scott; Bendou, Hocine; Beste, Eugene d; Mpangase, Phelelani T; Souiai, Oussema; Alghali, Mustafa; Yi, Long; O’Connor, Brian D; Crusoe, Michael; Armstrong, Don; Aron, Shaun; Joubert, Fourie; Ahmed, Azza E; Mbiyavanga, Mamana; Heusden, Peter v; Magosi, Lerato E; Zermeno, Jennie; Mainzer, Liudmila S; Fadlelmola, Faisal M; Jongeneel, C. V; Mulder, Nicola
  Abstract
  Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging.
  Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community.
  Conclusion: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
- Item, Open Access: The development of computational biology in South Africa: successes achieved and lessons learnt (Public Library of Science, 2016)
  Mulder, Nicola J; Christoffels, Alan; De Oliveira, Tulio; Gamieldien, Junaid; Hazelhurst, Scott; Joubert, Fourie; Kumuthini, Judit; Pillay, Ché S; Snoep, Jacky L; Bishop, Özlem Tastan; Tiffin, Nicki
  Bioinformatics is now a critical skill in many research and commercial environments as biological data are increasing in both size and complexity. South African researchers recognized this need in the mid-1990s and responded by working with the government as well as international bodies to develop initiatives to build bioinformatics capacity in the country. Significant injections of support from these bodies provided a springboard for the establishment of computational biology units at multiple universities throughout the country, which took on teaching, basic research and support roles. Several challenges were encountered, for example with unreliability of funding, lack of skills, and lack of infrastructure. However, the bioinformatics community worked together to overcome these, and South Africa is now arguably the leading country in bioinformatics on the African continent. Here we discuss how the discipline developed in the country, highlighting the challenges, successes, and lessons learnt.
- Item, Open Access: Population-specific common SNPs reflect demographic histories and highlight regions of genomic plasticity with functional relevance (2014-06-06)
  Choudhury, Ananyo; Hazelhurst, Scott; Meintjes, Ayton; Achinike-Oduaran, Ovokeraye; Aron, Shaun; Gamieldien, Junaid; Jalali Sefid Dashti, Mahjoubeh; Mulder, Nicola; Tiffin, Nicki; Ramsay, Michèle
  Abstract
  Background: Population differentiation is the result of demographic and evolutionary forces. Whole genome datasets from the 1000 Genomes Project (October 2012) provide an unbiased view of genetic variation across populations from Europe, Asia, Africa and the Americas. Common population-specific SNPs (MAF > 0.05) reflect a deep history and may have important consequences for health and wellbeing. Their interpretation is contextualised by currently available genome data.
  Results: The identification of common population-specific (CPS) variants (SNPs and SSV) is influenced by admixture and the sample size under investigation. Nine of the populations in the 1000 Genomes Project (2 African, 2 Asian (including a merged Chinese group) and 5 European) revealed that the African populations (LWK and YRI), followed by the Japanese (JPT) have the highest number of CPS SNPs, in concordance with their histories and given the populations studied. Using two methods, sliding 50-SNP and 5-kb windows, the CPS SNPs showed distinct clustering across large genome segments and little overlap of clusters between populations. iHS enrichment score and the population branch statistic (PBS) analyses suggest that selective sweeps are unlikely to account for the clustering and population specificity. Of interest is the association of clusters close to recombination hotspots. Functional analysis of genes associated with the CPS SNPs revealed over-representation of genes in pathways associated with neuronal development, including axonal guidance signalling and CREB signalling in neurones.
  Conclusions: Common population-specific SNPs are non-randomly distributed throughout the genome and are significantly associated with recombination hotspots. Since the variant alleles of most CPS SNPs are the derived allele, they likely arose in the specific population after a split from a common ancestor. Their proximity to genes involved in specific pathways, including neuronal development, suggests evolutionary plasticity of selected genomic regions. Contrary to expectation, selective sweeps did not play a large role in the persistence of population-specific variation. This suggests a stochastic process towards population-specific variation which reflects demographic histories and may have some interesting implications for health and susceptibility to disease.