Browsing by Subject "Sequence databases"
Now showing 1 - 7 of 7
- Item (Open Access): The ancient evolutionary history of polyomaviruses (Public Library of Science, 2016). Buck, Christopher B; Van Doorslaer, Koenraad; Peretti, Alberto; Geoghegan, Eileen M; Tisza, Michael J; An, Ping; Katz, Joshua P; Pipas, James M; McBride, Alison A; Camus, Alvin C; McDermott, Alexa J; Dill, Jennifer A; Delwart, Eric; Ng, Terry F F; Farkas, Kata; Austin, Charlotte; Kraberger, Simona; Davison, William; Pastrana, Diana V; Varsani, Arvind. Author Summary: Polyomaviruses are a family of DNA-based viruses that are known to infect various terrestrial vertebrates, including humans. In this report, we describe our discovery of highly divergent polyomaviruses associated with various marine fish. Searches of public deep sequencing databases unexpectedly revealed the existence of polyomavirus-like sequences in scorpion and spider datasets. Our analysis of these new sequences suggests that polyomaviruses have slowly co-evolved with individual host animal lineages through an established mechanism known as intrahost divergence. The proposed model is similar to the mechanisms through which other DNA viruses, such as papillomaviruses, are thought to have evolved. Our analysis also suggests that distantly related polyomaviruses sometimes recombine to produce new chimeric lineages. We propose a possible taxonomic scheme that can account for these inferred ancient recombination events.
- Item (Open Access): Comparison of a real-time multiplex PCR and sequetyping assay for pneumococcal serotyping (Public Library of Science, 2015). Dube, Felix S; van Mens, Suzan P; Robberts, Lourens; Wolter, Nicole; Nicol, Paul; Mafofo, Joseph; Africa, Samantha; Zar, Heather J; Nicol, Mark P. BACKGROUND: Pneumococcal serotype identification is essential to monitor pneumococcal vaccine effectiveness and serotype replacement. Serotyping by conventional serological methods is costly, labour-intensive, and requires significant technical expertise. We compared two different molecular methods to serotype pneumococci isolated from the nasopharynx of South African infants participating in a birth cohort study, the Drakenstein Child Health Study, in an area with high 13-valent pneumococcal conjugate vaccine (PCV13) coverage. METHODS: A real-time multiplex PCR (rmPCR) assay detecting 21 different serotypes/-groups and a sequetyping assay, based on the sequence of the wzh gene within the pneumococcal capsular locus, were compared. Forty pneumococcal control isolates, with serotypes determined by the Quellung reaction, were tested. In addition, 135 pneumococcal isolates obtained from the nasopharynx of healthy children were tested by both serotyping assays and confirmed by Quellung testing. Discordant results were further investigated by whole genome sequencing of four isolates. RESULTS: Of the 40 control isolates tested, 25 had a serotype covered by the rmPCR assay. These were all correctly serotyped/-grouped. Sequetyping PCR failed in 7/40 (18%) isolates. For the remaining isolates, sequetyping assigned the correct serotype/-group to 29/33 (88%) control isolates. Of the 132/135 (98%) nasopharyngeal pneumococcal isolates that could be typed, 69/132 (52%) and 112/132 (85%) were assigned the correct serotype/-group by rmPCR and sequetyping respectively. The serotypes of 63/132 (48%) isolates were not included in the rmPCR panel. All except three isolates (serotype 25A and 38) were theoretically amplified and differentiated into the correct serotype/-group, with some strains giving ambiguous results (serotype 13/20, 17F/33C, and 11A/D/1818F). Of the pneumococcal serotypes detected in this study, 69/91 (76%) were not included in the current PCV13. The most frequently identified serotypes were 11A, 13, 15B/15C, 16F and 10A. CONCLUSION: The rmPCR assay performed well for the 21 serotypes/-groups included in the assay. However, in our study setting, a large proportion of serotypes were not detected by rmPCR. The sequetyping assay performed well, but did misassign specific serotypes. It may be useful for regions where vaccine serotypes are less common; however, confirmatory testing is advisable.
- Item (Open Access): Genetic variation in TLR genes in Ugandan and South African populations and comparison with HapMap data (Public Library of Science, 2012). Baker, Allison R; Qiu, Feiyou; Randhawa, April Kaur; Horne, David J; Adams, Mark D; Shey, Muki; Barnholtz-Sloan, Jill; Mayanja-Kizza, Harriet; Kaplan, Gilla; Hanekom, Willem A; Boom, W Henry; Hawn, Thomas R; Stein, Catherine M. Genetic epidemiological studies of complex diseases often rely on data from the International HapMap Consortium for identification of single nucleotide polymorphisms (SNPs), particularly those that tag haplotypes. However, little is known about the relevance of the African populations used to collect HapMap data for studies conducted elsewhere in Africa. Toll-like receptor (TLR) genes play a key role in susceptibility to various infectious diseases, including tuberculosis. We conducted full-exon sequencing in samples obtained from Uganda (n = 48) and South Africa (n = 48), in four genes in the TLR pathway: TLR2, TLR4, TLR6, and TIRAP. We identified one novel TIRAP SNP (with minor allele frequency [MAF] 3.2%) and a novel TLR6 SNP (MAF 8%) in the Ugandan population, and a TLR6 SNP that is unique to the South African population (MAF 14%). These SNPs were also not present in the 1000 Genomes data. Genotype and haplotype frequencies and linkage disequilibrium patterns in Uganda and South Africa were similar to African populations in the HapMap datasets. Multidimensional scaling analysis of polymorphisms in all four genes suggested broad overlap of all of the examined African populations. Based on these data, we propose that there is enough similarity among African populations represented in the HapMap database to justify initial SNP selection for genetic epidemiological studies in Uganda and South Africa. We also discovered three novel polymorphisms that appear to be population-specific and would only be detected by sequencing efforts.
- Item (Open Access): MyDas, an extensible Java DAS server (Public Library of Science, 2012). Salazar, Gustavo A; García, Leyla J; Jones, Philip; Jimenez, Rafael C; Quinn, Antony F; Jenkinson, Andrew M; Mulder, Nicola; Martin, Maria; Hunter, Sarah; Hermjakob, Henning. A large number of diverse, complex, and distributed data resources are currently available in the Bioinformatics domain. The pace of discovery and the diversity of information mean that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective, however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users. We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details.
- Item (Open Access): Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution (Public Library of Science, 2011). Murrell, Ben; Weighill, Thomas; Buys, Jan; Ketteringham, Robert; Moola, Sasha; Benade, Gerdus; Buisson, Lise du; Kaliski, Daniel; Hands, Tristan; Scheffler, Konrad. Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.
- Item (Open Access): A novel diagnostic target in the hepatitis C virus genome (Public Library of Science, 2009). Drexler, Jan Felix; Kupfer, Bernd; Petersen, Nadine; Grotto, Rejane Maria Tommasini; Rodrigues, Silvia Maria Corvino; Grywna, Klaus; Panning, Marcus; Annan, Augustina; Silva, Giovanni Faria; Douglas, Jill. Christian Drosten and colleagues develop, validate, and make openly available a prototype hepatitis C virus assay based on the conserved 3' X-tail element, with potential for clinical use in developing countries.
- Item (Open Access): Scoring protein relationships in functional interaction networks predicted from sequence data (Public Library of Science, 2011). Mazandu, Gaston K; Mulder, Nicola J. The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins. Availability: Protein pair-wise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data and Python scripts to compute these scores are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes .
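
Illustrative note on the non-negative matrix factorization entry listed above (Murrell et al., 2011): the abstract describes learning basis matrices from a large general dataset, then learning only a small set of weights for the alignment of interest, so that the alignment-specific model is a weighted sum of the basis matrices. The sketch below shows that two-step idea on toy random data. It is not the authors' estimation procedure: it assumes a plain squared-error objective with standard multiplicative updates, and all names (`nmf`, `fit_weights`, `true_basis`, the 4x4 matrix size) are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def scaled(X, num, den, eps=1e-9):
    """Element-wise X * num / den; non-negative inputs stay non-negative."""
    return [[x * n / (d + eps) for x, n, d in zip(xr, nr, dr)]
            for xr, nr, dr in zip(X, num, den)]

def frob_error(V, W, H):
    """Frobenius norm of V - W @ H."""
    WH = matmul(W, H)
    return math.sqrt(sum((v - a) ** 2 for vr, ar in zip(V, WH) for v, a in zip(vr, ar)))

def fit_weights(V, H, W, iters):
    """Update only the weights W so that W @ H approximates V; the basis H is fixed."""
    Ht = transpose(H)
    VHt, HHt = matmul(V, Ht), matmul(H, Ht)
    for _ in range(iters):
        W = scaled(W, VHt, matmul(W, HHt))
    return W

def nmf(V, k, iters=200, seed=0):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        H = scaled(H, matmul(Wt, V), matmul(matmul(Wt, W), H))  # update basis
        W = fit_weights(V, H, W, iters=1)                       # update weights
    return W, H

# Toy "general dataset": 30 flattened 4x4 non-negative matrices built from 2 hidden bases.
rng = random.Random(1)
true_basis = [[rng.random() for _ in range(16)] for _ in range(2)]
V = matmul([[rng.random(), rng.random()] for _ in range(30)], true_basis)

# Step 1: learn k = 2 basis matrices (rows of H) from the general dataset.
W, H = nmf(V, k=2)

# Step 2: for a new, smaller target, learn only the 2 weights against the fixed basis.
target = matmul([[0.7, 0.3]], true_basis)
w = fit_weights(target, H, [[0.5, 0.5]], iters=200)

# The target-specific model is the weighted sum of the basis matrices.
model = matmul(w, H)[0]
model_4x4 = [model[i:i + 4] for i in range(0, 16, 4)]
print("weights:", w[0])
print("residual:", frob_error(target, w, H))
```

The point of the second step is the parameter count: once the basis is fixed, the target needs only `k` weights rather than a full matrix of free entries, which mirrors the abstract's argument about constraining the model to as many dimensions as the data justify.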