Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution

 


dc.contributor.author Murrell, Ben en_ZA
dc.contributor.author Weighill, Thomas en_ZA
dc.contributor.author Buys, Jan en_ZA
dc.contributor.author Ketteringham, Robert en_ZA
dc.contributor.author Moola, Sasha en_ZA
dc.contributor.author Benade, Gerdus en_ZA
dc.contributor.author Buisson, Lise du en_ZA
dc.contributor.author Kaliski, Daniel en_ZA
dc.contributor.author Hands, Tristan en_ZA
dc.contributor.author Scheffler, Konrad en_ZA
dc.date.accessioned 2016-10-31T07:37:57Z
dc.date.available 2016-10-31T07:37:57Z
dc.date.issued 2011 en_ZA
dc.identifier.citation doi:10.1371/journal.pone.0028898 en_ZA
dc.identifier.uri http://dx.doi.org/10.1371/journal.pone.0028898 en_ZA
dc.identifier.uri http://hdl.handle.net/11427/22350
dc.description.abstract Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models. en_ZA
dc.language.iso eng en_ZA
dc.publisher Public Library of Science en_ZA
dc.rights This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. en_ZA
dc.rights.uri http://creativecommons.org/licenses/by/4.0 en_ZA
dc.source PLoS One en_ZA
dc.source.uri http://journals.plos.org/plosone en_ZA
dc.subject.other Sequence alignment en_ZA
dc.subject.other Phylogenetics en_ZA
dc.subject.other Amino acid substitution en_ZA
dc.subject.other Molecular evolution en_ZA
dc.subject.other Sequence databases en_ZA
dc.subject.other Phylogenetic analysis en_ZA
dc.subject.other Protein structure comparison en_ZA
dc.subject.other Chemical equilibrium en_ZA
dc.title Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution en_ZA
dc.type Journal Article en_ZA
dc.rights.holder © 2011 Murrell et al en_ZA
uct.type.publication Research en_ZA
uct.type.resource Article en_ZA
dc.publisher.institution University of Cape Town
dc.publisher.faculty Faculty of Health Sciences en_ZA
dc.publisher.department Institute of Infectious Disease and Molecular Medicine en_ZA
uct.type.filetype Text
uct.type.filetype Image
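
The abstract describes the alignment-specific model as a weighted sum of basis matrices learned by NNMF. The following is a minimal sketch of that final combination step only, not the authors' implementation. It assumes the basis matrices hold non-negative off-diagonal substitution rates and that the diagonal is filled in afterwards so that each row sums to zero; the abstract specifies neither. The function names, the three-state toy matrices (a real model would use 20 amino acid states), and the weights are all hypothetical.

```python
def combine_basis(weights, bases):
    """Return the weighted sum of equally sized square basis matrices."""
    if len(weights) != len(bases):
        raise ValueError("need exactly one weight per basis matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(bases[0])
    combined = [[0.0] * n for _ in range(n)]
    for w, basis in zip(weights, bases):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * basis[i][j]
    return combined


def to_rate_matrix(offdiag):
    """Assumption: set each diagonal entry so that every row sums to zero."""
    n = len(offdiag)
    q = [row[:] for row in offdiag]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


# Hypothetical toy basis matrices over three states.
B1 = [[0, 1, 0],
      [1, 0, 0],
      [0, 0, 0]]
B2 = [[0, 0, 2],
      [0, 0, 2],
      [2, 2, 0]]

# Hypothetical alignment-specific weights, one per basis matrix.
weights = [0.5, 0.25]

Q = to_rate_matrix(combine_basis(weights, [B1, B2]))
for row in Q:
    print(row)
```

In this sketch only the weights would be fitted to the alignment of interest, which is why such a model has far fewer free parameters than a full specialist matrix.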



