Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution

dc.contributor.author: Murrell, Ben [en_ZA]
dc.contributor.author: Weighill, Thomas [en_ZA]
dc.contributor.author: Buys, Jan [en_ZA]
dc.contributor.author: Ketteringham, Robert [en_ZA]
dc.contributor.author: Moola, Sasha [en_ZA]
dc.contributor.author: Benade, Gerdus [en_ZA]
dc.contributor.author: Buisson, Lise du [en_ZA]
dc.contributor.author: Kaliski, Daniel [en_ZA]
dc.contributor.author: Hands, Tristan [en_ZA]
dc.contributor.author: Scheffler, Konrad [en_ZA]
dc.date.accessioned: 2016-10-31T07:37:57Z
dc.date.available: 2016-10-31T07:37:57Z
dc.date.issued: 2011 [en_ZA]
dc.description.abstract: Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models. [en_ZA]
dc.identifier.apacitation: Murrell, B., Weighill, T., Buys, J., Ketteringham, R., Moola, S., Benade, G., ... Scheffler, K. (2011). Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution. PLoS One, http://hdl.handle.net/11427/22350 [en_ZA]
dc.identifier.chicagocitation: Murrell, Ben, Thomas Weighill, Jan Buys, Robert Ketteringham, Sasha Moola, Gerdus Benade, Lise du Buisson, Daniel Kaliski, Tristan Hands, and Konrad Scheffler. "Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution." PLoS One (2011) http://hdl.handle.net/11427/22350 [en_ZA]
dc.identifier.citation: doi:10.1371/journal.pone.0028898 [en_ZA]
dc.identifier.ris [en_ZA]:
TY - Journal Article
AU - Murrell, Ben
AU - Weighill, Thomas
AU - Buys, Jan
AU - Ketteringham, Robert
AU - Moola, Sasha
AU - Benade, Gerdus
AU - Buisson, Lise du
AU - Kaliski, Daniel
AU - Hands, Tristan
AU - Scheffler, Konrad
AB - Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.
DA - 2011
DB - OpenUCT
DO - 10.1371/journal.pone.0028898
DP - University of Cape Town
J1 - PLoS One
LK - https://open.uct.ac.za
PB - University of Cape Town
PY - 2011
T1 - Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution
TI - Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution
UR - http://hdl.handle.net/11427/22350
ER -
dc.identifier.uri: http://dx.doi.org/10.1371/journal.pone.0028898 [en_ZA]
dc.identifier.uri: http://hdl.handle.net/11427/22350
dc.identifier.vancouvercitation: Murrell B, Weighill T, Buys J, Ketteringham R, Moola S, Benade G, et al. Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution. PLoS One. 2011; http://hdl.handle.net/11427/22350. [en_ZA]
dc.language.iso: eng [en_ZA]
dc.publisher: Public Library of Science [en_ZA]
dc.publisher.department: Institute of Infectious Disease and Molecular Medicine [en_ZA]
dc.publisher.faculty: Faculty of Health Sciences [en_ZA]
dc.publisher.institution: University of Cape Town
dc.rights: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. [en_ZA]
dc.rights.holder: © 2011 Murrell et al [en_ZA]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0 [en_ZA]
dc.source: PLoS One [en_ZA]
dc.source.uri: http://journals.plos.org/plosone [en_ZA]
dc.subject.other: Sequence alignment [en_ZA]
dc.subject.other: Phylogenetics [en_ZA]
dc.subject.other: Amino acid substitution [en_ZA]
dc.subject.other: Molecular evolution [en_ZA]
dc.subject.other: Sequence databases [en_ZA]
dc.subject.other: Phylogenetic analysis [en_ZA]
dc.subject.other: Protein structure comparison [en_ZA]
dc.subject.other: Chemical equilibrium [en_ZA]
dc.title: Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution [en_ZA]
dc.type: Journal Article [en_ZA]
uct.type.filetype: Text
uct.type.filetype: Image
uct.type.publication: Research [en_ZA]
uct.type.resource: Article [en_ZA]
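
The abstract in this record describes the alignment-specific model as a weighted sum of basis matrices learned by NNMF. The sketch below illustrates only that final combination step, under stated assumptions: the function name `combine_basis`, the random toy basis matrices, and the example weights are all hypothetical, and the diagonal convention (rows summing to zero, as in a standard substitution rate matrix) is an assumption rather than something stated in the record. It is not the authors' fitting procedure.

```python
import numpy as np

def combine_basis(bases, weights):
    """Form a rate matrix as a non-negative weighted sum of basis matrices.

    bases:   array of shape (K, n, n), non-negative entries (hypothetical input)
    weights: array of shape (K,), non-negative alignment-specific weights
    """
    bases = np.asarray(bases, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(bases < 0) or np.any(weights < 0):
        raise ValueError("basis matrices and weights must be non-negative")
    # Weighted sum over the K basis matrices -> one (n, n) matrix.
    q = np.tensordot(weights, bases, axes=1)
    # Assumed convention: off-diagonals are rates, diagonal makes rows sum to 0.
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

# Toy data only: 3 random basis matrices over 20 amino acids.
rng = np.random.default_rng(0)
toy_bases = rng.random((3, 20, 20))
toy_weights = np.array([0.5, 1.5, 0.2])
Q = combine_basis(toy_bases, toy_weights)
```

With K basis matrices fixed, only the K weights need to be estimated for a new alignment, which is the parameter saving the abstract refers to.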
Files
Original bundle
Name: Murrell_Non_Negative_Matrix_Factorization_2011.pdf
Size: 576.15 KB
Format: Adobe Portable Document Format