Analysis of the impact on phylogenetic inference of non-reversible nucleotide substitution models

Doctoral Thesis


Most phylogenetic trees are inferred using time-reversible evolutionary models, which assume that the relative rate of substitution between any given pair of nucleotides is the same regardless of the direction of the substitution. However, there is no reason to assume that the underlying biochemical mutational processes that cause substitutions are similarly symmetrical. Here, I evaluate the effect on phylogenetic inference, in empirical viral and simulated data, of incorporating non-reversibility into models of nucleotide substitution. I consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6) that is applicable to analyzing mutational processes in double-stranded (ds) genomes, in that complementary substitutions occur at identical rates; and (2) a 12-rate non-reversible model (NREV12) that is applicable to analyzing mutational processes in single-stranded (ss) genomes, in that all substitution types are free to occur at different rates. Using likelihood ratio and Akaike Information Criterion-based model tests, I show that, surprisingly, NREV12 provided a significantly better fit than the General Time Reversible (GTR) and NREV6 models to 21/31 dsRNA and 20/30 dsDNA datasets. As expected, NREV12 also provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. I then tested how non-reversibility affects the accuracy with which phylogenetic trees are inferred. As simulated degrees of non-reversibility (DNR) increased, tree topology inferences using both NREV12 and GTR became more accurate, whereas inferred branch lengths became less accurate. I conclude that while non-reversible models should be helpful in the analysis of mutational processes in most virus species, there is no pressing need to use these models for routine phylogenetic inference. Finally, I introduce a web application, RpNRM, that roots phylogenetic trees using a non-reversible nucleotide substitution model.
The phylogenetic tree is rooted on every branch in turn, the likelihood of each rooting is determined, and the rooting with the highest likelihood is identified as the most plausible. The rooting accuracy of RpNRM was compared to that of the outgroup rooting method, the midpoint rooting method and another non-reversible model-based rooting method implemented in the program IQTREE. I find that although the RpNRM and IQTREE non-reversible model-based methods are not as accurate on their own as the outgroup or midpoint rooting methods, they nevertheless provide an independent means of verifying the root locations that are inferred by these other methods.
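
The difference between the GTR, NREV6 and NREV12 models described above can be illustrated with a minimal Python sketch. It is not code from the thesis or from RpNRM: the function names, the nucleotide order (A, C, G, T) and all rate values are assumptions chosen purely for illustration. The sketch builds a rate matrix for each model and measures how far it departs from detailed balance (pi_i q_ij = pi_j q_ji), the condition that defines time reversibility.

```python
from itertools import permutations

NUCS = "ACGT"  # assumed nucleotide order for rows and columns
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}  # Watson-Crick complements

def build_q(rates):
    """Build a 4x4 rate matrix from {(from, to): rate}; each row sums to zero."""
    q = [[0.0] * 4 for _ in range(4)]
    for (a, b), r in rates.items():
        q[NUCS.index(a)][NUCS.index(b)] = r
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q

def gtr_rates(exch, freqs):
    """GTR: rate(a->b) = s_ab * pi_b, with symmetric exchangeabilities s_ab."""
    return {(a, b): exch[frozenset((a, b))] * freqs[b]
            for a, b in permutations(NUCS, 2)}

def nrev6_rates(six):
    """NREV6: expand 6 rates to 12 so that rate(a->b) == rate(comp(a)->comp(b))."""
    rates = {}
    for (a, b), r in six.items():
        rates[(a, b)] = r
        rates[(COMP[a], COMP[b])] = r
    return rates

def stationary(q, iters=20000):
    """Approximate the stationary distribution by iterating pi <- pi (I + hQ)."""
    h = 0.5 / max(-q[i][i] for i in range(4))
    pi = [0.25] * 4
    for _ in range(iters):
        pi = [pi[j] + h * sum(pi[i] * q[i][j] for i in range(4))
              for j in range(4)]
    return pi

def max_detailed_balance_gap(q):
    """Largest |pi_i q_ij - pi_j q_ji|; zero (up to rounding) if reversible."""
    pi = stationary(q)
    return max(abs(pi[i] * q[i][j] - pi[j] * q[j][i])
               for i in range(4) for j in range(4) if i != j)

# Illustrative (made-up) parameter values for each model.
EXCH = {frozenset("AC"): 1.0, frozenset("AG"): 4.0, frozenset("AT"): 0.7,
        frozenset("CG"): 1.2, frozenset("CT"): 3.5, frozenset("GT"): 1.0}
FREQS = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
SIX = {("A", "C"): 1.0, ("A", "G"): 4.0, ("A", "T"): 0.5,
       ("C", "A"): 1.5, ("C", "G"): 0.8, ("C", "T"): 5.0}
TWELVE = {("A", "C"): 1.0, ("A", "G"): 4.0, ("A", "T"): 0.5,
          ("C", "A"): 1.5, ("C", "G"): 0.8, ("C", "T"): 5.0,
          ("G", "A"): 3.0, ("G", "C"): 0.6, ("G", "T"): 1.1,
          ("T", "A"): 0.4, ("T", "C"): 2.5, ("T", "G"): 0.9}

q_gtr = build_q(gtr_rates(EXCH, FREQS))
q_nrev6 = build_q(nrev6_rates(SIX))
q_nrev12 = build_q(TWELVE)

for name, q in [("GTR", q_gtr), ("NREV6", q_nrev6), ("NREV12", q_nrev12)]:
    print(f"{name}: detailed-balance gap = {max_detailed_balance_gap(q):.3e}")
```

Under these assumed values the GTR matrix satisfies detailed balance up to rounding error, whereas the NREV6 matrix does not, even though its complementary substitutions share rates. A non-zero gap is what allows a non-reversible model to distinguish between alternative root positions on the same unrooted tree.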