Identification of evolutionarily conserved nucleic acid secondary structures within viral genomes

Master Thesis



University of Cape Town

Large single-stranded nucleotide sequences, such as certain DNA and RNA virus genomes, form complex secondary structures through Watson-Crick base-pairing interactions between bases on the same DNA or RNA strand. Both computational and experimental methods are available for determining the secondary structure conformations of nucleotide sequences, and these have been used to identify many thousands of structural elements within such sequences. However, to date only a small subset of these elements has been shown to have any biological functionality. In this study, a software tool enabling the graphical visualisation of structural elements overlaid with additional biological information was developed to assist researchers in identifying elements with likely biological functionality within larger secondary structures. Additionally, a tool was developed for identifying conserved structural elements amongst a set of related sequences. Using these tools, a number of conserved elements were identified within computationally predicted secondary structures of HIV and five flavivirus species. Evolutionary analysis of related sequences provided further evidence that selection has acted to preserve the identified elements, supporting their likely biological importance. In addition to the identification of individual structural elements, a large-scale analysis of HIV-1 and seven flavivirus species revealed statistically significant associations between base-paired nucleotide positions and various patterns of mutation. While the results of these analyses were consistent with the presence of between one and five highly conserved, functionally important secondary structures within the various genomes analysed, they were also consistent with the occurrence of many less-conserved (or only transiently conserved) structures that potentially have no specific biological functions.
Although many of these less-conserved structures may, as has previously been suggested, defend these genomes against intracellular antiviral siRNAs, a strong association was found between sites that are base-paired and codon sites that are apparently evolving under purifying selection. This finding suggests that an additional or alternative role of these structures may simply be to reduce basal mutation frequencies, and hence amino acid substitution rates, at codon sites that encode functionally important amino acids. An investigation of natural HIV-1 inter- and intra-subtype recombinant sequences revealed that recombination breakpoints occurred significantly more often at base-paired nucleotide positions within the computationally predicted secondary structures of these sequences. Furthermore, these natural HIV-1 recombinants were found to be less disruptive of protein tertiary structure and RNA secondary structure conformations than expected by chance, suggesting that natural selection favouring the maintenance of both protein and nucleic acid folding has likely had a major influence on the patterns of recombination that are detectable within HIV genomes. Finally, a general method was developed for predicting the secondary structures of a set of related sequences that share similar secondary structure conformations. By taking alignment uncertainty into account through the analysis of large numbers of similarly plausible multiple sequence alignments, sampled using a statistical alignment procedure, the method was found to have better predictive accuracy than existing related RNA secondary structure prediction methods.
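The idea of a base pair being conserved across a set of related sequences can be illustrated with a small sketch. The code below is not the tool developed in the thesis; it is a minimal illustration that assumes each sequence's predicted structure is already available in dot-bracket notation and has been mapped onto a common alignment, so that all structure strings have the same length. The function names `parse_dot_bracket` and `conserved_pairs` and the `min_fraction` parameter are hypothetical, and pseudoknots and gap characters are not handled.

```python
from collections import Counter


def parse_dot_bracket(structure):
    """Return the set of (i, j) base pairs (0-based) in a dot-bracket string."""
    stack = []
    pairs = set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        # Any other symbol (e.g. '.') is treated as unpaired.
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def conserved_pairs(structures, min_fraction=1.0):
    """Return base pairs found in at least min_fraction of the structures.

    Assumption: all structures share one alignment coordinate system,
    so identical (i, j) indices refer to homologous positions.
    """
    if not structures:
        return set()
    if len({len(s) for s in structures}) != 1:
        raise ValueError("structures must be aligned to the same length")
    counts = Counter()
    for s in structures:
        counts.update(parse_dot_bracket(s))
    threshold = min_fraction * len(structures)
    return {pair for pair, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Toy, made-up structures purely for illustration.
    toy = ["((..))..", "((..))..", "(....).."]
    print(sorted(conserved_pairs(toy)))
    print(sorted(conserved_pairs(toy, min_fraction=0.6)))
```

With the toy input, only the outer pair is present in all three structures, while lowering `min_fraction` also admits the inner pair found in two of them. A real analysis would additionally need to weigh such counts against sequence divergence and evolutionary evidence, as the abstract describes.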