High Degree of HIV-1 group M Genetic Diversity within Circulating Recombinant Forms: Insight into the Early Events of HIV-1M Evolution

The existence of various highly divergent HIV-1 lineages and of recombination-derived sequence tracts of indeterminate origin within established circulating recombinant forms (CRFs) strongly suggests that HIV-1 group M (HIV-1M) diversity is not fully represented under the current classification system. Here we used a fully exploratory screen for recombination on a set of 480 near-full-length genomes representing the full known diversity of HIV-1M. We decomposed recombinant sequences into their constituent parts and then used maximum-likelihood phylogenetic analyses of this mostly recombination-free data set to identify rare divergent sequence lineages that fall outside the major named HIV-1M taxonomic groupings. We found that many of the sequence fragments occurring within CRFs (including CRF04_cpx, CRF06_cpx, CRF11_cpx, CRF18_cpx, CRF25_cpx, CRF27_cpx, and CRF49_cpx) are in fact likely derived from divergent unclassified parental lineages that may predate the current subtypes, even though they are presently identified as derived from currently defined HIV-1M subtypes. Our evidence suggests that some of these CRFs are descended predominantly from what were or are major previously unidentified HIV-1M lineages that were likely epidemiologically relevant during the early stages of the HIV-1M epidemic. The restriction of these divergent lineages to the Congo basin suggests that they were less infectious and/or simply not present at the time and place of the initial migratory wave that triggered the global epidemic.

IMPORTANCE HIV-1 group M (HIV-1M) likely spread to the rest of the world from the Congo basin in the mid-1900s (N. R. Faria et al., Science 346:56-61, 2014, http://dx.doi.org/10.1126/science.1256739) and is today the principal cause of the AIDS pandemic. Here, we show that large sequence fragments from several HIV-1M circulating recombinant forms (CRFs) are derived from divergent parental lineages that cannot reasonably be classified within the nine established HIV-1M subtypes. These lineages are likely to have been epidemiologically relevant in the Congo basin at the onset of the epidemic. Nonetheless, they appear not to have undergone the same explosive global spread as other HIV-1M subtypes, perhaps because they were less transmissible. Concerted efforts to characterize more of these divergent lineages could allow the accurate inference and chemical synthesis of epidemiologically key ancestral HIV-1M variants so as to directly test competing hypotheses relating to the viral genetic factors that enabled the present pandemic.
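
The decomposition of recombinant sequences into their constituent parts, mentioned above, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that breakpoint positions (as alignment column indices) have already been inferred by some recombination-detection method, and the function name `split_at_breakpoints` is hypothetical. Each fragment is kept at full alignment length, with all positions outside the fragment masked as gaps, so that every fragment can be placed in a phylogeny as a separate, recombination-free taxon.

```python
from typing import List, Tuple

GAP = "-"  # gap character used to mask positions outside a fragment


def split_at_breakpoints(
    aligned_seq: str, breakpoints: List[int]
) -> List[Tuple[int, int, str]]:
    """Split one aligned genome into fragments at the given breakpoints.

    `breakpoints` are 0-based alignment column indices at which a new
    fragment starts (an illustrative convention, not taken from the paper).
    Returns (start, end, masked_sequence) tuples, where `end` is exclusive
    and each masked sequence has the same length as the input.
    """
    length = len(aligned_seq)
    # Ignore duplicate or out-of-range breakpoints and sort the rest.
    cuts = sorted({bp for bp in breakpoints if 0 < bp < length})
    bounds = [0] + cuts + [length]

    fragments = []
    for start, end in zip(bounds, bounds[1:]):
        # Keep only columns [start, end); mask everything else with gaps.
        masked = GAP * start + aligned_seq[start:end] + GAP * (length - end)
        fragments.append((start, end, masked))
    return fragments


if __name__ == "__main__":
    # Toy 10-column "genome" with two assumed breakpoints.
    for start, end, frag in split_at_breakpoints("ACGTACGTAC", [4, 7]):
        print(start, end, frag)
```

Masking rather than trimming keeps every fragment aligned to the same coordinate system, which is one simple way to let fragments from the same genome be analyzed alongside non-recombinant reference sequences.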