Functional Genome-wide Association Studies (fGWAS) and genomics landscape of signatures of polygenic adaptation in Botswana populations with HIV-1 C infection

Master Thesis


The burden of human immunodeficiency virus (HIV) is catastrophic, especially in Botswana, a nation in Southern Africa, where 20.7% of persons aged 15 to 49 years are people living with HIV (PWH). Although HIV exposure rates are extremely high, individual differences in clinical outcomes point to the importance of host genetic factors. Genome-wide association studies (GWAS) are widely used to identify single nucleotide polymorphisms (SNPs) linked to different phenotypes, such as HIV-1 subtype C (HIV-1C) susceptibility or resistance. However, conventional GWAS techniques alone detect associated SNPs but cannot illuminate the associated functional pathways. Functional genome-wide association studies (fGWAS), which are post-GWAS analyses, can be used to put GWAS summary results into functional context. Here, we first describe fGWAS employing whole genome sequences (WGS) of Batswana exposed to HIV-1C. Although autosomal SNPs are widely used for functional GWAS research, recent studies have also linked mitochondrial DNA (mtDNA) haplotypes to various HIV outcomes. We explored the possibility of using both autosomal and mtDNA SNPs to enrich fGWAS analysis. In addition, people from Botswana (collectively called Batswana) exhibit a homogeneous genetic structure and have been identified as the closest population to the most recent common ancestor (MRCA) of the human species. At the same time, the population has an extremely high HIV-1C burden. It remains unclear whether increased susceptibility to HIV-1C infection is associated with this closeness to ancestral genomes. We investigated the genomic landscape of signatures of polygenic adaptation to HIV-1C in Botswana using genome-wide scans for selection (GWSS). For the first goal, we reviewed HIV-related studies conducted in Botswana from 1995 to 2020. 
We saw significant advances in the fight against the pandemic, including the early introduction of second-generation antiretrovirals such as dolutegravir (DTG), Botswana becoming the first African nation to reach the 2030 UNAIDS targets (95-95-95), and a 0.3% decrease in mother-to-child transmission rates. However, the spread of treatment-resistant HIV-1C and the evolution of drug resistance mutations present a serious threat to ending the epidemic. We showed that host genetic factors affecting HIV-1C clinical outcomes among Batswana remain understudied. Overall, Botswana has made significant advances in the widespread deployment of antiretroviral treatment (ART) and has achieved high viral load suppression rates. For the second goal, we employed WGS to carry out GWAS using autosomal SNPs (chromosomes 1-22). fGWAS was then conducted on the GWAS results. A total of 394 WGS from Batswana were analysed. The research subjects came from the earlier Botswana-based Mashi and Tshedimoso studies. A total of 11,364,691 autosomal, biallelic SNPs were retained for downstream analyses after variant calling, strict variant filtering, quality control with PLINK v2.00a2LM, and phasing with EAGLE v2.1. The 338 samples that passed quality control comprised 226 PWH (all female) and 112 controls without HIV infection (30 males and 82 females). Although the data come from a population that speaks 25 different languages and has well-known linguistic and cultural differences, principal component analysis (PCA) based on autosomal SNPs did not identify any substructure. Interestingly, 12.2% of the genetic variants identified among Batswana were not found in publicly available databases. 
EMMAX and PLINK v2.00a2LM were both used for association analysis, and 137 significant SNPs located on chromosomes 1, 3, 6, and 7 were identified. Nine of the closest genes (LINC01266, LINC00578, SLC26A8, DNAH11, PLCB1, CRYBB2P1, AL512484.1, SMYD3, and LOC105373269) were mapped to six of the seven lead SNPs at -log10 P-value ≤ 6. For gene set enrichment analysis, the GeneMANIA online tool was employed, and 20 additional genes with physical co-expression of 81% were identified: CFTR, RACGAP1, PCF11, DGKQ, HAP1, PRKCA, GNA11, SLC26A10, SLC26A7, SLC26A11, SLC26A9, DCTN1, SPTBN1, DNAH2, SLC26A5, SLC26A1, SLC26A6, SLC26A3, SLC26A4, and CDC20. Both the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases had several terms related to these genes. Enriched biological pathways included enhanced antiporter activity, gastric and pancreatic secretion, solute carrier (SLC)-mediated transmembrane transport, and neurodegenerative processes. Such terms have also been linked to various clinical phenotypes, including HIV-1 infection, human cytomegalovirus (CMV), and cancers, suggesting possible cross-phenotype associations. For the third goal, we used PRSice to calculate polygenic risk scores for individuals; the Botswana population served as the target data, while summary statistics from a Ugandan population exposed to HIV served as the base dataset. We then used the results to determine each individual's risk level based on the best predictive threshold (PT). PT and HIV susceptibility were substantially associated. The best-fit prediction for Botswana was PT = 0.129 (β = -120.02, standard error = 61.14, p-value = 5.0 × 10⁻²), and the scores explained 1.3% of the variance in HIV-1 susceptibility at the population level. To assess the population recombination history, we also evaluated linkage disequilibrium (LD) decay. 
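LD between two biallelic SNPs is commonly summarized by the squared correlation of their alleles. The following is a minimal sketch of that calculation, assuming phased haplotypes coded as 0/1; the function name and toy haplotypes are hypothetical and are not taken from the thesis pipeline.

```python
from typing import Sequence


def ld_r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Return r^2 between two biallelic SNPs from phased 0/1 haplotypes.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), where D = pAB - pA * pB.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal in length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                                  # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Hypothetical toy haplotypes (not study data)
perfect_ld = ld_r_squared([0, 1, 0, 1], [0, 1, 0, 1])   # alleles always co-occur
no_ld = ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1])        # alleles are independent
```

An LD decay curve is then typically obtained by averaging such pairwise values within bins of physical distance, computed separately for each group being compared.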
According to LD decay statistics based on r², LD decays more quickly in PWH than in those without HIV, suggesting the potential influence of selection pressure acting on the response to HIV infection. In addition, we evaluated the minor allele frequency (MAF) and generated allele frequency distributions. When comparing cases and controls, the derived allele frequency (DAF) revealed a significant difference at low MAF (bin 0-0.05), suggesting that rare variants may be involved in HIV-1 susceptibility. Consequently, the post-GWAS analyses also included the rare-variant analyses that were available. Next, we examined mtDNA to identify variants linked to HIV-1 susceptibility and their functional pathways. Analyses of phylogenetic relatedness and haplogroups were also conducted. The median coverage of the mtDNA sequences was 1060 (Q1: 738.5; Q3: 1397). The total genotyping rate for mtDNA QC was 0.99, with 560 variants removed by a minor allele frequency threshold of 0.01. We then identified a total of 351 variant sites and 64 high-resolution haplogroups among the 338 people who passed QC. One mtDNA variant, m.2072A>G, was identified by association analysis using a logistic regression model with 10 PCs and gender as covariates. Based on functional predictions, it was annotated as a non-coding transcript exon variant in MT-RNR2, the gene encoding the 16S ribosomal RNA (rRNA), and was regarded as potentially pathogenic. We were able to identify 67 high-resolution haplogroups. L0a1b1a1 was the most prevalent haplogroup (n = 39), but 27 uncommon haplogroups, such as H, R, L0f1, and L0g, were also found. Common haplogroups included L0a1b1a1, L2a1b1a, and L0d1b2b1b, which had prevalence rates of 10%, 8%, and 7%, respectively, while L0d3b1 and L3e1b2 each accounted for 6%. 
Seven haplogroups (11%) were significantly more frequent among PWH than among HIV-negative persons: L3e1b2 and L0a1b1a1 (p-value = 0.02), L3d1a1a1 (p-value = 0.03), and L3e1a3a, L0d2a1a3, L0d2a1, and L0d1d (p-value < 0.01). However, a post hoc analysis testing whether any of the seven significant haplogroups were related to susceptibility to HIV infection found no significant association. Finally, we investigated the genetic basis of HIV-1 infection susceptibility in Batswana. Integrated Haplotype Scores (iHS) revealed strong selection signals on 32 out of 332 genes putatively under strong selection, including BX664718.2, EMB, ERICH1, GGNBP1, GOSR2, KSR1P1, LINC01016, LINC01667, LINC01691, LINC02798, MAP3K14, MIR1268 2. When the candidate genes were searched for significantly overrepresented GO terms and KEGG pathways, the most enriched clusters were associated with immune pathways, including cytokine signaling pathways, cytokine regulatory pathways, and phosphorylation. Notably, the majority of the genes were involved in the regulation and activation of NF-kappaB and I-kappaB kinase. In addition to HIV-1 infections, IKBKB has also been linked to CMV, hepatitis virus infections, and tuberculosis (TB). In conclusion, this project presented new insights into the multifaceted demographic history that shaped the existing genetic landscape of the population of Botswana. Although autosomal SNPs did not reveal any population substructure, mtDNA diversity indicates substantial substructure in the population of Botswana. Additionally, we identified new potential genes, pathways, and targets involved in the regulation of HIV-1 susceptibility among Batswana, which predisposes the population to higher rates of acquisition than of resistance. The gene sets identified here were also enriched for other conditions, including TB, hepatitis B virus (HBV), and CMV, which have been found to be very prevalent in Botswana and among PWH.
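The polygenic risk scoring step described above can be illustrated with a minimal sketch of p-value thresholding: each individual's score is the sum of base-GWAS effect sizes weighted by allele dosage, restricted to SNPs whose base p-value falls at or below the threshold. This is a simplification and not PRSice's actual implementation (it omits clumping and score averaging); the function name, SNP identifiers, and toy values are hypothetical, and the threshold 0.129 is reused from the abstract only for illustration.

```python
from typing import Dict


def polygenic_risk_score(
    dosages: Dict[str, float],
    effect_sizes: Dict[str, float],
    p_values: Dict[str, float],
    p_threshold: float,
) -> float:
    """Sum beta * dosage over SNPs with base-GWAS p-value <= p_threshold."""
    score = 0.0
    for snp, beta in effect_sizes.items():
        # Skip SNPs that fail the threshold or are missing in the target individual
        if p_values[snp] <= p_threshold and snp in dosages:
            score += beta * dosages[snp]
    return score


# Hypothetical base summary statistics (not study data)
base_betas = {"snp_a": 0.25, "snp_b": -0.25, "snp_c": 0.75}
base_pvals = {"snp_a": 0.01, "snp_b": 0.10, "snp_c": 0.50}

# Hypothetical target individual: effect-allele counts (0, 1, or 2) per SNP
individual = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score_at_best_pt = polygenic_risk_score(individual, base_betas, base_pvals, 0.129)
score_strict = polygenic_risk_score(individual, base_betas, base_pvals, 0.05)
```

In practice the threshold is chosen by scanning a range of values and keeping the one at which the scores best predict the phenotype in the target data, which is the role PT plays in the abstract.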