Whole exome sequencing: a customised approach to exploring the genetic basis of musculoskeletal soft tissue injuries

Doctoral Thesis

2018

Abstract
Background: Several variants have been associated with the risk of musculoskeletal soft tissue injuries, suggesting a role for genetics in the aetiology of chronic Achilles tendinopathy (AT) and anterior cruciate ligament (ACL) ruptures. Genetic risk modifiers have primarily been identified using a hypothesis-driven candidate gene approach. However, the ability to identify all risk-conferring variants using this approach alone is limited. Therefore, the primary aim of this thesis was to further define the molecular signatures of musculoskeletal soft tissue injuries mapping to specific genes. The genes encoding the tenascin-C glycoprotein (TNC, chromosome 9), the α1 chain of type XXVII collagen (COL27A1, chromosome 9), matrix metalloproteinase 3 (MMP3, chromosome 11) and the α1 chain of type I collagen (COL1A1, chromosome 17) were previously associated with the risk of injury and were therefore prioritised for further interrogation. Variants within these regions, which had either been previously associated with injury risk or prioritised from the list of new candidates identified by whole exome sequencing (WES) through the application of a customised tiered filtering strategy, were genotyped in several self-identified white AT and ACL rupture cohorts. The second aim of this study was to determine whether the observed risk-associated variants in the self-identified white cohorts were similar to those underpinning injury in the ancestrally admixed South African Coloured cohort, using ACL ruptures as the phenotypic model.

The specific objectives of this thesis were:
• To select well-phenotyped participants to be sequenced using an extended whole exome sequencing platform.
• To develop and apply a reusable bioinformatics analysis pipeline involving a customised, tiered filtering strategy to select candidates for interrogation from the list of variants identified by WES. The TNC, COL27A1, MMP3 and COL1A1 genes were prioritised for further interrogation using this approach.
• To test the association between the selected candidate variants and the risk of chronic AT and ACL ruptures using a case-control genetic association study design. The candidates selected from the list of variants identified by WES included: TNC rs1061494 (T>C), rs2104772 (T>A) and rs1061495 (T>C), and COL27A1 rs2567706 (A>G), rs2241671 (G>A) and rs2567705 (A>T). In addition, several variants previously associated with the risk of injury, including TNC rs1138545 (C>T), MMP3 rs3025058 (5A>6A), rs679620 (G>A), rs591058 (C>T) and rs650108 (G>A), and COL1A1 rs1107946 (G>T) and rs1800012 (G>T), were also prioritised for investigation in additional injury cohorts.
• To functionally annotate the prioritised variants using a host of in silico bioinformatic analysis tools.

Methods:

Whole exome sequencing and data processing: Ten asymptomatic controls and ten chronic AT cases were selected for sequencing. Controls were older than 47 years of age, were physically active and had not reported any previous tendon or ligament injuries. Cases were younger than 35 years of age, had suffered chronic, bilateral Achilles tendinopathy and/or reported several Achilles tendon injuries. Paired-end WES was performed on the Illumina HiSeq 2000/2500 platform at 30X coverage. A customised, tiered filtering strategy was developed to screen for candidate variants. All candidate variants were confirmed using Sanger sequencing and genotyped, together with the other prioritised variants, in the larger injury cohorts.

Case-control genetic association studies:

Achilles tendon injury cohorts: Three cohorts were independently recruited from South Africa (SA), Australia (AUS) and the United Kingdom (UK). The South African White (SAW) Achilles tendon injury cohort consisted of 165 controls (SAW-CONAT), 123 cases with chronic Achilles tendinopathy (SAW-TEN) and 47 cases with acute Achilles tendon ruptures (SAW-RUP). The UK Achilles tendon injury cohort consisted of 130 controls (UK-CON), 87 cases with chronic Achilles tendinopathy (UK-TEN) and 35 cases with acute Achilles tendon rupture (UK-RUP). The AUS Achilles tendon injury cohort included 210 controls (AUS-CON) and 85 cases with chronic Achilles tendinopathy (AUS-TEN).

Anterior cruciate ligament rupture cohorts: The first ACL rupture cohort consisted of 232 control participants (SAW-CONACL) and 234 cases with surgically diagnosed ACL ruptures (SAW-ACL), of which 135 were reportedly non-contact in mechanism (SAW-NON). All participants in this group self-identified as being of South African White ancestry. The participants in the second South African ACL rupture cohort were of mixed ancestry and self-identified as being South African Coloured (SAC). This group consisted of 100 controls (SAC-CON) and 97 participants with surgically diagnosed ACL ruptures (SAC-ACL), of which 50 were reportedly non-contact in mechanism (SAC-NON).

The TNC and COL27A1 genomic intervals were explored through the TNC rs1061494, rs1138545, rs2104772 and rs1061495 variants and the COL27A1 rs2567706, rs2241671 and rs2567705 variants in the SAW and UK Achilles tendon injury cohorts, in addition to the SAW and SAC ACL rupture cohorts. The MMP3 locus was explored using the previously risk-associated rs3025058, rs679620, rs591058 and rs650108 variants in the AUS Achilles tendon injury and SAW ACL rupture cohorts. Chapter 5 explored the COL1A1 locus using the previously risk-associated COL1A1 rs1107946 and rs1800012 variants in the SAW and UK Achilles tendon injury cohorts, in addition to the SAW and SAC ACL rupture cohorts.
Statistical analyses were performed using the R programming environment, with statistical significance set at PC) variant is predicted to overlap the sequence motifs of the muscle initiator nuclear protein, members of the myogenic family of transcription factors and RNA polymerase II subunit A. Furthermore, the rs1061494 variant demonstrated marked differences in its predicted pre-mRNA structure. The other TNC variant associated with injury risk, rs2104772 (T>A), was predicted to be deleterious by two independent annotation tools. The investigated COL27A1 variants were suggested to interact with several predicted enhancers of cellular function. However, this gene is still relatively uncharacterised in musculoskeletal soft tissue injuries, and therefore these variants will require further interrogation. The 8 kb MMP3 genomic interval demonstrated high levels of linkage disequilibrium. Furthermore, MMP3 was predicted to interact with the MMP12 gene through chromatin looping. The COL1A1 rs1800012 (G>T) variant overlaps the recognition sequence of the Sp1 transcription factor in intron 1. This variant is also proposed to interact with the functional promoter variants rs1107946 (G>T) and rs11327935 (indel/T), an interaction suggested to be mediated by chromatin looping. Furthermore, the rare rs1800012 T allele is predicted to result in a looser mRNA conformation immediately surrounding the variant within the pre-mRNA sequence compared to the ancestral G allele.

Conclusion: These results provide proof of concept for the use of WES and a customised tiered filtering strategy to identify and prioritise variants for further interrogation using traditional molecular techniques. This approach, which used previous research to guide a targeted analysis of a WES dataset, has highlighted the potential risk-modifying effects of several new variants in the TNC and COL27A1 genes. Furthermore, haplotype analysis has implicated several signatures encompassing variants previously associated with risk of injury in the four investigated genes. Although no new candidate variants within the MMP3 and COL1A1 genes were independently associated with risk of injury, unique allele combinations were observed to co-segregate with an altered injury risk profile. Therefore, this study has genetically characterised several previously implicated loci and highlighted new sequence signatures, which may potentially contribute to susceptibility to musculoskeletal soft tissue injuries. The next step would be to explore the functional significance of these sequence signatures in vitro, a process that would help further characterise the biological mechanisms underpinning the observed risk associations.
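
As an illustration of the case-control genetic association design described in the abstract, the following is a minimal sketch of an allelic association test on a 2x2 table of allele counts. It is not the thesis's own code: the thesis reports using R, whereas this sketch uses Python with only the standard library. The function name `allelic_association` and the allele counts in the usage lines are hypothetical and chosen purely for illustration; they are not results from the thesis. The test shown is a Pearson chi-square with one degree of freedom and no continuity correction, which is one common choice and is assumed here rather than taken from the source.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) and odds ratio
    for a 2x2 table of allele counts (cases vs controls, alt vs ref allele).

    Assumes every cell count is greater than zero; a zero cell would make
    the odds ratio undefined.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    # Sum (observed - expected)^2 / expected over the four cells.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Odds of carrying the alt allele in cases relative to controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts, for illustration only (not thesis data).
chi2, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, OR={odds_ratio:.2f}")
```

In practice, genotype-level and haplotype-level tests, together with checks such as Hardy-Weinberg equilibrium in controls, would normally accompany a simple allelic test like this one.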