The chromatin landscape of colorectal cancer cells

Doctoral Thesis


Chromatin organization is central to deciphering gene regulation because it is instructive to transcription. Advances in next-generation sequencing have offered unprecedented opportunities to interrogate the genomic landscape across multiple pathological and clinical presentations. Historically, mutations and alterations at the genomic loci of protein-coding genes were thought to be exclusively causal to many human diseases. However, the non-coding genome has emerged as the master regulator of chromatin dynamics and transcriptional activity. With cancer increasingly becoming the greatest health epidemic of our time, the comprehensive genomic characterization of tumor genotypes has become central to current therapeutic approaches. Functioning as the basic units of chromatin organization, chromatin loops and topologically associating domains (TADs) compartmentalize genomic loci and their corresponding transcriptional elements in three-dimensional space. Transcription of the human genome is proximity-dependent, requiring the cooperative engagement of non-coding elements and epigenetic modifiers to create permissive topological chromatin contacts and structures. The repertoire of chromatin contacts at any given time is regulated by the three-dimensional structure and organization of the chromatin. TAD structures are formed and maintained by chromatin insulating proteins such as CTCF (CCCTC-binding factor) and the multiprotein complex cohesin. Dysfunction of these proteins, through mutational and epigenetic aberrations, directly impacts a plethora of chromatin contacts and the resultant transcriptional profiles within each cell. Loops and TADs are formed by the binding of CTCF to the conserved 19 bp CTCF-binding motif as the chromatin is extruded through the ring-like cohesin complex.
When two convergently oriented, CTCF-enriched CTCF-binding sites (CBSs) come into contact within the ring, cohesin is thought to "handcuff" the chromatin, resulting in the formation of a chromatin loop. These loop structures then serve to compartmentalize and restrict the chromatin contacts, and their frequency, within each loop. Promoter-resident CBSs can also function as "docking sites" for tissue- and context-specific enhancers. The dysregulation of CTCF binding has been repeatedly demonstrated to directly alter chromatin contacts in a vast array of cellular contexts, including cancer. Fundamentally, CTCF functions as a potent regulator of chromatin contacts, which directly instruct transcriptional status. Thus, CTCF binding has become an attractive regulatory target for manipulating the topological and transcriptional activity of chromatin. In this study, we sought to identify CBSs with differential, specifically abrogated, CTCF enrichment that may be hijacked by oncogenes to modify transcriptional programmes in favour of cancer progression. To this end, we developed an integrated bioinformatic pipeline to identify promoter-associated lower-CTCF enrichment sites (PA-LCes) in colorectal cancer (CRC) cell lines, as compared to primary colonic tissue, from CTCF ChIP-Seq data. With ever-growing catalogues of next-generation sequencing datasets, including ChIP-Seq, in the public domain, the use of ENCODE datasets proved to be an economical option and added a layer of standardization to our analysis. Briefly, the pipeline developed in this study takes as input ENCODE ChIP-Seq FASTQ files retrieved from the NCBI SRA using fastq-dump. The FASTQ files undergo quality control and dataset filtration with FastQC. The filtered datasets are then aligned to the hg38 human genome and fed back into FastQC to ensure that the aligned reads pass quality control metrics. The mapped reads are then processed using samtools, and duplicate reads are marked with Picard MarkDuplicates.
Narrow peaks are then called from the processed reads using MACS2 and processed using bedtools. Called peaks then undergo a final quality control step using ChIPQC (R) and are visualized using IGV before undergoing differential enrichment analysis. Differential CTCF enrichment analysis between the peaks in primary sigmoid colon cells and CRC cell lines is then conducted using DESeq2 within DiffBind. Lower CTCF enrichment peaks are then used for the discovery of the canonical CTCF MA0139.1 motif using HOMER and compared to similar annotations in the primary consensus peakset. The resultant lower CTCF enrichment peaks are then annotated using HOMER and ChIPpeakAnno to determine their genomic locations and to extract LCes located proximal (<1 kb) to annotated TSSs or promoter regions, i.e. PA-LCes. The PA-LCe discovery pipeline developed in this study is highly robust, recovering some previously validated CBSs implicated in oncogenesis. Intriguingly, the PA-LCe sites identified in this study emanate from bidirectional promoters at oncogenes with differential methylation and transcriptional patterns in cancer. Additionally, these PA-LCes transcribe antisense lncRNAs such as the tumor-suppressive aslncRNA ZNF582-AS1. These data add to the recent body of evidence suggesting that disruption of promoter-associated CBSs leads to fluctuations in promoter activity. Recent studies have implicated CTCF-lncRNA complexes at promoter regions in facilitating and regulating CTCF docking on chromatin, which subsequently influences transcriptional activity. In accordance with this, our data suggest that the lncRNAs at PA-LCe loci may be molecular targets for the regulation of CTCF binding and transcriptional activity in CRC. Perturbation of CTCF enrichment at PA-LCes in CRC results in differential chromatin contacts, epigenetic context, and transcriptional activity of the promoters in which they reside.
As CTCF binding at CBSs is highly modular, PA-LCe CBSs may represent viable and druggable oncogenic targets for targeted CRISPR-mediated gene editing and DNA methylation.
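
The final selection step of the pipeline, keeping peaks with lower CTCF enrichment in CRC that lie within 1 kb of an annotated TSS, can be illustrated with a minimal Python sketch. This is not the code used in the study, which relied on DiffBind, HOMER and ChIPpeakAnno. The names `Peak`, `Tss` and `select_pa_lces`, the FDR cutoff of 0.05, the use of the peak midpoint for the distance, and the sign convention (negative log2 fold change meaning lower enrichment in CRC than in primary tissue) are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    """One differential-enrichment result (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    log2_fold: float  # assumed convention: CRC relative to primary tissue
    fdr: float


class Tss(NamedTuple):
    """One annotated transcription start site (hypothetical record layout)."""
    chrom: str
    pos: int
    gene: str


def select_pa_lces(peaks, tss_sites, max_dist=1000, max_fdr=0.05):
    """Return (peak, gene, distance) for lower-enrichment peaks near a TSS.

    A peak is kept when its log2 fold change is negative, its FDR is at or
    below max_fdr, and its midpoint is less than max_dist bp from the
    nearest TSS on the same chromosome.
    """
    # Index TSS positions per chromosome so the nearest one can be found
    # with a binary search.
    by_chrom = defaultdict(list)
    for t in tss_sites:
        by_chrom[t.chrom].append((t.pos, t.gene))
    for sites in by_chrom.values():
        sites.sort()

    hits = []
    for p in peaks:
        if p.log2_fold >= 0 or p.fdr > max_fdr:
            continue  # not a significant lower-enrichment site
        sites = by_chrom.get(p.chrom, [])
        if not sites:
            continue
        mid = (p.start + p.end) // 2
        i = bisect_left(sites, (mid, ""))
        # The nearest TSS is either just before or just after the midpoint.
        candidates = sites[max(i - 1, 0):i + 1]
        pos, gene = min(candidates, key=lambda s: abs(s[0] - mid))
        dist = abs(pos - mid)
        if dist < max_dist:
            hits.append((p, gene, dist))
    return hits
```

In practice the inputs would be parsed from the differential enrichment report and a gene annotation file; the thresholds are parameters so that a different promoter window or significance cutoff can be substituted.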