Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers

dc.contributor.advisor: Martin, Darren
dc.contributor.advisor: Mulder, Nicola
dc.contributor.advisor: Barth, Stefan
dc.contributor.author: Sinkala, Musalula
dc.date.accessioned: 2021-02-24T18:07:38Z
dc.date.available: 2021-02-24T18:07:38Z
dc.date.issued: 2020
dc.date.updated: 2021-02-24T18:06:52Z
dc.description.abstract: Recently, many "molecular profiling" projects have yielded vast amounts of genetic, epigenetic, transcription, protein expression, metabolic and drug response data for cancerous tumours, healthy tissues, and cell lines. We aim to facilitate a multi-scale understanding of these high-dimensional biological data and the complexity of the relationships between the different data types taken from human tumours. Further, we intend to identify molecular disease subtypes of various cancers, uncover the subtype-specific drug targets and identify sets of therapeutic molecules that could potentially be used to inhibit these targets. We collected data from over 20 publicly available resources. We then leverage integrative computational systems analyses, network analyses and machine learning to gain insights into the pathophysiology of pancreatic cancer and 32 other human cancer types. Here, we uncover aberrations in multiple cell signalling and metabolic pathways that implicate regulatory kinases and the Warburg effect as the likely drivers of the distinct molecular signatures of three established pancreatic cancer subtypes. Then, we apply an integrative clustering method to four different types of molecular data to reveal that pancreatic tumours can be segregated into two distinct subtypes. We define sets of proteins, mRNAs, miRNAs and DNA methylation patterns that could serve as biomarkers to accurately differentiate between the two pancreatic cancer subtypes. Then we confirm the biological relevance of the identified biomarkers by showing that these can be used together with pattern-recognition algorithms to infer the drug sensitivity of pancreatic cancer cell lines accurately. Further, we evaluate the alterations of metabolic pathway genes across 32 human cancers. We find that while alterations of metabolic genes are pervasive across all human cancers, the extent of these gene alterations varies between them. Based on these gene alterations, we define two distinct cancer supertypes that tend to be associated with different clinical outcomes and show that these supertypes are likely to respond differently to anticancer drugs. Overall, we show that the time has already arrived when we can leverage available data resources to potentially elicit more precise and personalised cancer therapies that would yield better clinical outcomes at a much lower cost than is currently being achieved.
dc.identifier.apacitation: Sinkala, M. (2020). Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers. Faculty of Health Sciences, Department of Clinical Laboratory Sciences. Retrieved from http://hdl.handle.net/11427/32983 [en_ZA]
dc.identifier.chicagocitation: Sinkala, Musalula. "Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers." Faculty of Health Sciences, Department of Clinical Laboratory Sciences, 2020. http://hdl.handle.net/11427/32983 [en_ZA]
dc.identifier.citation: Sinkala, M. 2020. Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers. Faculty of Health Sciences, Department of Clinical Laboratory Sciences. http://hdl.handle.net/11427/32983 [en_ZA]
dc.identifier.ris:
TY - Doctoral Thesis
AU - Sinkala, Musalula
DA - 2020
DB - OpenUCT
DP - University of Cape Town
KW - big data
KW - data integration
KW - biology
KW - machine learning
LK - https://open.uct.ac.za
PY - 2020
T1 - Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers
TI - Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers
UR - http://hdl.handle.net/11427/32983
ER -
[en_ZA]
dc.identifier.uri: http://hdl.handle.net/11427/32983
dc.identifier.vancouvercitation: Sinkala M. Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers. Faculty of Health Sciences, Department of Clinical Laboratory Sciences, 2020 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/32983 [en_ZA]
dc.language.rfc3066: eng
dc.publisher.department: Department of Clinical Laboratory Sciences
dc.publisher.faculty: Faculty of Health Sciences
dc.subject: big data
dc.subject: data integration
dc.subject: biology
dc.subject: machine learning
dc.title: Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers
dc.type: Doctoral Thesis
dc.type.qualificationlevel: Doctoral
dc.type.qualificationlevel: PhD
Files
Original bundle
Name: thesis_hsf_2020_sinkala musalula.pdf
Size: 62.8 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission