
Browsing by Subject "big data"

Now showing 1 - 2 of 2
  • Item
    Open Access
    Assessing GridSim for modeling the global distribution of next-generation astronomy data
    (2025) Tunbridge, James K; Simmonds, Robert
    The transfer of big data between geographic locations incurs various costs that are better managed when computing resources are used efficiently. Measuring the energy used by a computing facility is a mechanism for managing computational efficiency because the energy provided to the facility can be measured and managed. The Square Kilometer Array (SKA) radio telescope will share large volumes of science-ready astronomical data with the project's collaborating partners. This dissertation attempts to address the weaknesses of the GridSim simulation toolkit for the configuration of the SKA data grid. Some of the GridSim features suited to the simulation project are: a) a network extension claiming realistic network communication; b) an extendable application programming interface, owing to the Java programming language; c) a datagrid extension that simulates distributed data storage and tasks for managing the distributed files; d) packet- and flow-level network extensions; and e) its use in simulations of similar real-world networks, e.g., the Australian GrangeNet Gigabit network. GridSim was built primarily for modeling resources and application scheduling of parallel computing and distributed computation grids, and to assess different job scheduling policies. The SKA wide area collaborative network will send data to its distributed partners, who have their own network and energy-related policies. This work proposes a design to implement, in GridSim, a prototype of the end-to-end energy cost model for large-scale networks, ECOFEN (Orgerie, 2015). The purpose of this work is to demonstrate the utility of the GridSim toolkit in spite of a few known problems with the software. Invalidation exercises were performed to determine the cause of lost events in a network extension simulation, and to assess the implementation of the Routing Information Protocol in GridSim across multiple executions of the same simulation and configuration.
In this work, GridSim simulations were found to lose events, and a solution is suggested. In addition, the work found that routing tables do not always contain matching shortest-path information across multiple executions of a simulation. After one unsuccessful attempt to implement the proposed ECOFEN model extension in GridSim, implementing the design is left as a project for future work. This work considered other simulation tools as potential alternatives to the GridSim toolkit, finding SimGrid to be a likely candidate. Modern computational systems are too complex for popular software simulation tools to reproduce dependably, which has supported a return to live network emulation testbeds for the accurate and scalable modeling of real-world systems.
  • Item
    Open Access
    Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers
    (2020) Sinkala, Musalula; Martin, Darren; Mulder, Nicola; Barth, Stefan
    Recently, many "molecular profiling" projects have yielded vast amounts of genetic, epigenetic, transcription, protein expression, metabolic and drug response data for cancerous tumours, healthy tissues, and cell lines. We aim to facilitate a multi-scale understanding of these high-dimensional biological data and the complexity of the relationships between the different data types taken from human tumours. Further, we intend to identify molecular disease subtypes of various cancers, uncover the subtype-specific drug targets and identify sets of therapeutic molecules that could potentially be used to inhibit these targets. We collected data from over 20 publicly available resources. We then leverage integrative computational systems analyses, network analyses and machine learning, to gain insights into the pathophysiology of pancreatic cancer and 32 other human cancer types. Here, we uncover aberrations in multiple cell signalling and metabolic pathways that implicate regulatory kinases and the Warburg effect as the likely drivers of the distinct molecular signatures of three established pancreatic cancer subtypes. Then, we apply an integrative clustering method to four different types of molecular data to reveal that pancreatic tumours can be segregated into two distinct subtypes. We define sets of proteins, mRNAs, miRNAs and DNA methylation patterns that could serve as biomarkers to accurately differentiate between the two pancreatic cancer subtypes. Then we confirm the biological relevance of the identified biomarkers by showing that these can be used together with pattern-recognition algorithms to infer the drug sensitivity of pancreatic cancer cell lines accurately. Further, we evaluate the alterations of metabolic pathway genes across 32 human cancers. We find that while alterations of metabolic genes are pervasive across all human cancers, the extent of these gene alterations varies between them. 
Based on these gene alterations, we define two distinct cancer supertypes that tend to be associated with different clinical outcomes and show that these supertypes are likely to respond differently to anticancer drugs. Overall, we show that the time has already arrived when we can leverage available data resources to potentially elicit more precise and personalised cancer therapies that would yield better clinical outcomes at a much lower cost than is currently being achieved.