Browsing by Author "Sinkala, Musalula"

Now showing 1 - 7 of 7
  • Open Access
    Enhancing detection of cervical cancer through deep learning: a comparative study of histological image-based algorithms
    (2025) Tjale, Palesa; Sinkala, Musalula
    Cervical cancer is a significant contributor to cancer-related deaths among women worldwide, especially in low- and middle-income countries (LMICs) where access to screening services is limited. Early detection plays a vital role in improving patient outcomes. However, traditional diagnostic techniques, including Pap smears and histological assessments, are often affected by variability, subjectivity, and limited sensitivity. Advances in artificial intelligence (AI), particularly deep learning (DL) and visual prompting methods, offer new possibilities for enhancing the accuracy, efficiency, and interpretability of cervical cancer detection from histology images. In this thesis, I investigate the application of DL models—ResNet50, SqueezeNet, EfficientNet, and a Visual Prompting Model—for classifying cervical cells using histopathological images. I conduct a comparative analysis to evaluate these models based on accuracy, sensitivity, specificity, and interpretability. To enhance model explainability, I employ Grad-CAM to visualize model decisions, offering insights into the diagnostic relevance of highlighted features. My results indicate that the Visual Prompting Model outperforms conventional DL models, achieving the highest accuracy (98%) and F1-score (0.99) while also demonstrating superior localization of diagnostically significant regions. EfficientNet follows closely with an accuracy of 97% and an F1-score of 0.97, while SqueezeNet achieves 95% accuracy and an F1-score of 0.95. In contrast, ResNet50 shows lower performance, with an accuracy of 91% and an F1-score of 0.91, indicating limitations in feature extraction and localization. A key finding of my study is that integrating visual prompting significantly enhances model explainability, addressing a critical challenge in AI-driven medical imaging. 
By directing attention to clinically relevant areas within histological images, visual prompting reduces misclassification rates, potentially aiding pathologists in making more informed diagnostic decisions. Additionally, the computational efficiency and ease of training of Visual Prompting Models suggest their feasibility for deployment in resource-constrained settings where expert pathology review is limited. Overall, my findings underscore the transformative potential of AI, particularly visual prompting, in improving cancer detection. These AI-assisted diagnostic tools promise not only to enhance accuracy but also to improve interpretability, making them highly relevant for clinical integration. I suggest that future research should focus on validating these AI models across diverse clinical settings, optimizing computational efficiency, and exploring hybrid AI approaches that incorporate molecular and genomic data for a more comprehensive approach to cervical cancer diagnostics.
  • Open Access
    Evaluating convolutional neural networks and transformer architectures for image-based prediction of protein localization in eukaryotic cells
    (2025) Msipa, Sibongiseni Letticia; Sinkala, Musalula
    Background: Accurate prediction of protein subcellular localization is critical for understanding protein function and guiding experimental research. Recent advances in deep learning have enabled high-throughput image-based methods to tackle this problem by leveraging large-scale immunofluorescence microscopy datasets. The aim of this study is to comparatively evaluate convolutional neural network (CNN) architectures and Transformer-based models for the multi-label classification of protein subcellular localization in eukaryotic cells, using large-scale immunofluorescence image datasets.
    Methods: In this study, we comparatively evaluated convolutional neural network (CNN) architectures (DenseNet121, Xception, and InceptionV3) and transformer-based models (Vision Transformer and Swin Transformer) for multi-label classification of protein localization in eukaryotic cells. Using 12,565 immunofluorescence images from the Human Protein Atlas, representing 15 subcellular compartments, we performed transfer learning by replacing the final layers of pretrained ImageNet models to accommodate multi-label output. All models were trained with iterative stratification to handle class imbalance and evaluated on held-out test images.
    Results and discussion: Our findings indicate that CNN-based models, particularly DenseNet121 and Xception, achieve the highest overall accuracy and F1-scores, successfully recognizing both abundant and underrepresented classes. In contrast, transformers demonstrated variable performance. While the Swin Transformer surpassed the Vision Transformer, neither consistently matched CNN performance, likely reflecting the data requirements and hyperparameter sensitivity of transformer architectures. Visualization techniques (Grad-CAM in CNNs and attention maps in transformers) confirmed that well-performing models localize salient features to biologically relevant regions, suggesting they learn meaningful morphological cues.
    Conclusion: These results underscore CNNs' suitability for subcellular localization analysis with moderate-scale datasets, while transformers may require more extensive tuning or larger training sets to reach comparable accuracy. Our findings suggest that CNNs, especially DenseNet121 and Xception, exhibit superior performance over transformer models in predicting protein localization. CNN-based models demonstrate higher accuracy and interpretability, positioning them as preferred choices for advancing functional proteomics and computational drug discovery.
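As an illustration of how multi-label localization predictions of this kind are commonly scored, the following minimal sketch thresholds per-class sigmoid outputs and computes micro- and macro-averaged F1. The abstract does not describe the evaluation code, so the 0.5 threshold, the three-class toy data, and all function names are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(logits, threshold=0.5):
    """Turn per-class logits into 0/1 labels (threshold is an assumed value)."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for lists of equal-length 0/1 label vectors."""
    n_classes = len(y_true[0])
    counts = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-class F1 scores, so rare classes count equally.
    macro = sum(f1_from_counts(*k) for k in counts) / n_classes
    # Micro: pool the counts over all classes before computing F1.
    micro = f1_from_counts(
        sum(k[0] for k in counts),
        sum(k[1] for k in counts),
        sum(k[2] for k in counts),
    )
    return micro, macro


# Hypothetical toy data: 4 images, 3 compartments (not from the thesis).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
logits = [[2.0, -1.0, 0.5], [-2.0, 1.5, -0.3], [1.0, -0.5, -2.0], [-1.0, -1.0, -0.2]]
y_pred = binarize(logits)
micro_f1, macro_f1 = multilabel_f1(y_true, y_pred)
```

Macro-averaging weights every compartment equally, which is why it is often reported alongside micro-averaged scores when some classes are underrepresented.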
  • Open Access
    Evaluating deep learning for enhanced breast cancer diagnosis: a comparative analysis of CNN architectures
    (2025) Frankle, Solyle; Sinkala, Musalula
    Artificial Intelligence (AI), particularly its machine learning (ML) subfield, has revolutionised various sectors, including healthcare. In breast cancer care, AI's ability to analyse vast datasets and extract complex patterns from medical images has the potential to transform diagnostics and treatment strategies. Breast cancer remains one of the most prevalent cancers affecting women globally, with early and accurate diagnosis being crucial for effective treatment. AI, through its advanced image analysis capabilities, significantly improves the accuracy and efficiency of breast cancer diagnosis, specifically in distinguishing between cancer subtypes. Here, we aim to explore the application of deep learning, particularly convolutional neural networks (CNNs), in breast cancer subtype classification using histology images. A custom CNN model, alongside well-established models like ResNet50 and EfficientNetB0, was developed and evaluated for its accuracy in predicting benign and malignant breast cancer subtypes. The results demonstrated that while the custom CNN achieved an accuracy of 65% for malignant and 67% for benign subtypes with ROC-AUC scores of 0.86 and 0.90, respectively, ResNet50 significantly outperformed both the custom model and EfficientNetB0. ResNet50 attained an accuracy of 77% in classifying malignant subtypes and 77% for benign subtypes, accompanied by ROC-AUC scores of 0.92 and 0.96, respectively. Additionally, ResNet50 exhibited higher precision (0.68 for malignant, 0.67 for benign), recall (0.65 for malignant, 0.67 for benign), and F1 scores (0.65 for malignant, 0.67 for benign) across most subtypes, underscoring its robust performance and reliability in clinical settings. In conclusion, AI, specifically through advanced CNN architectures, can greatly enhance breast cancer diagnosis by providing more accurate subtype classifications. 
Future work should focus on integrating these models into clinical workflows, enabling faster and more personalised treatment planning. Moreover, continued refinement of these models, including addressing the complexities of tumour heterogeneity and incorporating multimodal data, will be crucial for their widespread adoption in oncology.
  • Open Access
    Exploring topological data analysis in gene expression data: topology-driven biomarker discovery and clinical outcome prediction in oncology
    (2025) Nyase, Ndivhuwo; Mashatola, Lebohang; Muller, Julia; Sinkala, Musalula
    This thesis is grounded in the fundamental observation that biological data has shape and this shape matters. Beneath the high-dimensional, often noisy landscape of gene expression profiles lie hidden topological structures (connected components, loops and voids) that capture the complex relationships driving cancer development and progression. By embracing this perspective, we position Topological Data Analysis (TDA) and persistent homology at the core of a novel analytical framework designed to tackle two key challenges in cancer research: clinical outcome prediction and biomarker discovery. In this study, we employ Weighted Gene Topological Data Analysis (WGTDA) to extract topological features from gene expression data, which serve as prognostic biomarkers for cancer classification, staging, and treatment response. Moreover, by integrating these topological features with machine learning models we aim to enhance the predictive accuracy for clinical outcomes. For clinical outcome prediction, we transformed gene expression profiles into topological fingerprints using multiple co-expression measures—namely, Pearson Correlation, Distance Correlation, and Weighted Topological Overlap (wTO) computed with both Pearson and Distance-based adjacencies. These topological features were analyzed using Random Forests. In parallel, we compared the predictive performance of traditional machine learning models (SVM, Gradient Boosting Decision Trees, Random Forest, and Neural Networks) trained on raw gene expression data against models incorporating the topological fingerprints. This comparative analysis was conducted across three classification tasks: cancer type (using TCGA-SARC, TCGA-PCPG, and TCGA-ESCA datasets), cancer staging (using TCGA-HNSC for stages I–IV), and treatment response (responders vs. non-responders). 
    For biomarker identification, the same three tasks were applied using the best-performing co-expression measure to generate a global topological representation of the patient population. This provided a disease-level view, highlighting shared homological patterns to facilitate biomarker discovery. Additionally, a dedicated visualization tool has been developed to aid in interpreting these topological signatures and identifying critical biomarkers. The tool is available at https://nnyase.github.io/MSc-Thesis/
    WGTDA significantly enhanced phenotype prediction tasks by overcoming common pitfalls of traditional ML models in RNA-Seq data, such as overfitting and poor handling of class imbalance. TDA-derived features improved generalizability of ML models in tasks such as cancer staging and treatment response prediction. Our findings strongly support the integration of TDA into clinical outcome prediction, demonstrating its value in capturing nuanced patterns that allow ML methods to learn more effectively. Moreover, WGTDA remarkably identified key gene signatures for cancer type, staging, and treatment response without relying on pre-existing biological assumptions, yielding biomarkers that are strongly supported by the existing literature. These results underscore the method's reliability and potential clinical utility in precision oncology.
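To make the idea of topological features derived from co-expression more concrete, the sketch below computes only the simplest such feature: the 0-dimensional persistence (the scales at which connected components merge) of a filtration built on the Pearson correlation distance 1 - |r| between genes. This is not the WGTDA pipeline, which the abstract does not detail; the distance definition, the toy expression values, and the function names are all assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def h0_persistence(profiles):
    """Death scales of connected components; every component is born at scale 0.

    Edges are added in order of increasing distance; each time an edge joins
    two previously separate components, one component dies at that distance.
    """
    n = len(profiles)
    edges = sorted(
        (1.0 - abs(pearson(profiles[i], profiles[j])), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(dist)
    return deaths  # n - 1 finite values; the last component never dies


# Hypothetical expression profiles for four genes across four samples.
genes = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [1, 3, 2, 4],
    [2, 1, 4, 3],
]
deaths = h0_persistence(genes)
```

Higher-dimensional features such as loops and voids require a full persistent homology computation, for which a dedicated library would normally be used.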
  • Open Access
    Genetic differences in lung adenocarcinoma cells from patients of African and European ancestry
    (2024) Diseko, Karabo; Mulder, Nicola; Sinkala, Musalula
    In the past two decades, advancements in cancer genetics research have significantly enhanced our molecular comprehension of human cancers. This progress has led to the development of improved clinical tools for the precise diagnosis, prognosis prediction, and tailored treatment of cancers. However, the predominant focus of this research has been on individuals of European ancestry, inadvertently marginalizing the diverse genetic landscapes represented by other ethnic populations. Given minor differences in the genetic makeup across diverse ethnicities, specific cancer genetic variants prevalent in certain ethnic groups may remain overlooked within the current research. Some studies have indeed illuminated nuanced distinctions in the genetic architecture of cancers among patients of varying ethnic backgrounds. Disparities in cancer incidence and outcome between patients of different ethnicities have also been identified. These distinctions stem from a combination of environmental and biological factors, collectively shaping the intricate interplay of cancer genetics and its clinical manifestations. This study endeavours to elucidate clinically significant disparities in lung adenocarcinoma (LUAD) genetics across distinct ethnicities, particularly focusing on African ancestry (AA) and European ancestry (EA) populations. A meticulous comparison of genetic traits within LUAD cells derived from these ethnic groups is conducted to pinpoint genetic variances that hold potential biological relevance. Leveraging data from The Cancer Genome Atlas' lung adenocarcinoma (TCGA-LUAD) study, samples were stratified based on self-reported racial classifications into African ancestry (AA) and European ancestry (EA) groups. Propensity score matching (PSM) was meticulously employed to mitigate disparities in crucial clinical attributes, ensuring a balanced basis for subsequent genetic comparisons. 
A total of 147 EA and 49 AA samples were extracted following PSM, forming the basis for comprehensive comparisons of gene expression, copy number alterations, and mutation frequencies between the two ethnic cohorts. Key genetic disparities between the two groups were discerned, including 371 significantly differentially expressed (SDE) genes, a higher incidence of copy number alterations in the AA group compared to the EA group, and 101 genes exhibiting varying mutation frequencies between the two groups. An analysis of the biological functions impacted by these genetic variances revealed involvement in critical processes such as cellular response to xenobiotics, hormone metabolism and regulation, mitochondrial energy production, and epithelial-mesenchymal transition. We posit that clinically relevant biological distinctions in LUAD tumours between AA and EA patients stem from differential expression and mutations in genes encoding pivotal proteins such as UDP glucuronosyltransferases and cytochrome P450s, among others. Variations in the sequence and expression of these genes can significantly influence drug response and hallmark cancer cell characteristics, including energy production and epithelial-mesenchymal transition. Despite the limitation of a relatively small sample size, this study illuminates genetic disparities that underpin clinically significant differences in tumour biology between LUAD patients of African and European ancestry.
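The matching step can be illustrated with a small sketch of greedy nearest-neighbour matching on precomputed propensity scores. The abstract does not state the matching algorithm, ratio, or caliper, so the k:1 ratio, the caliper value, the sample identifiers, and the scores below are all hypothetical. The reported cohort sizes (147 and 49) stand in a 3:1 ratio, but the abstract does not say that 3:1 matching was used.

```python
def match_nearest(treated, controls, k=1, caliper=0.1):
    """Greedy k:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping sample id -> propensity score.
    Returns a dict mapping each treated id to its matched control ids;
    a control is matched only if its score lies within the caliper.
    """
    available = dict(controls)  # copy, so the caller's dict is not modified
    matches = {}
    # Process treated samples in order of increasing propensity score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        ranked = sorted(available, key=lambda c: abs(available[c] - t_score))
        chosen = [c for c in ranked[:k] if abs(available[c] - t_score) <= caliper]
        for c in chosen:
            del available[c]  # without replacement: each control is used once
        matches[t_id] = chosen
    return matches


# Hypothetical propensity scores (not taken from the study).
aa_scores = {"AA1": 0.30, "AA2": 0.60}
ea_scores = {"EA1": 0.28, "EA2": 0.33, "EA3": 0.58, "EA4": 0.65, "EA5": 0.95}
matched = match_nearest(aa_scores, ea_scores, k=2, caliper=0.1)
```

In this toy example the control "EA5" stays unmatched because its score is far from both treated samples, which is how matching discards samples that have no comparable counterpart.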
  • Open Access
    Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers
    (2020) Sinkala, Musalula; Martin, Darren; Mulder, Nicola; Barth, Stefan
    Recently, many "molecular profiling" projects have yielded vast amounts of genetic, epigenetic, transcription, protein expression, metabolic and drug response data for cancerous tumours, healthy tissues, and cell lines. We aim to facilitate a multi-scale understanding of these high-dimensional biological data and the complexity of the relationships between the different data types taken from human tumours. Further, we intend to identify molecular disease subtypes of various cancers, uncover the subtype-specific drug targets and identify sets of therapeutic molecules that could potentially be used to inhibit these targets. We collected data from over 20 publicly available resources. We then leverage integrative computational systems analyses, network analyses and machine learning, to gain insights into the pathophysiology of pancreatic cancer and 32 other human cancer types. Here, we uncover aberrations in multiple cell signalling and metabolic pathways that implicate regulatory kinases and the Warburg effect as the likely drivers of the distinct molecular signatures of three established pancreatic cancer subtypes. Then, we apply an integrative clustering method to four different types of molecular data to reveal that pancreatic tumours can be segregated into two distinct subtypes. We define sets of proteins, mRNAs, miRNAs and DNA methylation patterns that could serve as biomarkers to accurately differentiate between the two pancreatic cancer subtypes. Then we confirm the biological relevance of the identified biomarkers by showing that these can be used together with pattern-recognition algorithms to infer the drug sensitivity of pancreatic cancer cell lines accurately. Further, we evaluate the alterations of metabolic pathway genes across 32 human cancers. We find that while alterations of metabolic genes are pervasive across all human cancers, the extent of these gene alterations varies between them. 
Based on these gene alterations, we define two distinct cancer supertypes that tend to be associated with different clinical outcomes and show that these supertypes are likely to respond differently to anticancer drugs. Overall, we show that the time has already arrived where we can leverage available data resources to potentially elicit more precise and personalised cancer therapies that would yield better clinical outcomes at a much lower cost than is currently being achieved.
  • Open Access
    Using machine learning to understand the link between gene essentiality, gene expression and the chemosensitivity of cancer cells
    (2024) Mcinga, Kuhle; Sinkala, Musalula; Martin, Darren
    The emergence of pharmacogenomics databases has presented unique opportunities to leverage machine learning in precision medicine, particularly in drug response prediction. In this thesis, an in-depth investigation is conducted on carefully curated and integrated breast cancer-focused datasets from the GDSC (Genomics of Drug Sensitivity in Cancer) and Achilles (CRISPR-derived) project databases. Specifically, machine learning techniques are employed to accurately predict the drug responses of cancer cells, laying the groundwork for personalised treatment strategies. Through rigorous training of machine learning models, drug-response classifiers were devised that demonstrated remarkable predictive capabilities, with the best-performing classifier achieving an F1-score of 0.86 and an AUC of 0.85, indicating its effectiveness in drug response prediction. Training these models on GDSC and Achilles datasets encompassing various drug IC50 values ensured generalization of the models across different drugs and cell