Browsing by Author "Ngwenya, Mzabalazo"
Now showing 1 - 5 of 5
- Item (Open Access): A temporal prognostic model based on dynamic Bayesian networks: mining medical insurance data (2021). Mbaka, Sarah Kerubo; Ngwenya, Mzabalazo
A prognostic model is a formal combination of multiple predictors from which the risk probability of a specific diagnosis can be modelled for patients. Prognostic models have become essential instruments in medicine. They are used for prediction, guiding doctors towards accurate, patient-specific decisions and helping to plan the use of resources for patient groups with similar prognostic paths. Dynamic Bayesian networks (DBNs) theoretically provide a very expressive and flexible model for solving temporal problems in medicine. However, this involves various challenges arising both from the nature of the clinical domain and from the DBN modelling and inference process itself. The challenges from the clinical domain include insufficient knowledge of the temporal interactions of processes in the medical literature, the sparse nature and variability of medical data collection, and the difficulty of preparing and abstracting clinical data into a suitable format without losing valuable information in the process. Challenges in the DBN methodology and implementation include the lack of tools that allow easy modelling of temporal processes. Overcoming this challenge will help to solve various clinical temporal reasoning problems. In this thesis, we addressed these challenges while building a temporal network that explains the effects of predisposing factors, such as age and gender, and the progression of all diagnoses, using claims data from an insurance company in Kenya. We showed that our network could differentiate the probable exposure to a diagnosis given age and gender, and the possible progression paths given a patient's history. We also presented evidence that the more patient history is provided, the better the prediction of future diagnoses.
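As a rough illustration of the kind of temporal reasoning the abstract describes (not the actual DBN built from the Kenyan claims data), the sketch below rolls a belief over diagnoses forward one time slice from a demographic prior and a transition matrix. The diagnosis states, the age/gender prior, and all probabilities are invented placeholders, and the first-order chain omits much of what a full DBN would model.

```python
import numpy as np

# Hypothetical diagnosis states; the thesis models all diagnoses in the claims data.
states = ["healthy", "hypertension", "diabetes"]

# Illustrative P(diagnosis at t=0 | age band, gender) for one demographic group.
prior = {("40-49", "F"): np.array([0.80, 0.15, 0.05])}

# Illustrative P(diagnosis at t+1 | diagnosis at t); each row sums to 1.
transition = np.array([
    [0.90, 0.07, 0.03],
    [0.05, 0.85, 0.10],
    [0.02, 0.08, 0.90],
])

def predict_next(age_band, gender, history):
    """P(next diagnosis) given demographics and an observed diagnosis history."""
    if not history:
        belief = prior[(age_band, gender)]
    else:
        # Condition on the most recent observed diagnosis (first-order assumption).
        belief = np.zeros(len(states))
        belief[states.index(history[-1])] = 1.0
    return dict(zip(states, np.round(belief @ transition, 3)))

print(predict_next("40-49", "F", []))
print(predict_next("40-49", "F", ["hypertension"]))
```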
- Item (Open Access): An unsupervised approach to COVID-19 fake tweet detection (2024). Jarana, Bulungisa; Ngwenya, Mzabalazo
Context: With the ongoing COVID-19 pandemic, social media platforms have become a crucial source of information. However, not all information shared on these platforms is accurate. The dissemination of fake news, intentional or unintentional, can lead to panic among readers and further exacerbate the effects of the pandemic. Objectives: This research project aims to explore the potential of unsupervised machine learning algorithms in differentiating between genuine and fake COVID-19 news shared on Twitter. The methodology includes a literature review, experimental analysis, and the utilization of a Twitter dataset. Methods: The study used both the K-means and Mini-Batch K-means clustering algorithms to group the Twitter data into two clusters. Word embedding and vectorisation techniques such as TF-IDF, Word2Vec, and BERT were employed because machine learning models cannot process raw text directly. Results: The results on the test data show that K-means was the best-performing algorithm, achieving 76% accuracy in identifying fake tweets about COVID-19. K-means with BERT word embeddings was the best-performing model, followed by Mini-Batch K-means with TF-IDF word embeddings, which achieved 69% accuracy. Conclusions: The study demonstrates that clustering Twitter COVID-19 news as genuine or fake using the K-means and Mini-Batch K-means algorithms is feasible. Keywords: Clustering, Machine Learning, Unsupervised Learning, K-Means, Mini-Batch K-Means, TF-IDF, Word2Vec, BERT, Confusion Matrix, Truncated SVD (Singular Value Decomposition), t-distributed Stochastic Neighbour Embedding (t-SNE)
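A minimal sketch of the clustering setup the abstract describes, using scikit-learn's TF-IDF vectoriser with K-means and Mini-Batch K-means. The tweets below are invented examples, and deciding which of the two clusters corresponds to "genuine" versus "fake" would still require the labelled data the study evaluates against.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, MiniBatchKMeans

# Invented tweets; the study used a labelled COVID-19 Twitter dataset.
tweets = [
    "Vaccines have completed clinical safety trials",
    "Drinking hot water cures the virus in one day",
    "Health ministry reports new confirmed cases today",
    "5G towers are spreading the infection",
]

# TF-IDF turns raw text into numeric vectors the clustering algorithms can use.
vectors = TfidfVectorizer(stop_words="english").fit_transform(tweets)

# Two clusters, in the hope that one aligns with genuine news and one with fake news.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
mbk = MiniBatchKMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

print("K-means labels:           ", km.labels_)
print("Mini-Batch K-means labels:", mbk.labels_)
```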
- Item (Open Access): Anomaly detection in a mobile data network (2019). Salzwedel, Jason Paul; Ngwenya, Mzabalazo
The dissertation investigated the creation of an anomaly detection approach to identify anomalies in the Serving Gateway (SGW) elements of an LTE network. Unsupervised techniques were compared and used to identify and remove anomalies in the training data set. This “cleaned” data set was then used to train an autoencoder in a semi-supervised approach. The resultant autoencoder was able to identify normal observations. A subsequent data set was then analysed by the autoencoder, and the resulting reconstruction errors were compared to ground-truth events to investigate the effectiveness of the autoencoder's anomaly detection capability.
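The sketch below shows the general semi-supervised pattern the abstract describes: train an autoencoder on "cleaned" normal data only, then flag later observations whose reconstruction error exceeds a threshold. The Keras architecture, synthetic feature vectors, and 99th-percentile cut-off are assumptions for illustration, not the dissertation's configuration.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Synthetic stand-ins for scaled SGW performance counters; the dissertation's
# real features and unsupervised cleaning step are not reproduced here.
normal = rng.uniform(0.3, 0.7, size=(1000, 8)).astype("float32")
new_obs = rng.uniform(0.0, 1.0, size=(50, 8)).astype("float32")

# Small dense autoencoder trained to reconstruct "normal" traffic only.
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(2, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(8, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal, normal, epochs=20, batch_size=32, verbose=0)

def reconstruction_error(x):
    """Mean squared reconstruction error per observation."""
    return np.mean((x - autoencoder.predict(x, verbose=0)) ** 2, axis=1)

# Assumed cut-off: flag anything worse than the 99th percentile on normal data.
threshold = np.percentile(reconstruction_error(normal), 99)
flags = reconstruction_error(new_obs) > threshold
print(f"flagged {flags.sum()} of {len(new_obs)} observations as anomalous")
```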
- Item (Open Access): Classification of customer complaints using machine learning algorithms (2024). Kgomo, Teballo; Ngwenya, Mzabalazo
Poor handling of customer complaints leads to a bad customer experience and damages brand reputation. With an ever-increasing volume of complaints facing customer service teams, handling customer complaints becomes tedious for service desk agents, especially when they are pressed for time. For these reasons, many companies have adopted ML technologies to improve their customer services. Technologies like ML text classification have shown great potential in improving customer support. This research proposes an ML text classification approach to categorise customer complaints into one of thirteen relevant product complaint topics, with the aim of reducing the time service desk agents spend reading and classifying complaints. The research uses five ML algorithms, namely LR, SVM, LightGBM, KNN, and CART DT, to assess how text classification can improve the classification of customer complaints in the financial services industry by measuring how accurately the algorithms categorise customer complaint data. These algorithms are trained on three different word vectorisation techniques, namely CV, TF-IDF, and Word2Vec word embeddings. The algorithms classify each customer complaint into one of thirteen possible product categories. Due to the imbalanced distribution of the target (product complaint topics), a balanced accuracy metric was used to evaluate model performance. The results show that LR with TF-IDF word vectorisation produced the best model, with 87.29% balanced accuracy on the OOT dataset. This shows that ML algorithms can be used to improve the customer complaints classification process. Furthermore, the solution can be extended to handle customer complaint emails, which has the potential to improve the company's customer response time and the service desk team's complaint classification.
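A minimal sketch, assuming scikit-learn, of the best-performing combination reported above: TF-IDF features feeding a logistic regression classifier, evaluated with balanced accuracy. The complaint texts and product labels are invented placeholders rather than the study's thirteen product topics.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Invented complaints and product labels; the study used thirteen product topics.
train_texts = [
    "My credit card was charged twice for the same purchase",
    "The mortgage statement shows the wrong interest rate",
    "A debit order on my savings account was not processed",
]
train_labels = ["credit card", "mortgage", "savings account"]

test_texts = ["The interest rate on my mortgage statement is incorrect"]
test_labels = ["mortgage"]

# TF-IDF features feeding a multi-class logistic regression classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
model.fit(train_texts, train_labels)

# Balanced accuracy averages recall per class, so rare product topics count
# as much as frequent ones under an imbalanced target distribution.
preds = model.predict(test_texts)
print(preds, balanced_accuracy_score(test_labels, preds))
```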
- Item (Open Access): ETD: Case mix and coding error detection in Western Cape healthcare facilities (2024). Narayan, Saiheal; Ngwenya, Mzabalazo; Silal, Sheetal
South Africa has a two-tier structure for the delivery of hospital and health care services: the public sector and the private sector. The private sector is known for better service quality, cost management, and data management. The Clinton Health Access Initiative (CHAI) has been supporting the first steps towards using Diagnosis Related Groups (DRGs) to categorise hospitalisation costs in public health facilities in South Africa. DRGs are widely used in the private sector for active cost management. Additionally, an issue was raised by the on-site clinical coding audit report of the public hospitals managed by the Western Cape Department of Health, which must be addressed. This dissertation applies case mix adjustment to hospitals in the Western Cape based on DRG weights from the private sector. A DRG weight represents the average resources required to care for cases in that particular DRG, relative to the average resources used to treat cases across all DRGs. This is then compared to another metric that uses actual length-of-stay data from the public sector as a proxy for resource utilisation (Fetter, Shin, Freeman, Averill, and Thompson, 1980). The objective is to find out whether case mix can help identify hospitals that take on highly resource-intensive procedures on average. Using case mix in the public sector would allow for optimised resourcing. The second part looks at generating classification models to flag diagnosis coding errors by healthcare staff in the Western Cape. Patient-level data was used, including length of stay, procedures, and cost centre. The modelling approaches trained to classify diagnoses include neural networks, multinomial logistic regression, random forests, SMOTE (Synthetic Minority Over-sampling Technique) resampling, and finally an ensemble of the top three models using majority voting. These models are able to handle multiple response categories. The aim of the error detection model is to increase data quality in the public sector. The results showed that the DRG weights from the private sector might not be appropriate for the public health sector. It was also shown that the best predictive model for diagnosis was a random forest, with an accuracy of 57% on the unseen test dataset. Lastly, through the explanatory analysis, this dissertation identified both qualitative and quantitative relationships in the data that could open up avenues for further research and development. These results can be used to help stakeholders make informed decisions and improve data quality in the public sector.
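As a hedged illustration of the case mix idea described above (not the dissertation's actual DRG weights or hospital data), the pandas sketch below computes a case-mix index per hospital as the mean DRG weight of its cases; the DRGs, weights, and hospitals are invented.

```python
import pandas as pd

# Invented discharge records; real data would attach one DRG per admission.
cases = pd.DataFrame({
    "hospital": ["A", "A", "A", "B", "B", "B"],
    "drg": ["hip_replacement", "normal_delivery", "normal_delivery",
            "hip_replacement", "icu_ventilation", "normal_delivery"],
})

# Illustrative relative weights: 1.0 = resource use of the average case over all DRGs.
drg_weights = {"normal_delivery": 0.6, "hip_replacement": 1.9, "icu_ventilation": 4.2}
cases["weight"] = cases["drg"].map(drg_weights)

# Case-mix index: mean DRG weight of a hospital's cases; values above 1 suggest a
# more resource-intensive mix than the average case across all DRGs.
cmi = cases.groupby("hospital")["weight"].mean().rename("case_mix_index")
print(cmi)
```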