Browsing by Subject "data science"
- Open Access: A statistical approach to automated detection of multi-component radio sources (2020). Smith, Jeremy Stewart; Taylor, Russell.
  Advances in radio astronomy are allowing deeper and wider areas of the sky to be observed than ever before. Source counts of future radio surveys are expected to number in the tens of millions. Source-finding techniques are used to identify sources in a radio image; however, these techniques identify single distinct sources and struggle to identify multi-component sources, that is, cases where two or more distinct sources belong to the same underlying physical phenomenon, such as a radio galaxy. Identifying such phenomena is an important step in generating the survey catalogues on which much of radio astronomy science is based. Historically, multi-component sources were identified by visual inspection; however, the size of future surveys makes manual identification prohibitive. An algorithm that automates this process using statistical techniques is proposed and demonstrated on two radio images. The output of the algorithm is a catalogue in which nearest-neighbour source pairs are assigned a probability score of being components of the same physical object. By applying several selection criteria, pairs of sources that are likely to be multi-component sources can be identified. Radio image cutouts are then generated from this selection and may be used as input to radio source classification techniques. Successful identification of multi-component sources using this method is demonstrated.
- Open Access: Evaluating automated and hybrid neural disambiguation for African historical named entities (2022). Dunn, Jarryd; Suleman, Hussein.
  Documents detailing South African history contain ambiguous names. Names may be ambiguous because different people share the same name or because the same person is referred to by several different names. Thus, when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system, which disambiguates names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. A multilingual language model-based NED system was therefore developed to disambiguate people's names within a historical South African context, using documents written in English and isiZulu from the 500 Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. At the same time, the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on documents written in English, so the system was augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance compared with the unaugmented NED system.
- Open Access: High-resolution virtual try-on with garment extraction using generative adversarial networks (2024). Charters, Daniel J; Britz, Stefan S; Bernicchi, Dino.
  Image-based virtual try-on aims to depict an individual wearing a garment not originally worn by them. While the existing literature predominantly focuses on garments from standalone images, this research addresses the use of images in which the garment is already being worn by another individual. The study bridges a notable gap, as most current systems are tailored to standalone garment images. Given a pair of high-resolution images, the proposed system extracts the garment from one, refines it using context-aware image inpainting, and then transfers it onto the subject of the second image. The methodology incorporates various off-the-shelf models for pre-processing, notably Part Grouping Network (PGN), DensePose, and OpenPose. A state-of-the-art context-aware inpainting model refines the garments, and the final synthesis leverages the HR-VITON architecture, producing images at a resolution of 768 × 1024. Distinctively, our model processes both standalone and garment-on-person images. The models were evaluated on 2 032 high-resolution images under both paired and unpaired conditions. Performance was assessed with metrics such as Root Mean Square Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Learned Perceptual Image Patch Similarity (LPIPS), Structural Similarity (SSIM), Inception Score (IS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID). Benchmarked against HR-VITON, ACGPN, and CP-VTON, our model slightly trailed HR-VITON but notably surpassed ACGPN and CP-VTON. In realistic, unpaired conditions, the model achieved an IS of 3.152, an FID of 15.3, and a KID of 0.0063, compared with an IS of 3.398, an FID of 11.93, and a KID of 0.0034 achieved by HR-VITON on the same data. ACGPN has an FID of 43.29 and a KID of 0.0373, while CP-VTON has an FID of 43.28 and a KID of 0.0376; IS is not measured for either ACGPN or CP-VTON. An ablation study underscored the importance of context-aware inpainting in our network. The findings highlight the model's ability to generate convincing, high-resolution virtual try-on images from garment-on-person extractions, addressing a prevalent gap in the literature and offering tangible applications in high-resolution virtual try-on image generation.
- Open Access: Hospital readmission risk (2024). Mugova, Amos; Salau, Sulaiman; Er, Sebnem.
  Hospital readmissions are a significant challenge in healthcare, as they lead to increased costs, higher risk of mortality, treatment complications, and patient distress. This minor dissertation, set within the South African healthcare framework, investigates the potential of both traditional clinical screening tools and advanced statistical learning methods for predicting hospital readmission risk. The methods considered include the LACE score, decision trees, logistic regression, random forests, gradient-boosting methods, and neural networks. The study uses data on South Africa's privately insured population, provided by a private insurer. The data include a comprehensive array of patient information, such as demographics, prescribed medications, medical procedures undergone, and historical hospital usage. Feature selection methods were used to identify relevant variables for model training, and the effectiveness of these variables was assessed by their ability to distinguish patients at risk of hospital readmission within 30 days of discharge. The efficacy of the statistical learning methods was measured using several performance indicators, such as prediction accuracy, F1 score, Area Under the Receiver Operating Characteristic Curve (AUC), Area Under the Precision-Recall Curve (AUC-PR), and the Matthews Correlation Coefficient (MCC). The study found that the neural network model outperformed the other statistical learning methods evaluated across various metrics. Moreover, the research extends the range of variables used to predict hospital readmissions beyond the traditional LACE score, incorporating critical factors such as the frequency and costs of previous hospital visits, expenses related to specialist services, patient age, and the primary diagnosis category.
- Open Access: Investigating the relationship between mobile network performance metrics and customer satisfaction (2019). Labuschagne, Louwrens; Bassett, Bruce; Little, Francesca.
  Fixed and mobile communication service providers (CSPs) face fierce competition with one another. In a globally saturated market, the primary differentiator between CSPs has become customer satisfaction, typically measured by the Net Promoter Score (NPS) for a subscriber. The NPS is the answer to the question: "How likely is it that you will recommend this product/company to a friend or colleague?" Responses range from 0, representing not at all likely, to 10, representing extremely likely. In this thesis, we aim to identify which, if any, network performance metrics contribute to subscriber satisfaction. In particular, we investigate the relationship between the NPS survey results and 11 network performance metrics of the respondents of a major mobile operator in South Africa. We identify the most influential performance metrics by fitting both linear and non-linear statistical models to the February 2018 survey dataset and testing the models on the June 2018 dataset. We find that metrics such as Call Drop Rate, Call Setup Failure Rate, Call Duration and Server Setup Latency are consistently selected as significant features in models of NPS prediction. Nevertheless, we find that all the tested statistical and machine learning models, whether linear or non-linear, are poor predictors of a month's NPS scores when only the network performance metrics from the same month are provided. This suggests that NPS is either driven primarily by other factors (such as customer service interactions at branches and contact centres) or determined by historical network performance over multiple months.
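The named entity disambiguation entry reports a micro F1-score of 0.726. As a rough illustration of what "micro" means, the sketch below pools true positives, false positives and false negatives over all documents before computing a single F1. It assumes each link is a hypothetical `(doc_id, mention_id, entity_id)` tuple; the abstract does not give the thesis's exact matching rules, and the toy data are invented.

```python
# Minimal sketch of micro-averaged F1 for entity linking.
# Assumption: links are hypothetical (doc_id, mention_id, entity_id) tuples;
# the thesis's actual scoring rules may differ.

def micro_f1(gold, predicted):
    """Pool TP/FP/FN over all documents, then compute a single F1."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # predicted links that exactly match a gold link
    fp = len(predicted - gold)   # predicted links that are wrong or spurious
    fn = len(gold - predicted)   # gold links that were missed or mislinked
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: mention m2 in document d1 is linked to the wrong entity.
gold = {("d1", "m1", "E1"), ("d1", "m2", "E2"), ("d2", "m1", "E1")}
predicted = {("d1", "m1", "E1"), ("d1", "m2", "E3"), ("d2", "m1", "E1")}
print(round(micro_f1(gold, predicted), 3))  # 0.667
```

Under this convention a mislinked mention counts as both a false positive and a false negative. Pooling the counts weights every mention equally, whereas a macro average would weight each document or entity equally.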
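Two of the metrics listed in the virtual try-on entry, RMSE and PSNR, are directly related: PSNR = 20 · log10(MAX / RMSE), where MAX is the largest possible pixel value. The sketch below assumes 8-bit images (MAX = 255), which the abstract does not state, and treats images as hypothetical flattened pixel lists. Both metrics need a ground-truth image, so they only apply in the paired setting; this is consistent with the unpaired results above being reported as IS, FID and KID.

```python
import math

def rmse(reference, generated):
    """Root mean square error between two equal-length pixel sequences."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((r - g) ** 2 for r, g in zip(reference, generated))
    return math.sqrt(squared / len(reference))

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 20 * log10(max_value / RMSE).

    Assumption: max_value=255 corresponds to 8-bit images.
    """
    error = rmse(reference, generated)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 20.0 * math.log10(max_value / error)

# Hypothetical flattened 8-bit pixel values.
reference = [10, 20, 30, 40]
generated = [12, 18, 30, 40]
print(f"RMSE={rmse(reference, generated):.3f}, PSNR={psnr(reference, generated):.2f} dB")
# RMSE=1.414, PSNR=45.12 dB
```

Lower RMSE and higher PSNR both indicate a result closer to the reference; PSNR simply rescales the error to a logarithmic decibel scale.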
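The hospital readmission entry evaluates models with the Matthews Correlation Coefficient (MCC) alongside accuracy. Readmission outcomes are typically imbalanced, so accuracy alone can flatter a weak model. The sketch below computes MCC from confusion-matrix counts using the standard formula. The counts are hypothetical, not results from the dissertation, and returning 0 when a marginal total is zero is a common convention rather than something the abstract specifies.

```python
import math

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any row or column total is zero
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for 1 020 discharges, 60 of which were readmitted.
model = dict(tp=40, tn=900, fp=60, fn=20)
always_no = dict(tp=0, tn=960, fp=0, fn=60)  # predicts "no readmission" for everyone

for name, counts in [("model", model), ("always_no", always_no)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, MCC={mcc(**counts):.3f}")
# model: accuracy=0.922, MCC=0.478
# always_no: accuracy=0.941, MCC=0.000
```

The trivial classifier has higher accuracy but an MCC of zero, because MCC uses all four cells of the confusion matrix.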
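The mobile network entry treats each subscriber's 0 to 10 answer as their NPS. When responses are aggregated over a group of respondents, NPS is conventionally reported as the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). The sketch below follows that standard convention. The abstract does not say whether or how the thesis aggregates scores, and the sample responses are hypothetical.

```python
def net_promoter_score(responses):
    """Aggregate NPS: % promoters (9-10) minus % detractors (0-6), in [-100, 100]."""
    if not responses:
        raise ValueError("need at least one response")
    for r in responses:
        if not 0 <= r <= 10:
            raise ValueError(f"response out of range 0-10: {r}")
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    # Passives (7-8) count toward the total but not toward either group.
    return 100.0 * (promoters - detractors) / len(responses)

# Hypothetical survey responses: 4 promoters, 2 passives, 2 detractors.
print(net_promoter_score([10, 9, 9, 8, 7, 6, 10, 3]))  # 25.0
```

Because passives are ignored and the scale is collapsed into three bins, small shifts in individual answers (for example, 8 to 9) can move the aggregate score noticeably.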