Browsing by Subject "data science"
- Item (Open Access): A statistical approach to automated detection of multi-component radio sources (2020). Smith, Jeremy Stewart; Taylor, Russell. Advances in radio astronomy are allowing deeper and wider areas of the sky to be observed than ever before. Source counts of future radio surveys are expected to number in the tens of millions. Source finding techniques are used to identify sources in a radio image; however, these techniques identify single distinct sources and struggle to identify multi-component sources, that is, cases where two or more distinct sources belong to the same underlying physical phenomenon, such as a radio galaxy. Identification of such phenomena is an important step in generating the survey catalogues on which much of radio astronomy science is based. Historically, multi-component sources were identified by visual inspection; however, the size of future surveys makes manual identification prohibitive. An algorithm to automate this process using statistical techniques is proposed. The algorithm is demonstrated on two radio images. The output of the algorithm is a catalogue in which nearest neighbour source pairs are assigned a probability score of being components of the same physical object. By applying several selection criteria, pairs of sources which are likely to be multi-component sources can be determined. Radio image cutouts are then generated from this selection and may be used as input into radio source classification techniques. Successful identification of multi-component sources using this method is demonstrated.
- Item (Open Access): Evaluating automated and hybrid neural disambiguation for African historical named entities (2022). Dunn, Jarryd; Suleman, Hussein. Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus, when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system to disambiguate names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus, a multilingual language model-based NED system was developed to disambiguate people's names within a historical South African context using documents written in English and isiZulu from the 500 Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. At the same time, the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on the documents written in English. Thus, the system was augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance when compared to the unaugmented NED system.
- Item (Open Access): Investigating the relationship between mobile network performance metrics and customer satisfaction (2019). Labuschagne, Louwrens; Bassett, Bruce; Little, Francesca. Fixed and mobile communication service providers (CSPs) are facing fierce competition with one another. In a globally saturated market, the primary differentiator between CSPs has become customer satisfaction, typically measured by the Net Promoter Score (NPS) for a subscriber. The NPS is the answer to the question: "How likely is it that you will recommend this product/company to a friend or colleague?" The responses range from 0, representing not at all likely, to 10, representing extremely likely. In this thesis, we aim to identify which, if any, network performance metrics contribute to subscriber satisfaction. In particular, we investigate the relationship between the NPS survey results and 11 network performance metrics of the respondents of a major mobile operator in South Africa. We identify the most influential performance metrics by fitting both linear and non-linear statistical models to the February 2018 survey dataset and test the models on the June 2018 dataset. We find that metrics such as Call Drop Rate, Call Setup Failure Rate, Call Duration and Server Setup Latency are consistently selected as significant features in models of NPS prediction. Nevertheless, we find that all the tested statistical and machine learning models, whether linear or non-linear, are poor predictors of NPS scores in a month when only the network performance metrics in the same month are provided. This suggests that NPS is either driven primarily by other factors (such as customer service interactions at branches and contact centres) or determined by historical network performance over multiple months.