
Browsing by Subject "statistical sciences"

Now showing 1 - 9 of 9
  • Applications of analysis of variance in wool marketing
    Open Access
    (1971) Du Plessis, Jasper Johan Jacques; Troskie, C.G.
    Analysis of variance could be described as a statistical technique for analysing measurements depending on several kinds of effects operating simultaneously so as to decide which kinds of effects are important and to estimate the effects. Although probably not susceptible of a very precise definition, it in general consists of a body of tests of hypotheses and methods of estimation using statistics which are linear combinations of sums of squares of linear functions of the observed values. Having been developed mainly in connection with problems of agricultural experimentation, the application thereof in the South African Wool Trade seems non-existent. I hope that this thesis will illustrate some of the very useful applications, especially to the extent where the rejection of all (or some) of the hypotheses under consideration is in itself as significant as the acceptance thereof would have been.
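As a rough illustration of the "sums of squares" idea in the abstract above, the sketch below computes a one-way ANOVA F statistic from the between-group and within-group sums of squares. It is not taken from the thesis: the function name `one_way_anova_f` and the sample values (imagined measurements from three wool lots) are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numeric observations, one list per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three lots (illustrative values only).
lots = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(lots)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom would suggest that the group effect matters; the significance lookup is left out of this sketch.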
  • Constructing growth reference curves for a cohort of South African children
    Open Access
    (2022) Ross, Melinda; Little, Francesca
    Childhood growth impacts the future welfare of an individual and ultimately the nation. The importance of childhood growth monitoring with growth curves that accurately represent the growth of the population of interest cannot be overemphasised. This dissertation sought to model the growth of a cohort of South African children and compare their growth to the World Health Organisation (WHO) 2006 Child Growth Standards. Growth reference curves were derived using parametric and semi-parametric methods within the Generalised Additive Models for Location, Scale and Shape (GAMLSS) framework. Various distributions for the growth measurements were compared as well as various curve smoothing approaches for the longitudinal profiles, including cubic splines, fractional polynomials and Berkey-Reed First and Second Order models. The preferred approach was to use the Box-Cox Power Exponential (BCPE) distribution with curve smoothing by cubic splines. Non-parametric quantile regression served as a confirmation that the chosen parametric distributions were appropriate for the data. A comparison of the derived growth references to the WHO (2006) standards revealed deviations in the patterns of growth and a greater likelihood of diagnosing a child as underweight, stunted or having micro- or macrocephaly when measured against the WHO standards. The poor socioeconomic status and associated harmful exposures of the cohort were noted as potential contributing factors. A fair comparison would require a reasonably healthy and representative sample of the South African population. These findings do however call into question the appropriateness of the WHO standards for measuring the growth of South African children and bring into focus the value of developing national growth standards.
  • Development of a test suite for single object tracking algorithms in video
    Open Access
    (2021) Donnelly, Kieran; Pienaar, Etienne
    Flying Camera Solutions (FlyCam), within Sony Lund's startup accelerator, intends to provide drone videography to paying customers in ski resorts: a customer should be able to go about their activity as usual while a drone films them. Visual object tracking, enabling the drone to track the customer throughout the activity, is a primary obstacle in creating a viable autonomous videography service. FlyCam needs an object tracking algorithm which is accurate, robust, and real-time, and which requires minimal computational overhead. We propose two innovations to aid in the selection of an appropriate tracking algorithm. Firstly, a video annotation algorithm, making use of an object detector to record the position and type of object in each frame of a video clip. Secondly, an algorithm designed to evaluate the performance of any given object tracker based on a set of performance metrics. These metrics include, among others, measures of positional accuracy, frame rate, and false positive rate. For the video annotation algorithm we implemented the state-of-the-art Mask R-CNN object detector, which achieved an average frame rate of 1.5 fps annotating video clips in up to 4K resolution. Another algorithm then played back the annotated clips to the user such that incorrect object detections could be rooted out or rectified. With little relevant annotated video available, the annotation algorithm proved useful in preparing a suite of 18 clips to be evaluated. Ten performance metrics were adapted from multi-object to single-object tracking. Nine tracking algorithms were then run on each of the 18 test video clips at varying resolutions to produce 375 tracking observations for analysis. The evaluation results revealed the optimal tracking algorithm to be Re3: a recurrent-convolutional neural network tracker which runs at respectable speeds on a consumer laptop. This is a promising result; with enough annotated data, neural networks can be retrained to improve performance.
    Within just a few months of operation, FlyCam could amass enough specific video data to significantly improve the neural network-based tracker.
  • Evolutionary algorithms for optimising reinforcement learning policy approximation
    Open Access
    (2019) Cuningham, Blake; Bassett, Bruce
    Reinforcement learning methods have become more efficient in recent years. In particular, the A3C (asynchronous advantage actor critic) approach demonstrated in Mnih et al. (2016) was able to halve the training time of the existing state-of-the-art approaches. However, these methods still require relatively large amounts of training resources due to the fundamental exploratory nature of reinforcement learning. Other machine learning approaches are able to improve the ability to train reinforcement learning agents by better processing input information to help map states to actions - convolutional and recurrent neural networks are helpful when input data is in image form that does not satisfy the Markov property. The specific required architecture of these convolutional and recurrent neural network models is not obvious given infinite possible permutations. There is very limited research giving clear guidance on neural network structure in an RL (reinforcement learning) context, and grid search-like approaches require too many resources and do not always find good optima. In order to address these, and other, challenges associated with traditional parameter optimization methods, an evolutionary approach similar to that taken by Dufourq and Bassett (2017) for image classification tasks was used to find the optimal model architecture when training an agent that learns to play Atari Pong. The approach found models that were able to train reinforcement learning agents faster, and with fewer parameters than that found by OpenAI’s model in Blackwell et al. (2018) - a superhuman level of performance.
  • Examination timetabling at the University of Cape Town: a tabu search approach to automation
    Open Access
    (2022) Steenkamp, Ebrahim; Rakotonirainy, Rosephine Georgina
    With the rise of schedules and scheduling problems, solutions proposed in literature have expanded yet the disconnect between research and reality remains. The University of Cape Town's (UCT) Examinations Office currently produces their schedules manually with software relegated to error-checking status. While they have requested automation, this study is the first attempt to integrate optimisation techniques into the examination timetabling process. Tabu search and Nelder-Mead methodologies were tested on the UCT November 2014 examination timetabling data with tabu search proving to be more effective, capable of producing feasible solutions from randomised initial solutions. To make this research more accessible, a user-friendly app was developed which showcased the optimisation techniques in a more digestible format. The app includes data cleaning specific to UCT's data management system and was presented to the UCT Examinations Office where they expressed support for further development: in its current form, the app would be used as a secondary tool after an initial solution has been manually obtained.
  • Exploring the application of Word2Vec to basket transaction data in the grocery retail industry
    Open Access
    (2022) De Swardt, Gideon Jacobus; Nyirenda, Juwa Chiza
    In this thesis, we explore the application of Word2vec to basket transaction data provided by a large grocery retailer in South Africa. Word2vec is an algorithm based on representation learning. The objective of the exploration is to establish whether the application of Word2vec to basket transaction data would generate product embeddings that represent a useful relationship between products. Furthermore, we compare Word2vec's outputs and performance to traditional methods for studying product relationships which include Association Rules Mining (ARM) and Recommendation Systems. The results from the experiments showed that indeed product embeddings created by Word2vec on transaction data are meaningful and useful. It was clear that the idea of using transactions in the place of sentences as input to the neural network provides results analogous to those of a natural language task. Word2vec clearly demonstrated its ability to cluster products that are homogeneous or fulfill similar needs. Furthermore, this sort of product relationship was not provided by any other traditional methods, which was clear when comparing the outputs to that of ARM and Recommendation Systems. We also show that using Word2vec could potentially provide insight on truly complementary products that ARM perhaps fails to do. Word2vec also proved to be incredibly scalable, taking input data of 20 times the size of what traditional methods could handle on a local computer. We end with a description of a potential application of the ideas learnt during the course of this study, with a real business problem, that we believe could lead to an enhanced customer shopping experience and in turn increase revenue and profits for the retailer.
  • Market state discovery
    Open Access
    (2022) Singo, Unarine; Gebbie, Timothy
    We explore the concept of financial market state discovery by assessing the robustness of two unsupervised machine learning algorithms: Inverse Covariance Clustering (ICC) and Agglomerative Super Paramagnetic Clustering (ASPC). The assessment is carried out by: simulating market datasets varying in complexity; implementing ICC and ASPC to estimate the underlying states (using only simulated log-returns as inputs); and measuring the algorithms' ability to recover the underlying states, using the Adjusted Rand Index (ARI) as a performance metric. Experiments revealed that ASPC is a more robust and better performing algorithm than ICC. ICC is able to produce competitive results in 2-state markets; however, ICC's primary disadvantage is its inability to maintain strong performance in 3, 4 and 5-state markets. For example, ASPC produced ARI numbers that were up to 800% superior to ICC in 5-state markets. Furthermore, ASPC does not rely on the art of selecting good hyper-parameters such as the number of states a priori. ICC's utility as a market state discovery algorithm is limited.
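The abstract above scores state recovery with the Adjusted Rand Index. As a minimal sketch of how that metric is computed from two labelings (the standard pair-counting formula, not code from the dissertation), the function below uses only the Python standard library. The function name and the toy label sequences are hypothetical, and returning 1.0 in the degenerate cases is an assumption of this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same observations."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two points counts as perfect agreement

    # Contingency counts: how many points share (true state, estimated state).
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # assumption: degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: true market states versus two hypothetical estimates.
true_states = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]  # same partition, different state names
crossed = [0, 1, 0, 1]     # partition that cuts across the true states
print(adjusted_rand_index(true_states, relabelled))
print(adjusted_rand_index(true_states, crossed))
```

Because the index depends only on which observations are grouped together, it is unaffected by how the estimated states happen to be numbered, which suits a setting where a clustering algorithm assigns arbitrary state labels.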
  • Sequential nonparametric estimation via Hermite series estimators
    Open Access
    (2020) Stephanou, Michael Jared; Varughese, Melvin
    Algorithms for estimating the statistical properties of streams of data in real time, as well as for the efficient analysis of massive data sets, are becoming particularly pertinent given the increasing ubiquity of such data. In this thesis we introduce novel approaches to sequential (online) estimation in both stationary and non-stationary settings based on Hermite series density estimators. In the univariate context we apply Hermite series based distribution function estimators to sequential cumulative distribution function estimation. These distribution function estimators are particularly useful because they allow the sequential estimation of the full cumulative distribution function. This is in contrast to the empirical distribution function estimator and smooth kernel distribution function estimator which only allow sequential cumulative probability estimation at predefined values on the support of the associated density function. We explore the asymptotic consistency and robustness properties of the Hermite series based cumulative distribution function estimator thereby redressing a gap in the literature. Given the sequential Hermite series based distribution function estimator, we obtain sequential quantile estimates numerically. Our algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time, in both the static and dynamic quantile estimation settings. In the bivariate context we introduce a Hermite series based sequential estimator for the Spearman's rank correlation coefficient and provide algorithms applicable in both the stationary and non-stationary settings. To treat the non-stationary setting, we introduce a novel, exponentially weighted estimator for the Spearman's rank correlation, which allows the local nonparametric correlation of a bivariate data stream to be tracked.
    To the best of our knowledge this is the first algorithm to be proposed for estimating a time-varying Spearman's rank correlation that does not rely on a moving window approach. We explore the practical effectiveness of the Hermite series based estimators through real data and simulation studies, demonstrating competitive performance compared to leading existing algorithms. The potential applications of this work are manifold. Our sequential distribution function and quantile estimation algorithms can be applied, for example, to real-time anomaly and outlier detection, real-time provisioning for future demand, and real-time risk estimation. The Hermite series based Spearman's rank correlation estimator can be applied to fast and robust online calculation of correlation which may vary over time. Possible machine learning applications include fast feature selection and hierarchical clustering on massive data sets, amongst others.
  • Using Siamese neural networks to identify individual animals
    Open Access
    (2022) Madzingira, Tinotenda; Durbach, Ian
    The ability to identify individual animals in ecology is essential for monitoring [Schneider et al., 2018]. It allows a researcher to determine whether an animal being observed is new to the researcher or has previously been observed. This in turn allows estimation of ecological metrics such as population density [Schneider et al., 2018]. Traditionally, this was done by capturing and physically tagging animals [Cross et al., 2014a]. Increasingly, animal observation is being conducted by photographic means creating the need to be able to identify individual animals from images [Weinstein, 2018]. The dissertation answers whether machine learning can determine if the animals in a pair of images come from the same individual animal. To answer this question, the animal of interest in each image must be (1) located and isolated and (2) compared to the other image in the pair. Mask-Regional Convolution Neural Networks (Mask R-CNN) [He et al., 2017] are used for the object detection and instance segmentation which answers (1). This is a modern deep learning approach which has been used for tasks such as identifying breast cancer tumors [Chiao et al., 2019] outside of ecology and to measure the size of whales [Gray et al., 2019b] in ecology. In addition to classifying the object and proposing a bounding box, Mask R-CNN also proposes a mask for each object. The “mask” is a selection of pixels that belong to the object of interest which are highlighted in processed images. A ResNet-101 model with the Feature Pyramid Network (FPN) that has been trained on the MS Coco dataset [Lin et al., 2014] is used with transfer learning. Finally, a Siamese Neural Network (SNN) is used to measure the similarity between the objects in each pair of images, which answers (2). A SNN is a pair of identical neural networks that share the same weights and whose outputs are connected to a distance computing function.
    SNNs are used to identify subtle differences in the feature space of its inputs. Each of the neural networks is given one of the images in the pair as an input and a distance measure is computed on their outputs which allows the inputs to be classified as similar or dissimilar based on a given threshold of the distance metric. The structure of the SNNs applied is inspired by the one proposed in [Dey et al., 2017] and the weights are trained separately on each dataset. The proposed approach is evaluated on two datasets: a set of approximately 25,000 images of 5,000 humpback whale individuals taken across a variety of locations, and a set of approximately 13,000 images of 300 individual harbor seals from 3 locations on the west coast of Scotland. The object detection models are evaluated using the intersection over union (IOU) approach while the SNNs are evaluated on F1 score, the harmonic mean of precision and recall. The proposed method is tested on unseen images for each dataset. The SNN models achieved F1 scores of 63.9%, 66.7% and 68.4% for the Humpback whale dataset, the right and the left fins of the bottlenose dolphins respectively. The object detection model achieved an intersection over union of 88.6% and 74.3% for the Humpback whale dataset and the bottlenose dolphin dataset respectively. Lastly the orientation model achieved an F1 score of 94.7%. All quoted results are evaluated on an unseen test dataset.
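The evaluation quantities named in the abstract above (intersection over union, F1 score, and a distance threshold on the Siamese network outputs) can be sketched in a few lines. This is an illustration under stated assumptions, not the dissertation's code: the abstract does not say which distance function or threshold was used, so the Euclidean distance, the axis-aligned box format `(x_min, y_min, x_max, y_max)`, and all function names here are hypothetical.

```python
from math import dist


def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def same_individual(embedding_a, embedding_b, threshold):
    """Classify a pair as the same animal if the embeddings are close enough.

    Euclidean distance and the threshold value are assumptions of this sketch.
    """
    return dist(embedding_a, embedding_b) <= threshold


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(f1_score(tp=8, fp=2, fn=2))
print(same_individual([0.0, 0.0], [0.3, 0.4], threshold=0.6))
```

In a Siamese setup the two embeddings would come from the same network applied to each image in the pair; here they are plain lists so the thresholding step can be read in isolation.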