Browsing by Subject "Statistical Science"
- Item, Open Access: Accurate portfolio risk-return structure modelling (2006). Hossain, Nafees; Troskie, Casper G; Guo, Renkuan. Markowitz's modern portfolio theory has played a vital role in investment portfolio management, which constantly drives the development of volatility models, particularly the stochastic volatility model, which reveals the dynamics of conditional volatility. Financial time series and volatility models have become one of the hot spots in operations research. In this thesis, one of the areas we explore is the theoretical formulation of the optimal portfolio selection problem under the Ito calculus framework. In particular, we consider a stochastic variation calculus problem, i.e., seeking the optimal stochastic volatility diffusion family for facilitating the best portfolio selection identified under continuous-time stochastic optimal control theoretical settings. One of the properties this study examines is the left-shifting role of the efficient frontier of the GARCH(1, 1) (Generalized Autoregressive Conditional Heteroskedasticity) model. This study considers many instances where the superior left-shifting behaviour of the GARCH(1, 1) is observed. One such instance is when GARCH(1, 1) is compared with the volatility modelling extensions of the GARCH environment in a single index framework. This study will demonstrate the persistence of the superiority of the GARCH(1, 1) frontier within a multiple and single index context of modern portfolio theory. Many portfolio optimization models are investigated, particularly the Markowitz model and the Sharpe Multiple and Single index models. Includes bibliographical references (p. 313-323).
- Item, Open Access: Application of ANOVA for the analysis of temporal and spatial differences in the length of pelagic goby preyed on by Cape fur seals in the coasts of Namibia (2005). Anday, Tekie T; Underhill, Les; Kirkman, Silvia. Analysis of variance (ANOVA) is a robust technique whereby the total variation present in a set of data is partitioned into two or more components (Wayne, 1999). In this thesis, ANOVA was used to uncover the differences in the length of goby preyed on by three different colonies of fur seals on the Namibian coast. Moreover, ANOVA was used to investigate temporal differences in the lengths of goby preyed on by fur seals at each seal colony location. Results of the analysis are shown in the Analysis and results section, and the findings are discussed in the discussion section. Three sections precede these two. The first is the general introduction, which explains the general situation and the targets of this thesis. The second gives a general background on the ANOVA technique. The third explains the nature of the data and gives background information on gobies.
- Item, Open Access: A comparative evaluation of data mining classification techniques on medical trauma data (2004). Ramaboa, Kutlwano K K M; Wegner, Trevor. The purpose of this research was to determine the extent to which a selection of data mining classification techniques (specifically, Discriminant Analysis, Decision Trees, and three artificial neural network models: Backpropagation, Probabilistic Neural Networks, and the Radial Basis Function) are able to correctly classify cases into the different categories of an outcome measure from a given set of input variables (i.e. estimate their classification accuracy) on a common database.
- Item, Open Access: Contributions to spatial uncertainty modelling in GIS : small sample data (2007). Guo, Danni; Thiart, Christien. Environmental data are very costly and difficult to collect and are often vague (subjective) or imprecise in nature (e.g. the hazard levels of pollutants are classified as "harmful for human beings"). These realities in practice (fuzziness and small datasets) lead to uncertainty, which is addressed by my research objective: "To model spatial environmental data with fuzzy uncertainty, and to explore the use of small sample data in spatial modelling predictions, within Geographic Information System (GIS)." The methodologies underlying the theoretical foundations for spatial modelling are examined, such as geostatistics, fuzzy mathematics, Grey System Theory, and (V,·) Credibility Measure Theory. Fifteen papers, including three journal papers, were written in contribution to the development of spatial fuzzy and grey uncertainty modelling, to which my contribution was 50 to 65%. The methods and theories have been merged together in these papers, and they are applied to two datasets, PM10 air pollution data and soil dioxin data. The papers can be classified into two broad categories: fuzzy spatial GIS modelling and grey spatial GIS modelling. In fuzzy spatial GIS modelling, the fuzzy uncertainty (Zadeh, 1965) in environmental data is addressed. The thesis developed a fuzzy membership grades kriging approach by converting fuzzy subsets spatial modelling into membership grade spatial modelling. As this method develops, the fuzzy membership grades kriging is placed on the foundation of the credibility measure theory, and approaches a fully data-assimilated membership function in terms of the maximum fuzzy entropy principle. The variable modelling method for dealing with fuzzy data is a unique contribution to the fuzzy spatial GIS modelling literature. In grey spatial GIS modelling, spatial prediction using small sample data is addressed.
The thesis developed a Grey GIS modelling approach, in which two-dimensional order-less spatial observations are converted into two one-dimensional ordered data sequences. The thesis papers also explored foundational problems within the grey differential equation models (Deng, 1985). It is discovered that the coupling feature of grey differential equations, together with the e-similarity measure, generalises the classical GM(1,1) model into more classes of extended GM(1,1) models, in order to fully assimilate sample data information. The development of grey spatial GIS modelling is a creative contribution to handling small sample data.
- Item, Open Access: Effects of protected areas and climate change on the occupancy dynamics of common bird species in South Africa (2018). Duckworth, Greg; Altwegg, Res. Protected areas are tracts of land set aside primarily for the conservation of biodiversity and natural habitats. They are intended to mitigate biodiversity loss caused by land-use change worldwide. Climate change has been shown to disrupt species' natural distributions and patterns, and poses a significant threat to global biodiversity. The goals of this thesis are to address these important issues, and understand how protected areas and climate change affect the range dynamics of common, resident bird species in South Africa. Common species were used because they have been shown to drive important ecosystem patterns, and a decline in abundance and diversity of common species can indicate drastic declines in ecosystem integrity. This thesis comprises four data chapters; in the first three I model the occupancy dynamics of 200 common, resident bird species in South Africa to gain an understanding of how the proportion of protected areas within a landscape affects common species. For the last data chapter, I examined the effects of protected areas and a changing climate on the range dynamics of Cape Rock-jumper (Chaetops frenatus), a species endemic to the southwestern part of South Africa and whose population is declining rapidly in response to climate change. I modelled its occupancy dynamics in relation to climate, vegetation, and protected area. Overall, my key findings show bird abundances vary widely as a function of protected areas, but on average, bird abundances are higher in regions with a higher proportion of protected areas, compared to regions with a lower proportion. I found that the conservation ability of protected areas was influenced by the type of land-use found in the surrounding landscape.
For example, the extent of agricultural land in proximity to a protected area significantly increased the mean abundance of birds in that protected area, whilst the average abundance of most species was not affected by the extent of urban area near the protected area. On average, species preferentially colonized and persisted within landscapes with a higher proportion of protected area, compared to landscapes with a lower proportion of protected area. However, protected areas were not able to slow the extinction rate for all species, and the average extinction rate for some groups of species actually increased as the extent of protected areas within a landscape increased. Furthermore, Cape Rock-jumper also preferentially occupied regions with higher proportions of protected area. Despite this, Cape Rock-jumper’s range is predicted to shrink considerably in response to a hotter and mildly drier climate forecast for the region. As a result, Cape Rock-jumper will likely be of conservation concern as the climate over its range continues to change. I conclude that, in general, protected areas are effective at conserving common bird species over a heterogeneous landscape in South Africa, and should be prioritised as key conservation strategies in the future. I further conclude that climate change will be a concern to an endemic species, and to biodiversity in general. This will likely place extra stress on the importance of protected areas to mitigate responses of species to climate change.
- Item, Open Access: ETD: Application of CNN-gcForestCS to cassava leaf image classification (2023). Carew, Liam; Britz, Stefan. Cassava is one of the most consumed carbohydrates in the world, providing a reliable source of income and nutrition to inhabitants of Latin America, Africa and Asia. However, its production is greatly affected by pathogenic infection, with cassava mosaic disease (CMD) posing the greatest threat to cassava farmers in Africa and Asia. Given that developing nations are estimated to be hit hardest by climate change and projected to have the largest population increases in coming decades, optimisation of cassava yield in these areas is imperative to ensure food security. Traditionally, crop health is determined by manual inspection, which can be laborious and error-prone and can require technical expertise. This produces a costly barrier to entry for smallholding farmers, who make up the majority of global cassava production. Automated disease detection systems using convolutional neural networks (CNNs) deployable on mobile phones have been shown to be a cost-efficient and effective method for cassava monitoring, mainly owing to their advanced feature extraction capabilities. However, CNNs require complex hyperparameter tuning and can be computationally intensive to train. GcForestCS (multi-grained cascade forest with confidence screening) presents an alternative statistical learning method that can be trained using CPU, and requires less complex hyperparameter tuning than deep learning while producing competitive performance for lower-dimensionality datasets. Taking advantage of the feature extraction capabilities of CNNs and the competitive performance of gcForestCS for lower-dimensionality datasets, the central aim of this dissertation was to investigate CNN-gcForestCS as an alternative to deep learning for cassava leaf disease detection.
The performance of CNN-gcForestCS was compared to that of gcForestCS and deep learning, and the effects of class balance, CNN feature extraction, CNN feature extractor fine-tuning, pooling after multi-grained scanning, and training set curation were assessed. The results showed that the best DenseNet201-gcForestCS model (86.79%) produced marginally worse performance than the best DenseNet201 model (87.43%), while the best MobileNetV2-gcForestCS model (83.66%) produced marginally better performance than the best MobileNetV2 model (82.87%). Overall, the results indicate that it is inconclusive whether CNN-gcForestCS is a viable alternative to deep learning for cassava leaf disease detection, especially when considering the high computational cost associated with the CNN-gcForestCS methodology.
- Item, Open Access: Fourier method for the measurement of univariate and multivariate volatility in the presence of high frequency data (2007). Malherbe, Chanel; Wilcox, Diane. Includes bibliographical references (leaves 75-77).
- Item, Open Access: A framework for regime identification and asset allocation (2016). Kondlo, Mpumelelo; Bradfield, David. The purpose of this thesis is to examine a regime-based asset allocation strategy and evaluate whether accounting for regime-dependent risk and return of asset classes provides any significant improvement in portfolio performance. The South African market and economy are considered as a proxy for the analysis. Motivation for this thesis stems from the growing body of research by practitioners devoted to models that are reflective of the interdependency between financial assets and the real economy. The asset classes under consideration for the analysis are domestic and foreign cash, domestic and foreign bonds, domestic and foreign equity, inflation linked bonds, property, gold and commodities. In order to evaluate the performance of the regime-based strategy, this thesis proposes a framework based on Principal Component Analysis and Fuzzy Cluster Analysis for regime identification and asset allocation. The performance of the strategy is tested against two strategies that are not cognizant of regime changes. These are an equally weighted portfolio and a buy-and-hold strategy. Furthermore, relative performance analysis was performed by comparing the regime-based strategy proposed in this thesis against the Alexander Forbes Large Manager Watch Index. Due to data limitations, the analysis is done on an in-sample basis without out-of-sample testing. The results from the analysis showed the extent of outperformance of the proposed regime-based strategy relative to an equally weighted strategy and a buy-and-hold strategy. These results were consistent with existing literature on regime-based strategies. Furthermore, the results provided strong motivation for the use of the regime identification framework together with tactical asset allocation proposed in this thesis.
- Item, Open Access: Implementing a filtered term structure model in the South African bond market (2007). Ririe, Angela; Dugmore, Brett. A key feature of the local bond market is that trade is concentrated in a few liquid government bonds. We review and implement the filtered term structure model proposed by Gombani, Jaschke and Runggaldier that defines an arbitrage free pricing system that is consistent with liquid bond prices. The model is derived in two stages called the underlying and perturbed models. The underlying model defines the theoretical arbitrage free term structure. It is assumed to be a multi-factor, affine HJM type model where the stochastic factors satisfy a linear diffusion equation. Gombani et al. argue that the differences between the theoretical and market prices should be interpreted as unobserved errors. The perturbed model defines the prices of the observed bonds as their theoretical values distorted by noise. Assuming that the information at any point in time is the market prices of a finite number of liquidly traded bonds, the perturbed model is used to derive a continually updated pricing system that is arbitrage free with respect to the observed prices. The method is based on the Kalman filter. We implement a particular three-factor version of the model and calibrate it to the South African market. We discuss the relevant data and numerical and statistical techniques including principal component analysis and yield curve construction. We apply the formulas for pricing European options on zero-coupon and coupon bearing bonds for Gaussian HJM models to the perturbed model and present two examples to demonstrate the application of the model to bond and option pricing.
- Item, Open Access: Joint models for nonlinear longitudinal profiles in the presence of informative censoring (2018). Chatora, Tinashe; Little, Francesca; Barnes, Karen. Malaria is the parasitic disease which affects the most humans, with Plasmodium falciparum malaria being responsible for the majority of severe malaria and malaria related deaths. The asexual form of the parasite causes the signs and symptoms associated with malaria infection. The sexual form of the parasite, also known as a gametocyte, is the stage responsible for infectivity of the human host (patient) to the mosquito vector, and thus ongoing transmission of malaria and the spread of antimalarial drug resistance. Historically, malaria therapeutic efficacy studies have focused mainly on the clearance of asexual parasites. However, malaria in a community can only be truly combated if a treatment program is implemented which is able to clear both asexual and sexual parasites effectively. This thesis focuses on modeling the key features of gametocytemia. Particular emphasis is placed on modeling the time to gametocyte emergence, the density of gametocytes and the duration of gametocytemia. It is also of interest to investigate the impact of the administered treatment on the aforementioned features. Gametocyte data has several interesting features. Firstly, the distribution of gametocyte data is zero-inflated with a long tail to the right. The observed longitudinal gametocyte profile also has a nonlinear relationship with time. In addition, since most malaria intervention studies are not designed to optimally measure the evolution of the longitudinal gametocyte profile, there are very few observation points in the time period where the gametocyte profile is expected to peak. Gametocyte data collected from malaria intervention studies are also affected by informative censoring, which leads to incomplete gametocyte profiles.
An example of informative censoring is when a patient who experiences treatment failure is “rescued”, and withdrawn, from the study in order to receive alternative treatment. This patient can be considered to be in worse health as compared to the patients who remain in the study. There are also competing risks of exit from the study, as a patient can either experience treatment failure or be lost to follow-up. The above mentioned features of gametocyte data make it a statistically appealing dataset to analyze. In the literature there are several modeling techniques which can be used to analyze individual features of the data. These techniques include standard survival models for modeling the time to gametocyte emergence and the duration of gametocytemia. The longitudinal nonlinear gametocyte profile would typically be modeled using nonlinear mixed effect models. These nonlinear models could then subsequently be extended to accommodate the zero-inflation in the data, by changing the underlying assumption around the distribution of the response variable. However, it is important to note that these standard techniques do not account for informative censoring. Failure to account for informative censoring leads to bias in parameter estimates. Joint modeling techniques can be used to account for informative censoring. The joint models applied in this thesis combined the longitudinal nonlinear gametocyte densities and the time to censoring due to either loss to follow-up or treatment failure. The data analyzed in this thesis were collected from a series of clinical trials conducted between 2002 and 2004 in Mozambique and the Mpumalanga province of South Africa. These trials were a part of the South East African Combination Antimalarial Therapy (SEACAT) evaluation of the phased introduction of combination anti-malarial therapy, nested in the Lubombo Spatial Development Initiative.
The aim of these studies was primarily to measure the efficacy of sulfadoxine-pyrimethamine (SP) and a combination of artesunate and sulfadoxine-pyrimethamine (ACT) in eliminating asexual parasites in patients. The patients enrolled in the study had uncomplicated malaria, at a time of increasing resistance to sulfadoxine-pyrimethamine (SP) treatment. Blood samples were taken from patients during the course of 6 weeks on days 0, 1, 2, 3, 7, 14, 21, 28 and 42. Analysis of these blood samples provided longitudinal measurements for asexual parasite densities, gametocyte densities, sulfadoxine drug concentrations and pyrimethamine drug concentrations. The gametocyte data collected in this study was initially analyzed using standard survival modeling techniques. Non-parametric Cox regression models and parametric survival models were applied to the data as part of this initial investigation. These models were used to investigate the factors which affected the time to gametocyte emergence. Subsequently, using the subset of the population which experienced gametocytemia, accelerated failure time models were applied to investigate the factors which affected the duration of gametocytemia. It is evident that the findings from the aforementioned duration investigation would only be able to provide valid duration estimates for patients who were detected to have gametocytemia. This work was extended to allow for population level duration estimates by incorporating the prevalence of gametocytemia into the estimation of duration, for generic patients with specific covariate patterns. The prevalence of gametocytemia was modeled using an underlying binomial distribution. The delta method was subsequently used to derive confidence intervals for the population level duration estimates which were associated with specific covariate patterns. An investigation into the factors affecting the early withdrawal of patients from the study was also conducted.
Early exit from the study arose either through loss to follow-up (LTFU) or through treatment failure. The longitudinal gametocyte profile was modeled using joint modeling techniques. The resulting joint model used shared random effects to combine a Weibull survival model, describing the cause-specific hazards of patient exit from the study, with a nonlinear zero-adjusted gamma mixed effect model for the longitudinal gametocyte profile. This model was used to impute the incomplete gametocyte profiles, after adjusting for informative censoring. These imputed profiles were then used to estimate the duration of gametocytemia. It was found, in this thesis, that treatment had a very strong effect on the hazard of gametocyte emergence, density of gametocytes and the duration of gametocytemia. Patients who received a combination of sulfadoxine-pyrimethamine and artesunate were found to have significantly lower hazards of gametocyte emergence, lower predicted durations of gametocytemia and lower predicted longitudinal gametocyte densities as compared to patients who received sulfadoxine-pyrimethamine treatment only.
- Item, Open Access: Modelling growth patterns of bird species using non-linear mixed effects models (2008). Ntirampeba, D; Little, Francesca; Erni, Birgit. The analysis of growth data is important as it allows us to assess how fast things grow and determine the various factors that affect their growth. In the current study, growth measurements on body features (body mass, wing length, head length, bill (culmen) length, foot length, and tarsus length) were taken for Grey-headed Gulls populating Bonaero Park and Modderfontein Pan in Gauteng province, South Africa, and for Swift Terns on Robben Island. Different methods such as polynomial regressions, non-parametric models and non-linear mixed effects models have been used to fit models to growth data. In recent years, non-linear mixed effects models have become an important tool for growth models. We have fitted univariate inverse exponential, Gompertz, logistic, and Richards non-linear mixed effects models to each of the six body features. We have modeled these six features simultaneously by adding a categorical covariate, which distinguishes between different features, to the model. This approach allows for straightforward comparison of growth between the different body features. In growth studies, knowledge of the age of each individual is essential information for growth analysis. For Swift Terns, the exact age of most chicks was unknown, but a small portion of the sample was followed from nestling up to the end of the study period. For chicks with unknown age, we estimated age by fitting the growth curve, obtained from birds with known age, to the mass measurements of the chick with unknown age. It was found that the logistic models were most appropriate to describe the growth of body mass and wing length, while the Gompertz models provided the best fits for bill, tarsus, head and foot for Grey-headed Gulls. For Swift Terns, the inverse exponential model provided the best univariate fit for four of six features.
The logistic model, with a variance function increasing as a power of the fitted values (with a different power for each feature) and an autoregressive correlation structure for within-bird errors (with errors from different features within the same subject assumed to be independent), gave the best model to describe the growth of all body features taken simultaneously for both Grey-headed Gull and Swift Tern data. It was shown that growth of Grey-headed Gull and Swift Tern chicks occurs in the following order: (foot, body mass, tarsus)-(bill, head)-(wing) and (tarsus, foot)-(body mass, bill, head)-(wing), respectively.
- Item, Open Access: Multivariate multi-level non-linear mixed-effect models and their application to the modeling of drug-concentration time curves (2011). Mauff, Katya; Little, Francesca; Barnes, Karen. This thesis discusses the techniques involved in the fitting of nonlinear mixed effect (NLME) models. In particular, it looks at the application of these techniques to the analysis of concentration-time data for the aforementioned antimalarial compounds, and details the extensions to the basic modeling process that were required in order to accommodate multiple responses and multiple observation phases (pregnant and postpartum).
- Item, Open Access: Quantifying abundance, breeding and behaviour of the African black oystercatcher (2006). Parsons, N J; Underhill, Les. Includes bibliographical references (p. 177-190).
- Item, Open Access: Robben Island penguin pressure model: a decision support tool for an ecosystems approach to fisheries management (2012). Cecchini, Lee-Anne; Scott, Leanne; Stewart, Theodore; Jarre, Astrid. The African penguin (Spheniscus demersus) population in southern Africa has declined from approximately 575 000 adults at the start of the 20th century to 180 000 adults in the early 1990s. The population is still declining, leading to the International Union for the Conservation of Nature upgrading the status of African penguins to Endangered on the Red List of Threatened Species. This dissertation uses a systems dynamics approach to produce a model incorporating all important pressures. The model is stochastic and spatially explicit, and uses expert opinion where data are not available. The model has been produced and revised with the help of the Penguin Modelling Group, based at the University of Cape Town. The modelling process culminated in a workshop where participants experimented with the model themselves. The model in this dissertation is only applicable to the penguin population on Robben Island and, as such, conclusions drawn cannot necessarily be applied to other penguin colonies.
- Item, Open Access: Selecting the best model for predicting a term deposit product take-up in banking (2018). Hlongwane, Rivalani Willie; Rajaratnam, Kanshukan; Huang, Chun-Kai. In this study, we use data mining techniques to build predictive models on data collected by a Portuguese bank through a term savings product campaign conducted between May 2008 and November 2010. This data is imbalanced, given an observed take-up rate of 11.27%. Ling et al. (1998) indicated that predictive models built on imbalanced data tend to yield low sensitivity and high specificity, an indication of low true positive and high true negative rates. Our study confirms this finding. We therefore use three sampling techniques, namely under-sampling, over-sampling and the Synthetic Minority Over-sampling Technique, to balance the data. This results in three additional datasets to use for modelling. We build the following predictive models on the datasets: random forest, multivariate adaptive regression splines, neural network and support vector machine. We compare the models against each other for their ability to identify customers that are likely to take up a term savings product. As part of the model building process, we investigate parameter permutations related to each modelling technique to tune the models, and we find that this assists in building robust models. We assess our models for predictive performance through the use of the receiver operating characteristic curve, confusion matrix, GINI, kappa, sensitivity, specificity, and lift and gains charts. A multivariate adaptive regression splines model built on over-sampled data is found to be the best model for predicting term savings product take-up.
- Item, Open Access: Statistical investigation into academic performance in the Faculty of Science at the University of Cape Town in the period 1990-1997 (1999). Ronda, Katarzyna; Dunne, Timothy Terence; Troskie, Casper G. Ultimate academic success at any tertiary institution is affected and partially determined by many factors related to various aspects of an individual's life. These factors can be separated into the following distinct categories: educational, biographical, environmental and personal factors. Some of these determinants are used in the admission procedures adopted at tertiary institutions. In South Africa, the results of different final matriculation examinations (referred to as matric or matric exams) written in several educational departments throughout the country are employed to assess the individual's potential to succeed. However, the effectiveness of matric results as predictors of successful academic performance has always been controversial. Sharing these concerns and wishing to explore them, the Faculty of Science at the University of Cape Town (UCT) accepted a proposal from the Department of Statistical Sciences to investigate several issues affecting students' performance in the Faculty. The proposal led to the development of this M.Sc. thesis. The major aim of this study is to describe, on a retrospective basis, the extent to which the current selection criteria based on the matric results may have predicted various types of academic performance in the Faculty amongst those selected and admitted. The thesis also exhibits a coherent and fairly complete methodology that is applicable at general or at particular levels of student performance data analysis on a continuing year-to-year basis. The particular statistical methods and techniques in this study have been summarised and discussed in the three Appendices.
- Item, Open Access: The Swift Tern Sterna bergii in Southern Africa : growth and movement (2006). Le Roux, Janine; Underhill, Les; Cooper, John. Includes bibliographical references.
- Item, Open Access: A transdisciplinary study on developing knowledge based software tools for wildlife management in Namibia (2005). Paterson, Barbara; Underhill, Les; Dunn, Tim; Schinzel, Britta. Two software tools for decision making in wildlife management were developed as part of the Transboundary Mammal Project, a joint initiative between the Ministry of Environment and Tourism, Namibia (MET) and the Namibia Nature Foundation (NNF). This project aimed to improve the management of selected rare and high value species in Namibia by building a knowledge base for better informed decision making. The knowledge base was required to encapsulate the current knowledge and experience of conservation experts and specialists. To provide an electronic representation of this knowledge base, a hypermedia Information System for Rare Species Management (known as IRAS) was designed and implemented. The research therefore explores the disciplinary interstices of information technology, conservation and ethics, against the cultural background of a post-colonial society in which the deficits of the past constrain the impact and the efficacy of technological interventions.