Browsing by Department "Department of Statistical Sciences"
Now showing 1 - 20 of 343
- Item (Open Access): A centile chart for birth weight for an urban population of the Western Cape (1995). Theron, G B; Thompson, M L. Evidence from large epidemiological studies has supported concern that being born light for gestational age (LiGA) may be detrimental. The incidence of LiGA babies is an important indicator of the health of women of reproductive age in deprived communities. In the assessment of LiGA in the Western Cape, centile charts constructed for populations in other parts of the world are generally used. These charts, however, may not be appropriate. Patients residing in the area served by the Tygerberg Hospital obstetric service who booked early with singleton pregnancies, had their gestational age confirmed by early ultrasound, and delivered between 1 March 1989 and 28 February 1990 were included in the study. The sample consisted of 3 643 patients. The mean birth weight was 2 995 g (SD 573 g) and the range 760 - 5 080 g. The distribution of birth weight at each week of gestation from 28 to 42 weeks was not normal. The 4-parameter Johnson family of densities was used to model the distribution of birth weight at each gestational age. A comparison of the distribution of birth weight in the study relative to the perinatal growth chart for international reference constructed by Dunn was also made. In addition to considering an overall chart, the sample was subdivided according to a number of characteristics (e.g. gender, firstborn and later-born babies, smoking habit, hypertensive disorders and induction of labour) in order to explore their impact on the distribution of birth weight. Having explored the potential impact of all these factors, it was concluded that a single chart including all patients could be constructed.
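A minimal sketch of how a four-parameter Johnson distribution might be fitted to birth weights at a single gestational week and used to read off smoothed centiles. The data are simulated and the sample size, week and centile choices are illustrative only; the abstract does not say which member of the Johnson family was used, so scipy's Johnson SU is assumed here.

```python
import numpy as np
from scipy import stats

# Hypothetical birth weights (g) for one gestational week (e.g. 38 weeks): a skewed sample
rng = np.random.default_rng(1)
weights_38w = rng.normal(3100, 450, size=400) + rng.gamma(2.0, 50, size=400)

# Fit a 4-parameter Johnson SU distribution (shape parameters a, b; location; scale)
a, b, loc, scale = stats.johnsonsu.fit(weights_38w)

# Read off smoothed centiles (e.g. 3rd, 10th, 50th, 90th, 97th) for one row of the chart
centiles = [0.03, 0.10, 0.50, 0.90, 0.97]
chart_row = {f"P{int(100 * p)}": stats.johnsonsu.ppf(p, a, b, loc=loc, scale=scale)
             for p in centiles}
print(chart_row)
```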
- Item (Open Access): A comparative study of stochastic models in biology (1997). Brandão, Anabela de Gusmão; Zucchini, Walter; Underhill, Les. In many instances, problems that arise in biology do not fall into any category for which standard statistical techniques are available to analyse them. In these situations, specific methods have to be developed to solve them and answer the questions put forward by biologists. In this thesis four different problems occurring in biology are investigated. A stochastic model is built in each case which describes the problem at hand. These models are not only effective as a description tool but also afford strategies consistent with conventional model selection processes to deal with standard statistical hypothesis testing situations. The abstracts of the papers resulting from these problems are presented below.
- Item (Open Access): A GLMM analysis of data from the Sinovuyo Caring Families Program (SCFP) (2018). Nhapi, Raymond T; Little, Francesca; Kassanjee, R. We present an analysis of data from a longitudinal randomized control trial that assesses the impact of an intervention program aimed at improving the quality of childcare within families. The SCFP was a group-based program implemented over two separate waves conducted in Khayelitsha and Nyanga. The data were collected at baseline, post-test and one-year follow-up via questionnaires (self-assessment) and observational video coding. Multiple imputation (using chained equations) procedures were used to impute missing information. Generalized linear mixed models (GLMMs) were used to assess the impact of the intervention program on the responses, adjusted for possible confounding variables. The responses were summed questionnaire scores, which were often right-skewed with zero inflation. All effects (fixed and random) were estimated by maximum likelihood. Primarily, an intention-to-treat analysis was done, after which a per-protocol analysis was also implemented with participants who attended a specified number of the group sessions. All these GLMMs were implemented within the imputation framework.
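The abstract mentions multiple imputation by chained equations followed by GLMMs. Below is a small, hedged sketch of the chained-equations step using scikit-learn's IterativeImputer (a MICE-style imputer); the trial data, missingness rate and variable layout are hypothetical, and the GLMM fitting itself is not reproduced here.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the estimator)
from sklearn.impute import IterativeImputer

# Hypothetical wide-format trial data: rows = participants,
# columns = baseline, post-test and follow-up summed scores (NaN = missing)
rng = np.random.default_rng(0)
scores = rng.poisson(5.0, size=(200, 3)).astype(float)
scores[rng.random(scores.shape) < 0.2] = np.nan  # roughly 20% missing at random

# Chained-equations-style imputation: each column is modelled on the others in turn
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
completed = imputer.fit_transform(scores)
print(completed[:5].round(2))

# For full multiple imputation, repeat with several random_state values,
# fit the GLMM to each completed data set, and pool the estimates.
```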
- Item (Open Access): A Machine Learning Model for Octane Number Prediction (2023). Spencer, Victor; Moller, Klaus; Nyirenda, Juwa Chiza. Assessing the quality of gasoline blends in blending circuits is an important task in quality control. Gasoline quality, however, cannot be measured directly on a process stream, so a quality indicator that can be determined from the stream composition is required. Various quality indicators have been used in the existing literature; the indicator in this study is the Research Octane Number (RON), which measures the ignition of gasoline relative to pure octane (Abdul-Gani et al. 2018). Previous research has used empirical models in the form of phenomenological and machine learning models (González 2019). Phenomenological models have been used in the past as a way of encoding an engineer's thought process as a system of differential equations. Machine learning models are data driven, with primarily regression and deep learning methods being used in the literature as prediction models. This study aims to develop a parsimonious machine learning model that can predict the RON from the molar composition of the gasoline product stream. Regression, ensemble learning and Artificial Neural Networks (ANN) are used in this study. The ensemble learning models trained are Bayesian Additive Regression Trees (BART) and Gradient Boosting Machines (GBM). The raw data were scraped from multiple journals online, and the data frame comprises the volume compositions of the reference compounds and the RON of each blend. The existing data frame was extended to include the molar composition of the structural groups present in each of the blends. The structural groups, which may be referred to as functional groups, are specific substituents within molecules which may be responsible for the characteristic chemical reactions of the respective molecules. This addition of structural groups adds a layer of information to differentiate between blends with different compound compositions but similar RON. It was hypothesised that the molar compositions of the additives and their substituent structural groups would rank highest and the molar composition of n-heptane would rank lowest. For the Multiple Linear Regression (MLR) models, two cases were trained: one with interaction parameters and one without. Both cases were trained with and without the composition constraints on the compound compositions. For the ensemble learning case, a BART model with 200 trees and a GBM model with 1998 trees were trained. Four Single Layer Feed-forward Neural Network (SLFN) models were trained, with 3, 5, 10 and 15 nodes respectively. This choice of neural network architecture was made because the data frame was small, with only 12 input variables and 350 observations. Prior to training the models, an exploratory data analysis was carried out to assess potential dimensionality reduction, correlations and outliers. The final regression model was the interaction model, with a test MSE of 7.54 and an adjusted R² of 0.986. The BART model obtained a test MSE of 13.74 and an adjusted R² of 0.983. The GBM model had a test MSE of 38.12 and an adjusted R² of 0.917. Lastly, the best performing ANN was the 10-node SLFN, which obtained a test MSE of 11.26 and an adjusted R² of 0.969. For each model, a variable importance analysis was carried out, and the molar composition of n-heptane consistently ranked high. In addition to these predictive statistics, the parity plots, residual plots and Analysis of Variance (ANOVA) were analysed and taken into consideration in evaluating the performance of each trained model. It was concluded that the MLR model performed best, followed by the BART model; the ANN models ranked third and the GBM model last. The hypothesis that the molar compositions of the additives and their substituent structural groups would rank highest and n-heptane lowest was disproved, as the molar composition of n-heptane and its substituent structural groups consistently ranked high. The recommendations of this study are to train the models on a more representative data set in future and to use a hybrid model, combining a phenomenological model with a machine learning model, for best results and to reduce model bias in regions with few data points. The next step of the study is the integration of the new model into the plant-wide Advanced Process Control (APC) system.
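To illustrate the kind of models compared in the study, the sketch below trains a gradient boosting machine and a single-layer feed-forward network on a synthetic 350 x 12 composition matrix with scikit-learn. The data, hyperparameters and accuracies are not those of the thesis; it only shows the train, test-MSE and variable-importance workflow described in the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical design matrix: 350 blends x 12 molar-composition features, RON response
rng = np.random.default_rng(42)
X = rng.dirichlet(np.ones(12), size=350)          # compositions sum to 1
y = 90 + 30 * X[:, 0] - 25 * X[:, 1] + rng.normal(0, 1.5, 350)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

gbm = GradientBoostingRegressor(n_estimators=2000, learning_rate=0.01).fit(X_tr, y_tr)
slfn = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0).fit(X_tr, y_tr)

print("GBM test MSE :", mean_squared_error(y_te, gbm.predict(X_te)))
print("SLFN test MSE:", mean_squared_error(y_te, slfn.predict(X_te)))
print("GBM variable importance:", gbm.feature_importances_.round(3))
```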
- Item (Open Access): A multivariate statistical approach to the assessment of nutrition status (1972). Fellingham, Stephen A; Troskie, Casper G. Attention is drawn to the confusion which surrounds the concept of nutrition status, and the problem of selecting an optimum subset of variables by which nutrition status can best be assessed is defined. Using a multidisciplinary data set of some 60 variables observed on 1898 school children from four racial groups, the study aims to identify statistically both those variables which are unrelated to nutrition status and also those which, although related, are so highly correlated that the measurement of all would be an unnecessary extravagance. It is found that, while the somatometric variables provide a reasonably good (but non-specific) estimate of nutrition status, the disciplines form meaningful groups and the variables of the various disciplines tend to supplement rather than replicate each other. Certain variables from most of the disciplines are, therefore, necessary for an optimum and specific estimate of nutrition status. Both the potential and the shortcomings of a number of statistical techniques are demonstrated.
- Item (Open Access): A note on the statistical analysis of point judgment matrices (2013). Kabera, Gaetan; Haines, L M. The Analytic Hierarchy Process is a multicriteria decision making technique developed by Saaty in the 1970s. The core of the approach is the pairwise comparison of objects according to a single criterion using a 9-point ratio scale and the estimation of weights associated with these objects based on the resultant judgment matrix. In the present paper some statistical approaches to extracting the weights of objects from a judgment matrix are reviewed, and new ideas which are rooted in the traditional method of paired comparisons are introduced.
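A brief sketch of two standard ways of extracting weights from a Saaty-scale judgment matrix: the row geometric mean (log-least-squares) and Saaty's principal right eigenvector. The 4x4 matrix is a made-up example, not one analysed in the paper.

```python
import numpy as np

# Hypothetical 4x4 Saaty-scale judgment matrix (reciprocal pairwise ratios)
A = np.array([[1,   3,   5,   7],
              [1/3, 1,   3,   5],
              [1/5, 1/3, 1,   3],
              [1/7, 1/5, 1/3, 1]], dtype=float)

# Row geometric-mean estimate of the weights (log-least-squares solution)
gm = np.prod(A, axis=1) ** (1.0 / A.shape[0])
w_geometric = gm / gm.sum()

# Principal right eigenvector estimate, for comparison
vals, vecs = np.linalg.eig(A)
principal = np.real(vecs[:, np.argmax(np.real(vals))])
w_eigen = principal / principal.sum()

print("geometric-mean weights:", w_geometric.round(3))
print("eigenvector weights   :", w_eigen.round(3))
```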
- Item (Open Access): A reproducible approach to equity backtesting (2019). Arbi, Riaz; Gebbie, Timothy. Research findings relating to anomalous equity returns should ideally be repeatable by others. Usually, only a small subset of the decisions made in a particular backtest workflow is released, which limits reproducibility. Data collection and cleaning, parameter setting, algorithm development and report generation are often done with manual point-and-click tools which do not log user actions. This problem is compounded by the fact that the trial-and-error approach of researchers increases the probability of backtest overfitting. Borrowing practices from the reproducible research community, we introduce a set of scripts that completely automate a portfolio-based, event-driven backtest. Based on free, open-source tools, these scripts can completely capture the decisions made by a researcher, resulting in a distributable code package that allows easy reproduction of results.
- Item (Open Access): A rescheduling heuristic for the single machine total tardiness problem (2006). Nyirenda, J C. In this paper, we propose a rescheduling heuristic for scheduling N jobs on a single machine in order to minimise total tardiness. The heuristic is of the interchange type and constructs a schedule from the modified due date (MDD) schedule. Unlike most interchange heuristics, which consider interchanges involving only two jobs at a time, the newly proposed heuristic uses interchanges that may involve more than two jobs at any one time. Experimental results show that the heuristic is effective at reducing total tardiness, producing schedules that are either similar to or better than those produced by the MDD rule alone. Furthermore, when applied to a set of test problems, the heuristic found optimal schedules for all of them.
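For context, the sketch below constructs the modified due date (MDD) schedule that the proposed heuristic starts from and evaluates its total tardiness; it does not implement the paper's interchange heuristic itself. The five-job instance is hypothetical.

```python
def mdd_schedule(jobs):
    """Build a Modified Due Date (MDD) schedule.

    jobs: list of (processing_time, due_date) tuples.
    At each step, the unscheduled job with the smallest modified due date
    max(due_date, current_time + processing_time) is sequenced next.
    """
    remaining = list(range(len(jobs)))
    t, order = 0, []
    while remaining:
        nxt = min(remaining, key=lambda j: max(jobs[j][1], t + jobs[j][0]))
        order.append(nxt)
        t += jobs[nxt][0]
        remaining.remove(nxt)
    return order


def total_tardiness(jobs, order):
    """Sum of max(0, completion time - due date) over the sequence."""
    t, tardiness = 0, 0
    for j in order:
        t += jobs[j][0]
        tardiness += max(0, t - jobs[j][1])
    return tardiness


# Hypothetical instance: (processing time, due date) for 5 jobs
jobs = [(4, 6), (3, 9), (7, 12), (2, 4), (5, 20)]
seq = mdd_schedule(jobs)
print(seq, total_tardiness(jobs, seq))
```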
- Item (Open Access): A Sensitivity Analysis of Model Structure in Stochastic Differential Equation and Agent-Based Epidemiological Models (2014). Combrink, James. The dynamics of infectious diseases have been modelled by several universally recognised procedures. The two most common modelling methods are differential equation models (DEM) and agent-based models (ABM). Both have been used through the late 20th and early 21st century to gain an understanding of prevalence levels and the behaviour of infectious diseases, and subsequently to forecast the potential impacts of a treatment. In the case of a life-threatening disease such as malaria, it is problematic to be working with incorrect predictions, and an epidemic may result from a misinformed judgement on the required treatment program. DEM and ABM have been documented to provide contrasting results (and conclusions) in several cases, even when fitting identical data sets [Figueredo et al. 2014]. Under the correct model, one would expect a fair representation of an infectious disease and hence an insightful conclusion. It is therefore problematic for the choice of treatment tactics to depend on the choice of model structure. This honours thesis identifies the need for caution regarding model methodology and performs a sensitivity analysis on the incidence and prevalence of an infectious disease under varying levels of treatment. The thesis focuses on modelling methodology under various structures: the procedure is applicable to any infectious disease, and the thesis provides a case study on malaria modelling with a later extension to Ebola. Beginning with a simple Susceptible-Infected-Recovered-Susceptible (SIRS) model, immediately obvious differences are examined to give an indication of the point at which the models lose integrity in direct comparability. The SIRS models are built up to include varying levels of exposure, treatment and movement dynamics, and the nature of the differences in the conclusions drawn from the separate models is examined.
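A minimal differential-equation (DEM) sketch of the SIRS dynamics referred to in the abstract, integrated with scipy; the transmission, recovery and immunity-loss rates and the initial state are illustrative values only, and the agent-based counterpart is not shown.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sirs(t, y, beta, gamma, xi):
    """Basic SIRS dynamics: infection, recovery, and loss of immunity."""
    S, I, R = y
    N = S + I + R
    dS = -beta * S * I / N + xi * R
    dI = beta * S * I / N - gamma * I
    dR = gamma * I - xi * R
    return [dS, dI, dR]

# Hypothetical parameters: transmission, recovery and immunity-loss rates
beta, gamma, xi = 0.3, 0.1, 0.05
sol = solve_ivp(sirs, (0, 365), [990, 10, 0], args=(beta, gamma, xi),
                t_eval=np.linspace(0, 365, 366))
print("peak prevalence:", sol.y[1].max().round(1))
```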
- Item (Open Access): A statistical approach to automated detection of multi-component radio sources (2020). Smith, Jeremy Stewart; Taylor, Russell. Advances in radio astronomy are allowing deeper and wider areas of the sky to be observed than ever before. Source counts of future radio surveys are expected to number in the tens of millions. Source finding techniques are used to identify sources in a radio image; however, these techniques identify single distinct sources and are challenged to identify multi-component sources, that is, where two or more distinct sources belong to the same underlying physical phenomenon, such as a radio galaxy. Identification of such phenomena is an important step in generating catalogues from surveys, on which much of radio astronomy science is based. Historically, identifying multi-component sources was done by visual inspection; however, the size of future surveys makes manual identification prohibitive. An algorithm to automate this process using statistical techniques is proposed and demonstrated on two radio images. The output of the algorithm is a catalogue in which nearest-neighbour source pairs are assigned a probability score of being components of the same physical object. By applying several selection criteria, pairs of sources which are likely to be multi-component sources can be determined. Radio image cutouts are then generated from this selection and may be used as input to radio source classification techniques. Successful identification of multi-component sources using this method is demonstrated.
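A toy sketch of the nearest-neighbour pairing step described above, using a k-d tree over simulated sky positions and a simple ad-hoc pairing score; the thesis's actual statistical scoring is not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical source catalogue: sky positions (degrees) and total flux
rng = np.random.default_rng(3)
ra, dec = rng.uniform(0, 1, 500), rng.uniform(0, 1, 500)
flux = rng.lognormal(0, 1, 500)

# Nearest-neighbour pairs (k=2 so each source's nearest other source is returned)
tree = cKDTree(np.column_stack([ra, dec]))
dist, idx = tree.query(np.column_stack([ra, dec]), k=2)
neighbour, separation = idx[:, 1], dist[:, 1]

# Toy pairing score: closer pairs with similar flux score higher
flux_ratio = np.minimum(flux, flux[neighbour]) / np.maximum(flux, flux[neighbour])
score = flux_ratio * np.exp(-separation / separation.mean())

candidates = np.argsort(score)[::-1][:10]  # most likely multi-component pairs
print(list(zip(candidates, neighbour[candidates], score[candidates].round(3))))
```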
- Item (Open Access): A temporal prognostic model based on dynamic Bayesian networks: mining medical insurance data (2021). Mbaka, Sarah Kerubo; Ngwenya, Mzabalazo. A prognostic model is a formal combination of multiple predictors from which the risk probability of a specific diagnosis can be modelled for patients. Prognostic models have become essential instruments in medicine. They are used for prediction, guiding doctors towards a smart diagnosis and patient-specific decisions, or helping to plan the utilization of resources for patient groups who have similar prognostic paths. Dynamic Bayesian networks (DBNs) theoretically provide a very expressive and flexible model for solving temporal problems in medicine. However, this involves various challenges due both to the nature of the clinical domain and to the nature of the DBN modelling and inference process itself. The challenges from the clinical domain include insufficient knowledge of temporal interactions of processes in the medical literature, the sparse nature and variability of medical data collection, and the difficulty of preparing and abstracting clinical data in a suitable format without losing valuable information in the process. Challenges concerning the DBN methodology and implementation include the lack of tools that allow easy modelling of temporal processes; overcoming this challenge would help solve various clinical temporal reasoning problems. In this thesis, we addressed these challenges while building a temporal network, with explanations of the effects of predisposing factors such as age and gender, and the progression information of all diagnoses, using claims data from an insurance company in Kenya. We showed that our network could differentiate the probability of exposure to a diagnosis given age and gender, and the possible paths given a patient's history. We also presented evidence that the more patient history is provided, the better the prediction of future diagnoses.
- Item (Open Access): A theory and process evaluation of the Learner Engagement Programme (LEP) implemented by Just Grace NPC (2022). Kwenda, Geraldine; Boodhoo, Adiilah. Background: The Learner Engagement Programme (LEP) is an after-school dropout prevention programme that operates in Langa, a township located in Cape Town, South Africa. Langa is an impoverished township characterised by socio-economic challenges such as high unemployment, violence, few economic opportunities and poor school infrastructure. The LEP aims to address learner disengagement among at-risk high school learners. It operates within the only five high schools in Langa: Langa High, Khulani High, Isimela High, Zimasa High, and Ikamva High. The programme is implemented by Just Grace, a non-profit organisation whose goal is to uplift the youth and community of Langa through educational, community and youth development programmes. Just Grace is funded by trusts and companies such as DGMT, Mineral Loy (Pty) Ltd, Swiss Philanthropic Foundation, Capfin (Pty) Ltd, Enigma Electrical (Pty) Ltd, Lot Emphangeni (Pty) Ltd, Mergon Foundation, and Dairycap CC. Aims of the evaluation: The evaluation aimed to determine (a) the extent to which the programme design can realistically bring about the desired outcomes and (b) the extent to which the programme's planned activities are implemented with fidelity. A programme theory evaluation and a process evaluation were carried out to address the following evaluation questions. Programme theory evaluation questions: (1) What is the theory and logic underlying the LEP? (2) Is the programme theory and logic plausible? Process evaluation questions: (1) Is the programme consistently servicing the planned target population? (a) To what extent are learners appropriately identified as at risk? (b) Which support services are used the most by learners? (c) Are the programme services relevant to meet the learners' needs? (2) Are initial home visits being delivered according to planned programme procedures? (3) Are the programme staff adequately trained and equipped to work with at-risk learners and implement the programme's different components? Methodology: The choice of methods for this evaluation was informed by the evaluation questions as well as by the practical opportunities and constraints associated with Level 3 lockdown in South Africa in response to the COVID-19 pandemic. Access to programme beneficiaries was restricted during this time, and the risks and concerns associated with face-to-face contact compelled the evaluator to capitalise on available secondary data sources and on data gathered from programme staff (through a focus group) to address the process evaluation questions. The programme theory evaluation was guided by Donaldson's (2007) systematic five-step framework in conjunction with Brousselle and Champagne's (2011) steps for a logic analysis. An initial LEP programme theory was developed using data obtained through a structured engagement with a purposive sample of four programme staff and a review of relevant programme documents. The plausibility of the programme theory was then examined in line with the best practice literature. Brousselle and Champagne's (2011) steps for a logic analysis were applied to guide the process, which culminated in a reconstructed programme theory. To answer process evaluation questions 1-3, programme documents were systematically analysed. A focus group, which gathered programme staff's experiences of and insights into the current programme infrastructure, challenges, and organisational support, was also conducted. The focus group data was analysed using Krueger's (1994) framework for thematic analysis. Key findings: The programme theory evaluation confirmed that the LEP's initial programme theory and logic are plausible: the programme does incorporate a multi-level approach to tackling learner disengagement, targeting the individual, family and community. It addresses the psychosocial aspects that contribute to learner disengagement through the provision of one-on-one counselling, a life skills programme, and parental support groups and training. The programme also has elements of an effective after-school programme, with qualified staff, adequate resources and efficient programme practices. A few shortcomings were identified through the evaluation: the programme lacks academic support, early warning systems, specialised external partners, and behavioural outcome measures, which are crucial in preventing school dropout. While the evaluator cannot conclusively determine whether the initial home visits are being delivered according to planned procedures (given the limitations of the data at hand), the process evaluation confirmed that the criteria used to identify at-risk learners are in line with the best practice literature. The process evaluation also revealed factors that compromised the effective implementation of the programme, including a lack of commitment from partner schools, a lack of trust in programme methods from parents/caregivers, and a lack of staff safety when conducting home visits. Recommendations: Key recommendations discussed in this evaluation include the following. • Development of an early warning system in tandem with partner schools, as data on at-risk learners needs to be collected earlier in their school career and consistently, to ensure the learner receives the necessary assistance timeously and suitable interventions are developed. • Provision of an academic component, as learners who are provided with academic support in addition to psychosocial support have a higher chance of school completion. • Development of a behavioural monitoring system, as effective programmes utilise behavioural outcome measures to assess programme effects on learners' behaviours. • Forging partnerships with external agencies to assist the programme in specialised areas, as the programme would benefit from being embedded in a broader network of community-based organisations, NGOs, civil organisations, and government agencies trained to provide specialised support and assistance to their beneficiaries.
- Item (Open Access): Accurate portfolio risk-return structure modelling (2006). Hossain, Nafees; Troskie, Casper G; Guo, Renkuan. Markowitz's modern portfolio theory has played a vital role in investment portfolio management, constantly pushing the development of volatility models, in particular the stochastic volatility model, which reveals the dynamics of conditional volatility. Financial time series and volatility models have become one of the hot spots in operations research. In this thesis, one of the areas we explore is the theoretical formulation of the optimal portfolio selection problem in an Ito calculus framework; in particular, a stochastic variational calculus problem, i.e., seeking the optimal stochastic volatility diffusion family for facilitating the best portfolio selection identified under continuous-time stochastic optimal control theoretical settings. One of the properties this study examines is the left-shifting role of the GARCH(1,1) (Generalised Autoregressive Conditional Heteroskedastic) model's efficient frontier. The study considers many instances where the left-shifting superior behaviour of the GARCH(1,1) model is observed; one such instance is when GARCH(1,1) is compared with the volatility modelling extensions of the GARCH family in a single index framework. The study demonstrates the persistence of the superiority of the GARCH(1,1) frontier within both a multiple index and a single index context of modern portfolio theory. Many portfolio optimization models are investigated, particularly the Markowitz model and the Sharpe multiple and single index models. Includes bibliographical references (p. 313-323).
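A small sketch of fitting the GARCH(1,1) model mentioned above to a simulated return series with the `arch` package (assumed to be installed); the fitted conditional volatility series is the kind of input an efficient-frontier comparison would use. The data and parameters are illustrative only.

```python
import numpy as np
from arch import arch_model  # assumes the `arch` package is installed

# Hypothetical daily return series (in percent), heavy-tailed
rng = np.random.default_rng(7)
returns = rng.standard_t(df=6, size=1500) * 0.8

# Fit a GARCH(1,1) model with a constant mean
res = arch_model(returns, mean="Constant", vol="Garch", p=1, q=1).fit(disp="off")
print(res.params)                   # mu, omega, alpha[1], beta[1]
sigma = res.conditional_volatility  # conditional volatility series
print(sigma[-5:])
```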
- Item (Open Access): Adapting Large-Scale Speaker-Independent Automatic Speech Recognition to Dysarthric Speech (2022). Houston, Charles; Britz, Stefan S; Durbach, Ian. Despite recent improvements in speaker-independent automatic speech recognition (ASR), the performance of large-scale speech recognition systems is still significantly worse on dysarthric speech than on standard speech. Both the inherent noise of dysarthric speech and the lack of large datasets add to the difficulty of solving this problem. This thesis explores different approaches to improving the performance of deep learning ASR systems on dysarthric speech. The primary goal was to find out whether a model trained on thousands of hours of standard speech could successfully be fine-tuned to dysarthric speech. Deep Speech, an open-source deep-learning-based speech recognition system developed by Mozilla, was used as the baseline model. The UASpeech dataset, composed of utterances from 15 speakers with cerebral palsy, was used as the source of dysarthric speech. In addition to fine-tuning, layer freezing, data augmentation and re-initialization were investigated. Data augmentation took the form of time and frequency masking, while layer freezing consisted of fixing the first three feature extraction layers of Deep Speech during fine-tuning. Re-initialization was achieved by randomly initializing the weights of Deep Speech and training from scratch. A separate encoder-decoder recurrent neural network consisting of far fewer parameters was also trained from scratch. The Deep Speech acoustic model obtained a word error rate (WER) of 141.53% on the UASpeech test set of commands, digits, the radio alphabet, common words, and uncommon words. Once fine-tuned to dysarthric speech, a WER of 70.30% was achieved, demonstrating the ability of fine-tuning to improve upon the performance of a model initially trained on standard speech. While fine-tuning led to a substantial improvement in performance, the benefit of data augmentation was far more subtle, improving on the fine-tuned model by a mere 1.31%. Freezing the first three layers of Deep Speech and fine-tuning the remaining layers was slightly detrimental, increasing the WER by 0.89%. Finally, both re-initialization of Deep Speech's weights and the encoder-decoder model generated highly inaccurate predictions. The best performing model was Deep Speech fine-tuned to augmented dysarthric speech, which achieved a WER of 60.72% with the inclusion of a language model.
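The time and frequency masking used for data augmentation can be sketched directly on a spectrogram array, SpecAugment-style. The mask widths, spectrogram shape and zero-fill value below are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np

def time_freq_mask(spec, max_t=20, max_f=8, rng=None):
    """Apply one random time mask and one random frequency mask to a
    (frequency_bins x time_frames) spectrogram, SpecAugment-style."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape

    t = rng.integers(0, max_t + 1)            # width of the time mask
    t0 = rng.integers(0, max(1, n_time - t))
    spec[:, t0:t0 + t] = 0.0

    f = rng.integers(0, max_f + 1)            # height of the frequency mask
    f0 = rng.integers(0, max(1, n_freq - f))
    spec[f0:f0 + f, :] = 0.0
    return spec

# Hypothetical log-mel spectrogram: 80 mel bins x 300 frames
spec = np.random.default_rng(0).normal(size=(80, 300))
augmented = time_freq_mask(spec, rng=np.random.default_rng(1))
```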
- Item (Open Access): The address sort and other computer sorting techniques (1971). Underhill, Leslie G; Troskie, Casper G. Originally this project was to have been a feasibility study of the use of computers in the library. It soon became clear that the logical place in the library at which to start making use of the computer was the catalogue. Once the catalogue was in machine-readable form it would be possible to work backwards to the book ordering and acquisitions system and forwards to the circulation and book issue system. One of the big advantages of using the computer to produce the catalogue would be the elimination of the "skilled drudgery" of filing. Thus vast quantities of data would need to be sorted, and thus the scope of this project was narrowed down from a general feasibility study, firstly to a study of a particular section of the library and secondly to one particularly important aspect of that section: sorting with the aid of the computer. I have examined many, but by no means all, computer sorting techniques, programmed them in FORTRAN as efficiently as I was able, and compared their performances on the IBM 1130 computer of the University of Cape Town. I have confined myself to internal sorts, i.e. sorts that take place in core. This thesis stops short of applying the best of these techniques to the library; I intend however to do so, and to work back to the original scope of my thesis.
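A hedged sketch of the address-calculation idea behind the address sort: a linear address function places each key near its estimated final position, and small local groups are then kept in order. This is an illustrative Python rendering, not the FORTRAN implementations benchmarked in the thesis.

```python
def address_sort(keys, spread=2):
    """Address-calculation sort: a linear address function distributes keys
    into buckets near their final positions; the small buckets are kept
    in order by insertion and then concatenated."""
    if not keys:
        return []
    lo, hi = min(keys), max(keys)
    n_buckets = spread * len(keys)             # slack keeps buckets small
    buckets = [[] for _ in range(n_buckets)]

    for k in keys:
        # Address function: map the key range linearly onto the bucket indices
        pos = 0 if hi == lo else int((k - lo) / (hi - lo) * (n_buckets - 1))
        bucket = buckets[pos]
        # Ordered (insertion-style) placement within the bucket
        i = len(bucket)
        while i > 0 and bucket[i - 1] > k:
            i -= 1
        bucket.insert(i, k)

    return [k for bucket in buckets for k in bucket]

print(address_sort([27, 3, 91, 45, 8, 60, 12]))
```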
- Item (Open Access): Agent-based model of the market penetration of a new product (2014). Magadla, Thandulwazi; Durbach, Ian; Scott, Leanne. This dissertation presents an agent-based model that is used to investigate the market penetration of a new product within a competitive market. The market consists of consumers that belong to a social network which serves as a substrate over which consumers exchange positive and negative word-of-mouth communication about the products that they use. Market dynamics are influenced by factors such as product quality; the level of satisfaction that consumers derive from using the products in the market; switching constraints that make it difficult for consumers to switch between products; the word-of-mouth that consumers exchange; and the structure of the social network that consumers belong to. Various scenarios are simulated in order to investigate the effect of these factors on the market penetration of a new product. The simulation results suggest that: • a new product reaches fewer new consumers and acquires a lower market share when consumers switch less frequently between products; • a new product reaches more new consumers and acquires a higher market share when it is of better quality than the existing products, because more positive word-of-mouth is disseminated about it; • when there are products with switching constraints in the market, launching a new product with switching constraints results in a higher market share compared to launching it without switching constraints; however, it reaches fewer new consumers, because switching constraints result in negative word-of-mouth being disseminated about it, which deters other consumers from using it. Some factors, such as the fussiness of consumers, the shape and size of consumers' social networks, and the type of messages that consumers transmit and with whom and how often they communicate about a product, may be beyond the control of marketing managers. However, these factors can potentially be influenced through a marketing strategy that encourages consumers to exchange positive word-of-mouth both with consumers that are familiar with a product and those who are not.
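A toy agent-based sketch of the word-of-mouth dynamics described above, using networkx (assumed available) for the social network. The network type, product qualities, switching cost and update rule are hypothetical choices for illustration, not the dissertation's specification.

```python
import numpy as np
import networkx as nx  # assumes networkx is installed

# Hypothetical market: 500 consumers on a small-world network, an incumbent
# product (0) and a new entrant (1) of higher quality.
rng = np.random.default_rng(0)
G = nx.watts_strogatz_graph(500, k=6, p=0.1, seed=0)
quality = np.array([0.6, 0.8])        # probability a use experience is positive
switch_cost = 0.15                    # switching constraint (inertia)

product = np.zeros(G.number_of_nodes(), dtype=int)
product[rng.choice(G.number_of_nodes(), size=10, replace=False)] = 1  # seed adopters

for step in range(50):
    satisfied = rng.random(len(product)) < quality[product]   # this step's experiences
    new_product = product.copy()
    for n in G.nodes:
        nbrs = list(G.neighbors(n))
        if not nbrs:
            continue
        # Net word of mouth about each product from neighbours' experiences
        wom = np.zeros(2)
        for m in nbrs:
            wom[product[m]] += 1.0 if satisfied[m] else -1.0
        other = 1 - product[n]
        # Switch only if the other product's net word of mouth clears the switching cost
        if wom[other] - wom[product[n]] > switch_cost * len(nbrs):
            new_product[n] = other
    product = new_product

print("new product market share:", product.mean().round(3))
```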
- Item (Open Access): Aiding Decision making for foodbank Cape Town (2010). Blake, Timothy James; Stewart, Theodor J; Van Dyk, Esbeth
- Item (Open Access): An alternative model for multivariate stable distributions (2009). Jama, Siphamandla; Guo, Renkuan. As the title, "An Alternative Model for Multivariate Stable Distributions", depicts, this thesis draws from the methodology of [J36] and derives an alternative to the sub-Gaussian alpha-stable distribution as another model for multivariate stable data, without using the spectral measure as a dependence structure. From our investigation, firstly, we echo that the assumption of "Gaussianity" must be rejected as a model for high-frequency financial data in particular, based on evidence from the Johannesburg Stock Exchange (JSE). Secondly, the introduced technique models bivariate return data far better than the Gaussian model. We argue that, unlike the sub-Gaussian stable model and the model involving a spectral measure, this technique is not subject to estimation of a joint index of stability; as such, it may remain a superior alternative in empirical stable distribution theory. Thirdly, we confirm that the Gaussian Value-at-Risk and Conditional Value-at-Risk measures are more optimistic and misleading, while their stable counterparts are more informative and reasonable. Fourthly, our results confirm that stable distributions are more appropriate for portfolio optimization than the Gaussian framework.
- Item (Open Access): An analysis of household water consumption in the City of Cape Town using a panel data set (2016-2020) (2022). Kaplan, Anna Leah; Er, Sebnem; Visser, Martine. Understanding consumer behaviour with respect to water consumption has become an active field of study. This thesis uses a household billing dataset that tracks the quantity of water consumed by households in the City of Cape Town (CoCT) from 2016 to 2020. The billing data were filtered to include only household observations and then aggregated to the ward level. As a result, the aggregated data form a balanced spatial panel dataset comprising 20 quarterly observations for each of the 88 wards. Using the billing dataset, multiple linear regression models, panel data models and spatial panel models were implemented to predict ward-level water consumption. Using several visualisations and statistical measures, the thesis found that consumption dropped significantly during the drought period (2016-2018) and identified spatial clusters of water consumption in the CoCT. The data showed that before and after the drought, water consumption exhibited a seasonal pattern which was absent during the drought period. Although consumption levels increased after the drought, they did not rise as high as pre-drought levels. The linear models implemented in this thesis achieved adjusted R-squared values of up to 0.85, implying that the independent variables used in the models explain a large amount of the variation observed in the dependent variable, the quantity of ward-level water consumption.
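A minimal sketch of a fixed-effects panel regression of the kind described, using the `linearmodels` package (an assumption; the thesis's software is not stated) on a simulated 88-ward by 20-quarter panel with a drought indicator and a hypothetical tariff covariate.

```python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS  # assumes the `linearmodels` package is installed

# Hypothetical balanced panel: 88 wards x 20 quarters of ward-level water consumption
rng = np.random.default_rng(0)
quarters = pd.date_range("2016-01-01", periods=20, freq="QS")
idx = pd.MultiIndex.from_product([range(88), quarters], names=["ward", "quarter"])
df = pd.DataFrame(index=idx)

t = df.index.get_level_values("quarter")
df["drought"] = ((t >= quarters[2]) & (t <= quarters[9])).astype(float)  # drought indicator
df["tariff"] = rng.normal(30.0, 5.0, len(df))                            # hypothetical price proxy
df["consumption"] = (500.0 - 120.0 * df["drought"] - 2.0 * df["tariff"]
                     + rng.normal(0.0, 20.0, len(df)))

# Fixed-effects panel regression: ward (entity) effects, cluster-robust errors
model = PanelOLS(df["consumption"], df[["drought", "tariff"]], entity_effects=True)
res = model.fit(cov_type="clustered", cluster_entity=True)
print(res.params)
```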