Browsing by Author "Nyirenda, Juwa Chiza"
- Exploring the application of Word2Vec to basket transaction data in the grocery retail industry (2022). De Swardt, Gideon Jacobus; Nyirenda, Juwa Chiza. Open Access.
  In this thesis, we explore the application of Word2vec to basket transaction data provided by a large grocery retailer in South Africa. Word2vec is an algorithm based on representation learning. The objective of the exploration is to establish whether applying Word2vec to basket transaction data generates product embeddings that represent a useful relationship between products. Furthermore, we compare Word2vec's outputs and performance to traditional methods for studying product relationships, which include Association Rules Mining (ARM) and Recommendation Systems. The results from the experiments showed that product embeddings created by Word2vec on transaction data are indeed meaningful and useful. Using transactions in place of sentences as input to the neural network provides results analogous to those of a natural language task. Word2vec clearly demonstrated its ability to cluster products that are homogeneous or fulfill similar needs. Furthermore, this sort of product relationship was not provided by any of the traditional methods, which was clear when comparing the outputs to those of ARM and Recommendation Systems. We also show that using Word2vec could potentially provide insight into truly complementary products, which ARM may fail to identify. Word2vec also proved to be highly scalable, handling input data 20 times the size of what traditional methods could handle on a local computer. We end with a description of a potential application of the ideas learnt during the course of this study to a real business problem, which we believe could lead to an enhanced customer shopping experience and in turn increase revenue and profits for the retailer.
- Generating new data points using singular value decomposition (2025). Biyana, Tlhologello; Nyirenda, Juwa Chiza. Open Access.
  This study presents an innovative solution to the challenge of generating new data points for small data sets. It introduces a Singular Value Decomposition (SVD)-based model that draws inspiration from the ability of SVD to estimate a lower rank matrix. This approach seeks to overcome the limitations imposed by sample size constraints by expanding the available data. Motivated by challenges faced during algorithm development due to small data sets, the study proposes the SVD-based model, evaluates its efficacy in replicating original data attributes and compares model performance with new and original data. The method involves utilising SVD to generate new data, mimicking a predictive modelling formula by combining systematic and error components. The generated data set retains the distribution of the original data but introduces distinct error values, facilitating efficient data generation. Through graphical and quantitative assessments, including histograms, box plots, correlation analysis and reconstruction error evaluations, the effectiveness of the method is demonstrated. The study focuses on comparing SVD-generated data sets with original data across three data sets: Abalone, Life Expectancy and NBA. Findings indicate close approximation of distribution, correlation and model performance attributes between SVD-generated and original data sets. Similarity improves as the observation count increases, enhancing the comparability and model performance of SVD-generated data. While minor deviations are noted in specific scenarios, the study underscores the potential of SVD in generating new data points from the original data sets, making it a valuable tool for data augmentation and analysis across diverse data sets.
- Long short-term memory neural networks for predicting corporate credit ratings (2024). Chandoo, Ali Aonali; Nyirenda, Juwa Chiza. Open Access.
  Credit ratings are an important tool when assessing financial instruments and investments. The existing literature shows that long short-term memory (LSTM) neural networks are the best-performing neural networks for predicting credit ratings, while random forests have been shown to perform better than regular neural networks. As of the beginning of this study, no study had compared the performance of LSTM neural networks and random forests despite their reported superior performance. This study compares the performance of random forests and LSTM neural networks in predicting corporate credit ratings in the USA using Standard and Poor's data. The study finds that while LSTM neural networks pose serious competition, random forests have a slight edge over them, showing that it is still worth using older and simpler techniques in predicting credit ratings.
- Predicting residential demand: applying random forest to predict housing demand in Cape Town (2018). Dyer, Ross; McGaffin, Robert; Nyirenda, Juwa Chiza. Open Access.
  The literature shows that Random Forest is a suitable technique to predict a target variable for a household with completely unseen characteristics. The models produced in this paper show that the characteristics of a household can be used to predict the Type of Dwelling, the Tenure and the Number of Bedrooms to varying degrees of accuracy. While none of the sets of models produced indicate a high degree of predictive accuracy relative to hurdle rates, the paper does demonstrate the value that the Random Forest technique offers in moving closer to an understanding of the complex nature of housing demand. A key finding is that the Census variables available for the models are not discriminatory enough to enable the high degree of accuracy expected from a predictive model.