Browsing by Author "Britz, Stefan S"
- Item (Open Access): Adapting Large-Scale Speaker-Independent Automatic Speech Recognition to Dysarthric Speech (2022). Houston, Charles; Britz, Stefan S; Durbach, Ian.
  Despite recent improvements in speaker-independent automatic speech recognition (ASR), the performance of large-scale speech recognition systems is still significantly worse on dysarthric speech than on standard speech. Both the inherent noise of dysarthric speech and the lack of large datasets add to the difficulty of solving this problem. This thesis explores different approaches to improving the performance of Deep Learning ASR systems on dysarthric speech. The primary goal was to determine whether a model trained on thousands of hours of standard speech could successfully be fine-tuned to dysarthric speech. Deep Speech, an open-source Deep Learning based speech recognition system developed by Mozilla, was used as the baseline model. The UASpeech dataset, composed of utterances from 15 speakers with cerebral palsy, was used as the source of dysarthric speech. Beyond plain fine-tuning, layer freezing, data augmentation and re-initialization were also investigated. Data augmentation took the form of time and frequency masking, while layer freezing consisted of fixing the first three feature-extraction layers of Deep Speech during fine-tuning. Re-initialization was achieved by randomly initializing the weights of Deep Speech and training from scratch. A separate encoder-decoder recurrent neural network with far fewer parameters was also trained from scratch. The Deep Speech acoustic model obtained a word error rate (WER) of 141.53% on the UASpeech test set of commands, digits, the radio alphabet, common words, and uncommon words. Once fine-tuned to dysarthric speech, a WER of 70.30% was achieved, demonstrating the ability of fine-tuning to improve upon the performance of a model initially trained on standard speech.
  While fine-tuning led to a substantial improvement in performance, the benefit of data augmentation was far more subtle, improving on the fine-tuned model by a mere 1.31%. Freezing the first three layers of Deep Speech and fine-tuning the remaining layers was slightly detrimental, increasing the WER by 0.89%. Finally, both re-initialization of Deep Speech's weights and the encoder-decoder model generated highly inaccurate predictions. The best-performing model was Deep Speech fine-tuned to augmented dysarthric speech, which achieved a WER of 60.72% with the inclusion of a language model.
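The time- and frequency-masking augmentation this abstract describes can be sketched as follows. This is an illustrative example only: the function name, mask widths, and spectrogram representation are assumptions, not details taken from the thesis.

```python
import random

def mask_spectrogram(spec, max_time_mask=10, max_freq_mask=4, seed=None):
    """Apply one time mask and one frequency mask to a spectrogram.

    `spec` is a list of frames, each a list of frequency-bin values.
    Masked regions are zeroed, in the style of SpecAugment-type
    augmentation (illustrative parameters, not the thesis's own).
    """
    rng = random.Random(seed)
    n_frames, n_bins = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is untouched

    # Time mask: zero a contiguous block of frames.
    t_width = rng.randint(0, min(max_time_mask, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [0.0] * n_bins

    # Frequency mask: zero a contiguous band of bins in every frame.
    f_width = rng.randint(0, min(max_freq_mask, n_bins))
    f_start = rng.randint(0, n_bins - f_width)
    for row in out:
        for f in range(f_start, f_start + f_width):
            row[f] = 0.0
    return out
```

Each call masks a random block of frames (time) and a random band of bins (frequency), forcing a model fine-tuned on the augmented data to rely on the surrounding context.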
- Item (Open Access): Applications of Machine Learning in Apple Crop Yield Prediction (2021). van den Heever, Deirdre; Britz, Stefan S.
  This study proposes the application of machine learning techniques to predict yield in the apple industry. Crop yield prediction is important because it impacts resource and capacity planning. It is, however, challenging because yield is affected by multiple interrelated factors such as climate conditions and orchard management practices. Machine learning methods have the ability to model complex relationships between input and output features. This study considers the following machine learning methods for apple yield prediction: multiple linear regression, artificial neural networks, random forests and gradient boosting. The models are trained, optimised, and evaluated using both a random and a chronological data split, and the out-of-sample results are compared to find the best-suited model. The methodology is based on a literature analysis that aims to provide a holistic view of the field of study by including research in the following domains: smart farming, machine learning, apple crop management and crop yield prediction. The models are built using apple production data and environmental factors, with the modelled yield measured in metric tonnes per hectare. The results show that the random forest model is the best-performing model overall, with a Root Mean Square Error (RMSE) of 21.52 and 14.14 using the chronological and random data splits respectively. The final machine learning model outperforms simple estimator models, showing that a data-driven approach using machine learning methods has the potential to benefit apple growers.
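The RMSE metric by which the abstract above compares its models can be computed as follows; the yield figures in the example are hypothetical, chosen only to show the calculation in the study's units of tonnes per hectare.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error between observed and predicted values."""
    assert len(actual) == len(predicted)
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Hypothetical orchard yields in tonnes per hectare (not thesis data).
observed  = [52.0, 61.5, 47.3, 58.8]
predicted = [50.0, 63.0, 49.0, 57.0]
print(round(rmse(observed, predicted), 3))  # → 1.759
```

Because errors are squared before averaging, RMSE penalises large misses more heavily than small ones, which is why it is a common choice for yield-prediction comparisons like the one above.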
- Item (Open Access): High-resolution virtual try-on with garment extraction using generative adversarial networks (2024). Charters, Daniel J; Britz, Stefan S; Bernicchi, Dino.
  Image-based virtual try-on aims to depict an individual wearing a garment not originally worn by them. While existing literature predominantly focuses on garments from standalone images, this research addresses the use of images where the garment is already being worn by another individual. The study bridges a notable gap, as most current systems are tailored for standalone garment images. The proposed system, given a pair of high-resolution images, extracts the garment from one, refines it using context-aware image inpainting, and subsequently transfers it onto the second image's subject. The methodology incorporates various off-the-shelf models, notably Part Grouping Network (PGN), DensePose, and OpenPose for pre-processing. A state-of-the-art context-aware inpainting model refines the garments, and the final synthesis leverages the HR-VITON architecture, producing images at a resolution of 768 × 1024. Distinctively, our model processes both standalone and garment-on-person images. Evaluation involves testing on 2 032 high-resolution images under both paired and unpaired conditions. Metrics such as RMSE, Peak Signal-to-Noise Ratio (PSNR), Learned Perceptual Image Patch Similarity (LPIPS), Structural Similarity (SSIM), Inception Score (IS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID) were used to assess the model's performance. Benchmarked against HR-VITON, ACGPN, and CP-VTON, our model slightly trailed HR-VITON but notably surpassed ACGPN and CP-VTON. In realistic, unpaired conditions, the model achieved an IS of 3.152, an FID of 15.3, and a KID of 0.0063, compared to an IS of 3.398, an FID of 11.93, and a KID of 0.0034 achieved by HR-VITON on the same data. ACGPN achieved an FID of 43.29 and a KID of 0.0373, while CP-VTON achieved an FID of 43.28 and a KID of 0.0376.
  IS was not measured for either ACGPN or CP-VTON. An ablation study underscored the importance of context-aware inpainting in our network. The findings highlight the model's ability to generate convincing, high-resolution virtual try-on images from garment-on-person extractions, addressing a prevalent gap in the literature and offering tangible applications in high-resolution virtual try-on image generation.
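Of the evaluation metrics listed above, PSNR is the simplest to state explicitly. The sketch below computes it from a pair of images represented as flat lists of 8-bit pixel intensities; the representation and the function name are illustrative assumptions, not the thesis's implementation.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak Signal-to-Noise Ratio between two equally sized images,
    each given as a flat list of pixel intensities in [0, max_value]."""
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher is better: PSNR falls as the mean squared pixel error between the generated try-on image and its paired reference grows, which is why it only applies under the paired test condition.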
- Item (Open Access): Small-scale distributed machine learning in R (2022). Taylor, Brenden; Britz, Stefan S; Pienaar, Etienne.
  Machine learning is increasing in popularity, in both applied and theoretical statistical fields. Machine learning models generally require large amounts of data to train and are thus computationally expensive, both in the absolute sense of actual compute time and in the relative sense of the numerical complexity of the underlying calculations. Particularly for students of machine learning, appropriate computing power can be difficult to come by. Distributed machine learning, which involves sending tasks to a network of attached computers, can offer users access to significantly more computing power by leveraging more processors than a single computer provides. This research outlines the core concepts of distributed computing and briefly surveys the more common approaches to parallel and distributed computing in R, with reference to the specific algorithms and aspects of machine learning that are investigated. One parallel backend, doRedis, offers particular advantages: it is easy to set up and implement, and it allows for the elastic attaching and detaching of computers from a distributed network. This paper describes the core features of the doRedis package and shows, by distributing certain aspects of the machine learning process, that doing so is both viable and beneficial. There is the potential for significant time savings when distributing machine learning model training. Particularly for students, the time required to set up a distributed network in which to use doRedis is far outweighed by the benefits.
  The implication that this research aims to explore is that students will be able to leverage the many computers often available in computer labs to train more complex machine learning models in less time than they otherwise could using the built-in parallel packages that are already common in R. In fact, certain machine learning packages that already parallelise model training can themselves be distributed to a network of computers, further increasing the gains realised by parallelisation. In this way, more complex machine learning becomes more accessible. This research outlines the benefits of distributing machine learning problems in an accessible, small-scale environment. This small-scale 'proof of concept' performs well enough to be viable for students, while also creating a bridge, and introducing the knowledge required, to deploy large-scale distribution of machine learning problems.
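The fan-out pattern this abstract describes, where doRedis distributes R foreach iterations across elastically attached workers, can be illustrated in miniature. The Python sketch below uses a local thread pool as a stand-in for the worker network and a dummy scoring function in place of real model training; none of the names here come from the thesis or from doRedis itself.

```python
from concurrent.futures import ThreadPoolExecutor

def train_fold(fold_id):
    """Stand-in for one unit of work, e.g. fitting a model on one
    cross-validation fold.  A real doRedis worker would run R training
    code; here we just return a dummy 'score' for the fold."""
    return fold_id, 1.0 / (1 + fold_id)

# Fan the folds out across workers, analogous to how doRedis fans
# foreach iterations out across the machines attached to its queue.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(train_fold, range(8)))

best_fold = max(results, key=results.get)
print(best_fold)  # fold 0 has the highest dummy score
```

The key property the thesis exploits is that each iteration is independent, so work units can be queued centrally and pulled by however many lab machines happen to be attached at the time.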