ETD: Application of CNN-gcForestCS to cassava leaf image classification

Thesis / Dissertation


Cassava is one of the most consumed carbohydrates in the world, providing a reliable source of income and nutrition to inhabitants of Latin America, Africa and Asia. However, its production is greatly affected by pathogenic infection, with cassava mosaic disease (CMD) posing the greatest threat to cassava farmers in Africa and Asia. Given that developing nations are expected to be hit hardest by climate change and are projected to have the largest population increases in coming decades, optimisation of cassava yield in these areas is imperative to ensure food security. Traditionally, crop health is determined by manual inspection, which can be laborious and error-prone and requires technical expertise. This creates a costly barrier to entry for smallholder farmers, who account for the majority of global cassava production. Automated disease detection systems that use convolutional neural networks (CNNs) and can be deployed on mobile phones have been shown to be a cost-efficient and effective method for cassava monitoring, mainly owing to their advanced feature extraction capabilities. However, CNNs require complex hyperparameter tuning and can be computationally intensive to train. GcForestCS (multi-grained cascade forest with confidence screening) presents an alternative statistical learning method that can be trained on a CPU and requires less complex hyperparameter tuning than deep learning, while producing competitive performance on lower-dimensionality datasets. Taking advantage of the feature extraction capabilities of CNNs and the competitive performance of gcForestCS on lower-dimensionality datasets, the central aim of this dissertation was to investigate CNN-gcForestCS as an alternative to deep learning for cassava leaf disease detection.
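The "confidence screening" part of gcForestCS can be illustrated with a minimal sketch. It assumes screening is done by comparing each instance's largest predicted class probability against a fixed threshold; the function name, the threshold value and the probability vectors are hypothetical and are not taken from the dissertation, whose actual screening rule may differ.

```python
def screen_by_confidence(probabilities, threshold):
    """Split instance indices into confident and uncertain groups.

    Assumption: an instance counts as "confident" when its largest
    predicted class probability reaches the threshold.
    """
    confident, uncertain = [], []
    for index, class_probs in enumerate(probabilities):
        if max(class_probs) >= threshold:
            confident.append(index)   # prediction could be finalised here
        else:
            uncertain.append(index)   # would be passed to the next cascade level
    return confident, uncertain


# Hypothetical class-probability vectors for four leaf images.
example_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.10, 0.85, 0.05],
    [0.34, 0.33, 0.33],
]
confident, uncertain = screen_by_confidence(example_probs, threshold=0.8)
```

In this sketch only the uncertain instances would continue through further cascade levels, which is the intuition behind reducing the work done per level.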
The performance of CNN-gcForestCS was compared with that of gcForestCS and deep learning, and the effects of class balance, CNN feature extraction, CNN feature extractor fine-tuning, pooling after multi-grained scanning, and training set curation were assessed. The results showed that the best DenseNet201-gcForestCS model (86.79%) performed marginally worse than the best DenseNet201 model (87.43%), while the best MobileNetV2-gcForestCS model (83.66%) performed marginally better than the best MobileNetV2 model (82.87%). Overall, the results are inconclusive as to whether CNN-gcForestCS is a viable alternative to deep learning for cassava leaf disease detection, especially when considering the high computational cost associated with the CNN-gcForestCS methodology.
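To make the word "marginally" concrete, the short sketch below computes the gap between each CNN-gcForestCS model and its CNN counterpart from the four figures reported above. The dictionary layout and key names are illustrative choices; the abstract does not name the performance metric, so the gaps are simply expressed in percentage points.

```python
# Best reported scores (%), copied from the abstract.
reported = {
    "DenseNet201": {"cnn": 87.43, "cnn_gcforestcs": 86.79},
    "MobileNetV2": {"cnn": 82.87, "cnn_gcforestcs": 83.66},
}

# Gap in percentage points: positive means CNN-gcForestCS scored higher.
gaps = {
    backbone: round(scores["cnn_gcforestcs"] - scores["cnn"], 2)
    for backbone, scores in reported.items()
}

for backbone, gap in gaps.items():
    print(f"{backbone}: {gap:+.2f} percentage points")
```

Both gaps are smaller than one percentage point and point in opposite directions, which is consistent with the inconclusive overall finding.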