Automated detection and classification of red roman in unconstrained underwater environments using Mask R-CNN

Master's Thesis

2021

Abstract
The availability of relatively cheap, high-resolution digital cameras has led to an exponential increase in the capture of natural environments and their inhabitants. Video-based surveys are particularly useful in the underwater domain, where observation by humans can be expensive, dangerous, inaccessible, or destructive to the natural environment. Moreover, video-based surveys offer an unedited record of biodiversity at a given point in time, one that is not reliant on human recall or susceptible to observer bias. In addition, secondary data useful in scientific study (date, time, location, etc.) are stored by default as metadata in almost all digital formats. When analysed effectively, this growing body of digital data offers the opportunity for robust and independently reproducible scientific study of marine biodiversity (and of how it might change over time, for example). However, manual review of image and video data by humans is slow, expensive, and not scalable; the large majority of marine data has never been analysed by human experts. This necessitates computer-based (automated) methods of analysis that can be deployed at a fraction of the time and cost, with comparable accuracy. Mask R-CNN, a deep learning object recognition framework, has outperformed all previous state-of-the-art results on competitive benchmarking tasks. Despite this success, Mask R-CNN and other state-of-the-art object recognition techniques have not been widely applied in the underwater domain, and not at all within the context of South Africa. To address this gap in the literature, this thesis contributes (i) a novel image dataset of red roman (Chrysoblephus laticeps), a fish species endemic to Southern Africa, and (ii) a Mask R-CNN framework for the automated localisation, classification, counting, and tracking of red roman in unconstrained underwater environments. The model, trained on an 80:10:10 train/validation/test split, accurately detected and classified red roman on the training dataset (mAP50 = 80.29%), the validation dataset (mAP50 = 80.35%), and previously unseen footage, i.e. the test dataset (mAP50 = 81.45%). That the model performs equally well on unseen footage suggests it can generalise to new streams of data not used in this research, which is critical for the utility of any statistical model outside of "laboratory conditions". This research serves as a proof of concept that machine learning-based video analysis of marine data can replace, or at least supplement, human analysis.
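To make the detection pipeline concrete, the sketch below shows how a Mask R-CNN detector with a single foreground class (red roman) can be configured and used to count detections in a video frame. It is a minimal illustration assuming a PyTorch/torchvision implementation; the abstract does not specify the framework used in the thesis, and the class count, score threshold, and function names here are illustrative assumptions, not the thesis's actual code.

```python
# Minimal sketch (assumptions labelled): single-class Mask R-CNN for
# detecting and counting red roman, using torchvision's reference model.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 2  # background + red roman (assumed single-species setup)

def build_model(num_classes: int = NUM_CLASSES):
    # Start from a COCO-pretrained Mask R-CNN and replace the box and
    # mask heads so they predict our single foreground class.
    # (weights="DEFAULT" requires torchvision >= 0.13.)
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    in_channels_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_dim = 256  # hidden layer size for the mask head (assumed)
    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_channels_mask, hidden_dim, num_classes
    )
    return model

@torch.no_grad()
def count_red_roman(model, frame, score_threshold=0.5):
    # frame: (3, H, W) float tensor scaled to [0, 1].
    # Returns the count of detections above the (assumed) confidence
    # threshold, plus their boxes and instance masks.
    model.eval()
    output = model([frame])[0]
    keep = output["scores"] >= score_threshold
    return int(keep.sum()), output["boxes"][keep], output["masks"][keep]
```

The reported mAP50 figures refer to mean average precision at an IoU threshold of 0.50, the standard PASCAL VOC-style detection metric; per-frame counts like the one above would feed the counting and tracking stages described in the abstract.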