Detection and Isolation of Prey Capture Events in Animal-Borne Images

Master Thesis


Understanding the foraging habits of a species and the availability of its prey is crucial, because prey availability underpins both the survival of the species and the sustainability of the food pyramid. Identifying the type of prey consumed also allows ecologists to determine the energy received, while the duration and extent of foraging bouts provide information about the energy expended. With recent advancements in technology, data collection has become more accessible, and animal-borne video cameras are an increasingly popular mechanism for collecting information about foraging and other behaviour. Video recorders collect large volumes of data, but they create a bottleneck because data processing is still predominantly done manually. This process is time-consuming and costly, even with the assistance of crowdsourced tasks. Advancements in deep learning, and its applications to computer vision, provide opportunities to apply these tools to ecological problems, such as the processing of data from animal-borne video recorders. Speeding up the annotation process allows more time to be spent on the ecological research questions themselves. This dissertation aims to develop detection and isolation models that assist in the processing of visual data, namely images from animal-borne videos. The first model, used for detection, performs image classification to determine whether prey is present. Images found to contain prey are then passed to the second model, used for isolation, which identifies exactly where within the image the prey is and labels the type of prey. The models were trained on video data of little penguins (Eudyptula minor), whose main prey in this investigation are small fish, predominantly anchovies, and jellyfish. The image classification model, based on the ResNet architecture, achieved 85% accuracy on its test set, with precision and recall both equal to 0.85.
The object detection model, based on the You Only Look Once (YOLO) framework, achieved a mean average precision of 60% on its test set. However, the models did not perform well enough on unseen full-length videos to be used without human supervision or to serve as alternatives to manual labelling. Rather, the models can be used to guide researchers to portions of the footage that may contain prey capture events.
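
The two-stage arrangement described above, in which a classifier screens each image and a detector runs only on images flagged as containing prey, can be sketched as follows. This is a minimal illustration and not the dissertation's implementation: the names `Detection`, `run_cascade`, `prey_probability` and `locate_prey` are hypothetical, the 0.5 threshold is an assumed default, and the two models are passed in as plain callables standing in for the trained ResNet classifier and YOLO detector.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class Detection:
    """One isolated prey item (hypothetical container, not from the thesis)."""
    label: str                               # e.g. "fish" or "jellyfish"
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
    score: float                             # detector confidence in [0, 1]


def run_cascade(
    frames: Iterable[Any],
    prey_probability: Callable[[Any], float],
    locate_prey: Callable[[Any], List[Detection]],
    threshold: float = 0.5,                  # assumed value, not from the thesis
) -> List[Tuple[int, List[Detection]]]:
    """Run the detector only on frames the classifier flags as containing prey.

    Returns (frame_index, detections) pairs for the flagged frames. A flagged
    frame is kept even if the detector finds nothing, so that a human reviewer
    can still inspect it.
    """
    flagged = []
    for index, frame in enumerate(frames):
        # Stage 1: image-level screening (prey present or not).
        if prey_probability(frame) < threshold:
            continue
        # Stage 2: localise and label prey within the flagged frame.
        flagged.append((index, locate_prey(frame)))
    return flagged
```

In line with the conclusion above, the returned frame indices are best treated as candidates for human review rather than as final annotations.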