Development of a test suite for single object tracking algorithms in video

Master Thesis

2021

Abstract
Flying Camera Solutions (FlyCam), within Sony Lund's startup accelerator, intends to provide drone videography to paying customers in ski resorts: a customer should be able to go about their activity as usual while a drone films them. Visual object tracking, enabling the drone to track the customer throughout the activity, is a primary obstacle in creating a viable autonomous videography service. FlyCam needs an object tracking algorithm that is accurate, robust, and real-time while requiring minimal computational overhead. We propose two innovations to aid in the selection of an appropriate tracking algorithm: first, a video annotation algorithm that uses an object detector to record the position and type of each object in every frame of a video clip; second, an algorithm that evaluates the performance of any given object tracker against a set of performance metrics. These metrics include, among others, measures of positional accuracy, frame rate, and false positive rate. For the video annotation algorithm we implemented the state-of-the-art Mask R-CNN object detector, which annotated video clips at up to 4K resolution at an average frame rate of 1.5 fps. A second algorithm then played the annotated clips back to the user so that incorrect object detections could be removed or rectified. With little relevant annotated video available, the annotation algorithm proved useful in preparing a suite of 18 clips for evaluation. Ten performance metrics were adapted from multi-object to single-object tracking. Nine tracking algorithms were then run on each of the 18 test clips at varying resolutions, producing 375 tracking observations for analysis. The evaluation identified the optimal tracking algorithm as Re3: a recurrent-convolutional neural network tracker that runs at respectable speeds on a consumer laptop. This is a promising result: with enough annotated data, neural networks can be retrained to improve performance. Within just a few months of operation, FlyCam could amass enough specific video data to significantly improve the neural network-based tracker.
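As an illustration of the kind of per-frame comparison such an evaluation algorithm performs, the sketch below computes intersection-over-union and centre-position error between a tracker's predicted bounding box and an annotated ground-truth box. The box format and function names here are assumptions for illustration only; the thesis does not specify its internal representation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h).

    Illustrative format assumption; not necessarily the thesis's schema.
    """
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap rectangle (clamped to zero when boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def center_error(box_a, box_b):
    """Euclidean distance between box centres: a simple positional-accuracy measure."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    dx = (ax + aw / 2) - (bx + bw / 2)
    dy = (ay + ah / 2) - (by + bh / 2)
    return (dx * dx + dy * dy) ** 0.5
```

Averaging these quantities over every annotated frame of a clip, and counting frames where the overlap falls below a threshold, yields overall accuracy and failure-style metrics of the kind described above.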