Human action recognition with 3D convolutional neural networks

Master's Thesis

2015


University of Cape Town

Abstract
Convolutional neural networks (CNNs) adapt the regular fully-connected neural network (NN) algorithm to facilitate image classification. CNNs have recently been demonstrated to provide superior performance across numerous image classification databases, including large natural images (Krizhevsky et al., 2012). Furthermore, CNNs transfer more readily between different image classification problems than common alternatives. The extension of CNNs to video classification is straightforward, and the rationale behind the components of the model remains applicable owing to the similarity between image and video data. Previous CNNs have demonstrated good performance on video datasets, but have not employed recently developed methods that have been credited with improvements in image classification networks. The purpose of this research is to build a CNN model that incorporates recently developed elements, presenting a human action recognition model that is up to date with current trends in CNNs and current hardware. Particular focus is given to ensemble models and to methods such as the Dropout technique, developed by Hinton et al. (2012) to reduce overfitting, and learning rate adaptation techniques. The KTH human action dataset is used to assess the CNN model; as a widely used benchmark dataset, it facilitates comparison with previous work in the literature. Three CNNs are built and trained to provide insight into design choices and to allow the construction of an ensemble model. The final ensemble model achieved performance comparable to previous CNNs trained on the KTH data. While the inclusion of new methods in the CNN model did not improve on previous models, the competitive result provides an alternative combination of architecture and components to other CNN models.
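The two core ingredients named in the abstract — 3D convolution over a video volume and the Dropout technique of Hinton et al. (2012) — can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the thesis's actual implementation; the function names `conv3d` and `dropout` and all shapes are hypothetical choices for the example.

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid 3D convolution (cross-correlation, as used in CNNs) of a
    single-channel video volume (frames, height, width) with a 3D kernel.
    A 3D kernel slides over time as well as space, so a single filter
    can respond to motion patterns, not just static appearance."""
    fd, fh, fw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - fd + 1, h - fh + 1, w - fw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(volume[t:t+fd, i:i+fh, j:j+fw] * kernel)
    return out

def dropout(activations, p=0.5, rng=None):
    """Inverted dropout: at training time, zero each activation with
    probability p and rescale the survivors by 1/(1-p), so no rescaling
    is needed at test time. Randomly dropping units discourages
    co-adaptation and reduces overfitting."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

# Example: a 5-frame, 8x8 grey-scale clip convolved with a 3x3x3 filter
# yields a (3, 6, 6) spatio-temporal feature map, to which dropout is
# then applied.
clip = np.ones((5, 8, 8))
features = conv3d(clip, np.ones((3, 3, 3)))
regularised = dropout(features, p=0.5)
```

The loop-based convolution is written for clarity rather than speed; a practical model would vectorise it or use a GPU library, and would stack many such filters per layer.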