Real-time video sentiment analysis through the use of snapshots

Thesis / Dissertation

2025

Publisher

University of Cape Town

Abstract
There are many types of emotions one can experience, and they usually have a direct impact on a person's behaviour. Emotions can be conveyed in several ways, such as gestures and body movement, words, or facial expressions; in this dissertation we aim to determine the emotional state of a person based on their facial expressions. Several approaches to this problem have been devised by past researchers in the computer vision field, but despite the similarities in the techniques adopted for face and emotion detection, their performance still varies when applied to different images or video streams.

The goal of this dissertation is therefore to develop a program that analyses a real-time video stream by taking in each frame as an image snapshot, which is then processed to efficiently detect faces and recognise a person's emotion from their facial expressions. Two scenarios, namely Frontal only, and Profile and Frontal, each with its own dataset, were considered in this research. The first dataset (Frontal) consists only of users facing forward; the second (Profile and Frontal) consists of users facing forward as well as sideways. Convolutional Neural Network (CNN) models were trained on both the augmented and non-augmented versions of each dataset to obtain the best possible model for each scenario before applying that model to a real-time video stream.

In both scenarios, the augmented models outperformed the non-augmented models when tested on unseen static image data. When the best model was applied to a real-time video stream, with face detection performed by the OpenCV library and the relevant Haar Cascade classifiers for each scenario, fairly accurate results were obtained by converting each frame of the video stream into an image snapshot before classification.
The code for this dissertation can be found here: https://github.com/Drish19/Facial-Emotion-Recognition.
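The snapshot-based pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the dissertation's exact implementation: the emotion label set, the 48x48 input size, and the `model.predict` interface (a Keras-style CNN) are assumptions for illustration; the Haar cascade path must point to a real OpenCV cascade file.

```python
# Sketch of the snapshot approach: grab each video frame as an image,
# detect faces with a Haar cascade, classify the cropped face with a CNN.
import numpy as np

# Assumed label order for illustration; the dissertation's classes may differ.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def preprocess_face(face_gray, size=48):
    """Scale pixel values to [0, 1] and add the batch/channel axes a
    Keras-style CNN expects for a single grayscale face crop."""
    x = face_gray.astype("float32") / 255.0
    return x.reshape(1, size, size, 1)

def decode_emotion(probs, labels=EMOTIONS):
    """Map the CNN's probability vector to its most likely emotion label."""
    return labels[int(np.argmax(probs))]

def run_stream(model, cascade_path, camera=0):
    """Classify faces frame by frame from a webcam (requires OpenCV)."""
    import cv2  # imported here so the pure helpers above stay dependency-free
    detector = cv2.CascadeClassifier(cascade_path)
    cap = cv2.VideoCapture(camera)
    while cap.isOpened():
        ok, frame = cap.read()          # each frame is the "image snapshot"
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
            face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
            label = decode_emotion(model.predict(preprocess_face(face)))
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
        cv2.imshow("emotion", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```

For the Profile and Frontal scenario, a second cascade (e.g. OpenCV's profile-face cascade) would be run alongside the frontal one, which matches the abstract's note that the classifier choice depends on the scenario.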