Investigating audio classification to automate the trimming of recorded lectures
| dc.contributor.advisor | Suleman, Hussein | |
| dc.contributor.author | Govender, Devandran | |
| dc.date.accessioned | 2019-02-22T11:53:48Z | |
| dc.date.available | 2019-02-22T11:53:48Z | |
| dc.date.issued | 2018 | |
| dc.date.updated | 2019-02-19T07:12:58Z | |
| dc.description.abstract | With the demand for recorded lectures to be made available as soon as possible, the University of Cape Town (UCT) needs to find innovative ways of removing bottlenecks in the lecture-capture workflow, thereby improving turnaround times from capture to publication. UCT utilises Opencast, an open-source system that manages all the steps in the lecture-capture process. One of these steps involves manually trimming unwanted segments from the beginning and end of a video before it is published; these segments generally contain student chatter. The trimming step has been identified as a bottleneck due to its dependence on staff availability. In this study, we investigate the potential of audio classification to automate this step. A classification model was trained to detect two classes: speech and non-speech. Speech represents a single dominant voice, for example the lecturer, while non-speech represents student chatter, silence and other environmental sounds. Using the classification model, the first and last instances of the speech class are detected together with their timestamps, which are used to predict the start and end trim points for the recorded lecture. The classification model achieved a 97.8% accuracy rate at distinguishing speech from non-speech. The start trim point predictions were accurate, with an average difference of -11.22s from gold-standard data. End trim point predictions showed a much greater deviation, with an average difference of 145.16s from gold-standard data; discussions between the lecturer and students after the lecture were the predominant reason for this discrepancy. | |
| dc.identifier.apacitation | Govender, D. (2018). <i>Investigating audio classification to automate the trimming of recorded lectures</i>. (). University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/29778 | en_ZA |
| dc.identifier.chicagocitation | Govender, Devandran. <i>"Investigating audio classification to automate the trimming of recorded lectures."</i> ., University of Cape Town, Faculty of Science, Department of Computer Science, 2018. http://hdl.handle.net/11427/29778 | en_ZA |
| dc.identifier.citation | Govender, D. 2018. Investigating audio classification to automate the trimming of recorded lectures. University of Cape Town. | en_ZA |
| dc.identifier.ris | TY - Thesis / Dissertation AU - Govender, Devandran AB - With the demand for recorded lectures to be made available as soon as possible, the University of Cape Town (UCT) needs to find innovative ways of removing bottlenecks in the lecture-capture workflow, thereby improving turnaround times from capture to publication. UCT utilises Opencast, an open-source system that manages all the steps in the lecture-capture process. One of these steps involves manually trimming unwanted segments from the beginning and end of a video before it is published; these segments generally contain student chatter. The trimming step has been identified as a bottleneck due to its dependence on staff availability. In this study, we investigate the potential of audio classification to automate this step. A classification model was trained to detect two classes: speech and non-speech. Speech represents a single dominant voice, for example the lecturer, while non-speech represents student chatter, silence and other environmental sounds. Using the classification model, the first and last instances of the speech class are detected together with their timestamps, which are used to predict the start and end trim points for the recorded lecture. The classification model achieved a 97.8% accuracy rate at distinguishing speech from non-speech. The start trim point predictions were accurate, with an average difference of -11.22s from gold-standard data. End trim point predictions showed a much greater deviation, with an average difference of 145.16s from gold-standard data; discussions between the lecturer and students after the lecture were the predominant reason for this discrepancy. 
DA - 2018 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2018 T1 - Investigating audio classification to automate the trimming of recorded lectures TI - Investigating audio classification to automate the trimming of recorded lectures UR - http://hdl.handle.net/11427/29778 ER - | en_ZA |
| dc.identifier.uri | http://hdl.handle.net/11427/29778 | |
| dc.identifier.vancouvercitation | Govender D. Investigating audio classification to automate the trimming of recorded lectures. []. University of Cape Town, Faculty of Science, Department of Computer Science, 2018 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/29778 | en_ZA |
| dc.language.iso | eng | |
| dc.publisher.department | Department of Computer Science | |
| dc.publisher.faculty | Faculty of Science | |
| dc.publisher.institution | University of Cape Town | |
| dc.subject.other | Information Technology | |
| dc.title | Investigating audio classification to automate the trimming of recorded lectures | |
| dc.type | Master Thesis | |
| dc.type.qualificationlevel | Masters | |
| dc.type.qualificationname | MSc |
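The abstract describes predicting trim points from the first and last detected instances of the speech class. A minimal sketch of that step is shown below; the function name, the `(timestamp, label)` segment format, and the example timings are all assumptions for illustration, not the thesis's actual implementation.

```python
# Hypothetical sketch: given per-segment speech/non-speech labels with
# timestamps (e.g. one label per fixed-length audio window), take the
# first speech timestamp as the start trim point and the last as the
# end trim point, as described in the abstract.

def predict_trim_points(segments):
    """segments: list of (start_seconds, label), label in {'speech', 'non-speech'}.
    Returns (start_trim, end_trim) in seconds, or None if no speech was detected."""
    speech_times = [t for t, label in segments if label == "speech"]
    if not speech_times:
        return None
    return speech_times[0], speech_times[-1]

# Example: one label per 10-second window of a recording.
segments = [
    (0, "non-speech"),   # student chatter before the lecture
    (10, "non-speech"),
    (20, "speech"),      # lecturer begins
    (30, "speech"),
    (40, "speech"),      # last speech window
    (50, "non-speech"),  # chatter after the lecture
]
print(predict_trim_points(segments))  # → (20, 40)
```

Note that this simple first/last rule also explains the reported end-point deviation: any post-lecture discussion classified as speech pushes the predicted end trim point later.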