Using high-level feature concentration for speaker identification

 

dc.contributor.author Appana, Brodwyn L
dc.contributor.author Skosan, Marshalleno
dc.contributor.author Mashao, Daniel J
dc.date.accessioned 2017-05-18T09:17:48Z
dc.date.available 2017-05-18T09:17:48Z
dc.date.issued 2004
dc.identifier.citation Appanna, B. L., Skosan, M., & Mashao, D. J. (2004, November). Using high-level and low-level feature concatenation for speaker identification. In Fifteenth Annual Symposium of the Pattern Recognition Association of South Africa (p. 103).
dc.identifier.uri http://hdl.handle.net/11427/24359
dc.description.abstract Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract. The popular MFCC is such a feature vector. There is a growing trend in the literature, however, supporting the idea that systems improve when low-level features are fused with high-level (psychological) features such as conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system, evaluated on the NTIMIT database, that employs the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. The high-level vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper extends the work of Wildermoth and Paliwal [11], who reported an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only feature set. The results presented in this paper show an improvement in identification rate from 82.74% to 85.32% for the fused system, a relative improvement of over 3%. This result corroborates Wildermoth and Paliwal’s [11] findings (an increase from 78.4% to 86.8%) and supports the literature on improved recognition systems through the fusion of high-level and low-level features. The increase in performance over a popular, state-of-the-art feature vector like the MFCC further raises the expectation of promising results in future work on similar systems applied to more challenging databases.
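The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the lag range, the number of segments, and the normalization by frame energy are all assumptions chosen for illustration, and the MFCC vector is taken as given rather than computed.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Autocorrelation of `frame` at `lag`, divided by the zero-lag energy."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame: define the value as zero
    acc = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
    return acc / energy


def macv(frame, min_lag, max_lag, num_segments):
    """Maximum autocorrelation value in each of `num_segments` lag segments.

    The lags min_lag..max_lag-1 are split into equal-sized segments (the last
    segment absorbs any remainder). All parameters are hypothetical choices.
    """
    lags = list(range(min_lag, max_lag))
    if max_lag > len(frame) or len(lags) < num_segments:
        raise ValueError("lag range does not fit the frame or segment count")
    seg_len = len(lags) // num_segments
    values = []
    for s in range(num_segments):
        start = s * seg_len
        end = start + seg_len if s < num_segments - 1 else len(lags)
        values.append(
            max(normalized_autocorrelation(frame, lag) for lag in lags[start:end])
        )
    return values


def fuse(mfcc, macv_values):
    """Concatenate a low-level MFCC vector with the MACV vector."""
    return list(mfcc) + list(macv_values)


if __name__ == "__main__":
    # Synthetic voiced-like frame: a sinusoid with a period of 40 samples.
    frame = [math.sin(2 * math.pi * n / 40) for n in range(400)]
    prosodic = macv(frame, min_lag=16, max_lag=128, num_segments=5)
    placeholder_mfcc = [0.0] * 12  # stand-in for a real MFCC vector
    print(prosodic)
    print(len(fuse(placeholder_mfcc, prosodic)))
```

In this sketch the segment containing the pitch period (lag 40) yields the largest value, which is how such a vector can carry voicing and pitch information alongside the spectral MFCC features.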
dc.language.iso eng
dc.title Using high-level feature concentration for speaker identification
dc.type Other
dc.date.updated 2016-01-07T08:12:05Z
dc.publisher.institution University of Cape Town
dc.publisher.faculty Faculty of Engineering and the Built Environment
dc.publisher.department Department of Electrical Engineering en_ZA
uct.type.filetype Text
uct.type.filetype Image
dc.identifier.apacitation 2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359 en_ZA
dc.identifier.chicagocitation . 2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359 en_ZA
dc.identifier.vancouvercitation . 2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359 en_ZA
dc.identifier.ris TY - AU - Appana, Brodwyn L AU - Skosan, Marshalleno AU - Mashao, Daniel J AB - Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract. The popular MFCC is such a feature vector. There is a growing trend in the literature, however, supporting the idea that systems improve when low-level features are fused with high-level (psychological) features such as conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system, evaluated on the NTIMIT database, that employs the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. The high-level vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper extends the work of Wildermoth and Paliwal [11], who reported an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only feature set. The results presented in this paper show an improvement in identification rate from 82.74% to 85.32% for the fused system, a relative improvement of over 3%. This result corroborates Wildermoth and Paliwal’s [11] findings (an increase from 78.4% to 86.8%) and supports the literature on improved recognition systems through the fusion of high-level and low-level features. The increase in performance over a popular, state-of-the-art feature vector like the MFCC further raises the expectation of promising results in future work on similar systems applied to more challenging databases.
DA - 2004 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2004 T1 - Using high-level feature concentration for speaker identification TI - Using high-level feature concentration for speaker identification UR - http://hdl.handle.net/11427/24359 ER - en_ZA

