Using high-level feature concentration for speaker identification

Appana, Brodwyn L; Skosan, Marshalleno; Mashao, Daniel J

dc.contributor.author	Appana, Brodwyn L
dc.contributor.author	Skosan, Marshalleno
dc.contributor.author	Mashao, Daniel J
dc.date.accessioned	2017-05-18T09:17:48Z
dc.date.available	2017-05-18T09:17:48Z
dc.date.issued	2004
dc.date.updated	2016-01-07T08:12:05Z
dc.description.abstract	Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract. The popular MFCC is one such feature vector. A growing body of literature, however, supports the idea that systems improve when low-level features are fused with high-level (psychological) features such as conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system, evaluated on the NTIMIT database, that employs the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. The vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper extends the work of Wildermoth and Paliwal [11], who reported an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only feature set. The results presented in this paper show an improvement in identification rate from 82.74% to 85.32% for the fused system, a relative improvement of just over 3%. This result corroborates Wildermoth and Paliwal’s [11] finding (an increase from 78.4% to 86.8%) and supports the literature on improved recognition systems through the fusion of high-level and low-level features. The increase in performance over a popular, state-of-the-art feature vector such as the MFCC also suggests promising results for future work that applies similar systems to more challenging databases.
dc.identifier.apacitation	2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359	en_ZA
dc.identifier.chicagocitation	. 2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359	en_ZA
dc.identifier.citation	Appanna, B. L., Skosan, M., & Mashao, D. J. (2004, November). Using high-level and low-level feature concatenation for speaker identification. In Fifteenth Annual Symposium of the Pattern Recognition Association of South Africa (p. 103).
dc.identifier.ris	TY - AU - Appana, Brodwyn L AU - Skosan, Marshalleno AU - Mashao, Daniel J AB - Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract. The popular MFCC is one such feature vector. A growing body of literature, however, supports the idea that systems improve when low-level features are fused with high-level (psychological) features such as conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system, evaluated on the NTIMIT database, that employs the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. The vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper extends the work of Wildermoth and Paliwal [11], who reported an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only feature set. The results presented in this paper show an improvement in identification rate from 82.74% to 85.32% for the fused system, a relative improvement of just over 3%. This result corroborates Wildermoth and Paliwal’s [11] finding (an increase from 78.4% to 86.8%) and supports the literature on improved recognition systems through the fusion of high-level and low-level features. The increase in performance over a popular, state-of-the-art feature vector such as the MFCC also suggests promising results for future work that applies similar systems to more challenging databases.
DA - 2004 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2004 T1 - Using high-level feature concentration for speaker identification TI - Using high-level feature concentration for speaker identification UR - http://hdl.handle.net/11427/24359 ER -	en_ZA
dc.identifier.uri	http://hdl.handle.net/11427/24359
dc.identifier.vancouvercitation	. 2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359	en_ZA
dc.language.iso	eng
dc.publisher.department	Department of Electrical Engineering	en_ZA
dc.publisher.faculty	Faculty of Engineering and the Built Environment
dc.publisher.institution	University of Cape Town
dc.title	Using high-level feature concentration for speaker identification
dc.type	Other
uct.type.filetype	Text
uct.type.filetype	Image
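
The abstract describes the MACV feature only as the maximum autocorrelation values of a segmented frame of speech, concatenated with the MFCC vector. The sketch below is one possible reading of that description, not the authors' implementation. The function names `macv` and `fuse`, the 8 kHz sample rate, the 2 ms to 16 ms lag range, and the choice of 5 segments are all assumptions made for illustration; none of them is stated in this record. The MFCC matrix is treated as already computed by some other tool.

```python
import numpy as np

def macv(frame, sample_rate=8000, min_lag_ms=2.0, max_lag_ms=16.0, n_segments=5):
    """Illustrative MACV: peak normalized autocorrelation in each lag segment.

    All parameter defaults are assumptions, not values taken from the paper.
    The frame must be longer than max_lag_ms worth of samples.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Full autocorrelation, keeping only the non-negative lags.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if acf[0] <= 0.0:
        # Silent (all-constant) frame: no voicing evidence at any lag.
        return np.zeros(n_segments)
    acf = acf / acf[0]  # normalize so that lag 0 equals 1
    lo = int(round(min_lag_ms * 1e-3 * sample_rate))
    hi = int(round(max_lag_ms * 1e-3 * sample_rate))
    # Split the candidate pitch-lag range into segments; keep each segment's peak.
    segments = np.array_split(acf[lo:hi], n_segments)
    return np.array([seg.max() for seg in segments])

def fuse(mfcc_frames, macv_frames):
    """Concatenate per-frame MFCC and MACV vectors along the feature axis."""
    return np.hstack([mfcc_frames, macv_frames])
```

Under these assumptions, a strongly periodic (voiced) frame yields at least one MACV value close to 1 in the segment containing its pitch period, while an unvoiced or noisy frame yields uniformly small values, which is how the vector can carry both voicing and pitch information.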

Files

Original bundle

Name: Appana_Using_high_level_feature_2004.pdf
Size: 5.79 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission

Collections

Other / General