dc.contributor.author |
Appana, Brodwyn L
|
|
dc.contributor.author |
Skosan, Marshalleno
|
|
dc.contributor.author |
Mashao, Daniel J
|
|
dc.date.accessioned |
2017-05-18T09:17:48Z |
|
dc.date.available |
2017-05-18T09:17:48Z |
|
dc.date.issued |
2004 |
|
dc.identifier.citation |
Appanna, B. L., Skosan, M., & Mashao, D. J. (2004, November). Using high-level and low-level feature concatenation for speaker identification. In Fifteenth Annual Symposium of the Pattern Recognition Association of South Africa (p. 103). |
|
dc.identifier.uri |
http://hdl.handle.net/11427/24359
|
|
dc.description.abstract |
Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract. The popular MFCC is such a feature vector. There is a growing trend in the literature, however, that supports improving these systems by fusing low-level features with high-level (psychological) features such as conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system evaluated on the NTIMIT database, employing the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. The vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper is an extension of the work done by Wildermoth and Paliwal [11], who reported on an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only system. Results presented in this paper showed an improvement in identification rate from 82.74% to 85.32% for the fused system, a relative improvement of over 3%. This result corroborates Wildermoth and Paliwal’s [11] performance (an increase from 78.4% to 86.8%) and supports the literature on improved recognition systems due to fusion of high-level and low-level features. The increase in performance on a popular, state-of-the-art feature vector like the MFCC further raises the expectation of promising results in future work on similar systems applied to more challenging databases. |
|
dc.language.iso |
eng |
|
dc.title |
Using high-level and low-level feature concatenation for speaker identification |
|
dc.type |
Other |
|
dc.date.updated |
2016-01-07T08:12:05Z |
|
dc.publisher.institution |
University of Cape Town |
|
dc.publisher.faculty |
Faculty of Engineering and the Built Environment |
|
dc.publisher.department |
Department of Electrical Engineering |
en_ZA |
uct.type.filetype |
Text |
|
uct.type.filetype |
Image |
|
dc.identifier.apacitation |
2004. <i>Using high-level and low-level feature concatenation for speaker identification.</i> http://hdl.handle.net/11427/24359 |
en_ZA |
dc.identifier.chicagocitation |
. 2004. <i>Using high-level and low-level feature concatenation for speaker identification.</i> http://hdl.handle.net/11427/24359 |
en_ZA |
dc.identifier.vancouvercitation |
. 2004. <i>Using high-level and low-level feature concatenation for speaker identification.</i> http://hdl.handle.net/11427/24359 |
en_ZA |
dc.identifier.ris |
TY -
AU - Appana, Brodwyn L
AU - Skosan, Marshalleno
AU - Mashao, Daniel J
AB - Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract. The popular MFCC is such a feature vector. There is a growing trend in the literature, however, that supports improving these systems by fusing low-level features with high-level (psychological) features such as conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system evaluated on the NTIMIT database, employing the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. The vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper is an extension of the work done by Wildermoth and Paliwal [11], who reported on an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only system. Results presented in this paper showed an improvement in identification rate from 82.74% to 85.32% for the fused system, a relative improvement of over 3%. This result corroborates Wildermoth and Paliwal’s [11] performance (an increase from 78.4% to 86.8%) and supports the literature on improved recognition systems due to fusion of high-level and low-level features. The increase in performance on a popular, state-of-the-art feature vector like the MFCC further raises the expectation of promising results in future work on similar systems applied to more challenging databases.
DA - 2004
DB - OpenUCT
DP - University of Cape Town
LK - https://open.uct.ac.za
PB - University of Cape Town
PY - 2004
T1 - Using high-level and low-level feature concatenation for speaker identification
TI - Using high-level and low-level feature concatenation for speaker identification
UR - http://hdl.handle.net/11427/24359
ER -
|
en_ZA |