Using high-level feature concatenation for speaker identification
| dc.contributor.author | Appana, Brodwyn L | |
| dc.contributor.author | Skosan, Marshalleno | |
| dc.contributor.author | Mashao, Daniel J | |
| dc.date.accessioned | 2017-05-18T09:17:48Z | |
| dc.date.available | 2017-05-18T09:17:48Z | |
| dc.date.issued | 2004 | |
| dc.date.updated | 2016-01-07T08:12:05Z | |
| dc.description.abstract | Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract. The popular MFCC is such a feature vector. There is a growing trend in the literature, however, supporting the idea that systems improve when low-level features are fused with high-level (psychological) features such as conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system evaluated on the NTIMIT database employing the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. The vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper is an extension of the work done by Wildermoth and Paliwal [11], who reported on an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only system. Results presented in this paper showed an improvement from 82.74% to 85.32% for the fused system, a relative improvement of over 3% in the identification rate. This result corroborates Wildermoth and Paliwal’s [11] performance (an increase from 78.4% to 86.8%) and supports the literature on improved recognition systems due to the fusion of high-level and low-level features. The increase in performance on a popular, state-of-the-art feature vector like the MFCC further raises the expectation of promising results in future work on similar systems applied to more challenging databases. | |
| dc.identifier.apacitation | 2004. <i>Using high-level feature concatenation for speaker identification.</i> http://hdl.handle.net/11427/24359 | en_ZA |
| dc.identifier.chicagocitation | 2004. <i>Using high-level feature concatenation for speaker identification.</i> http://hdl.handle.net/11427/24359 | en_ZA |
| dc.identifier.citation | Appanna, B. L., Skosan, M., & Mashao, D. J. (2004, November). Using high-level and low-level feature concatenation for speaker identification. In Fifteenth Annual Symposium of the Pattern Recognition Association of South Africa (p. 103). | |
| dc.identifier.ris | TY - AU - Appana, Brodwyn L AU - Skosan, Marshalleno AU - Mashao, Daniel J AB - Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract. The popular MFCC is such a feature vector. There is a growing trend in the literature, however, supporting the idea that systems improve when low-level features are fused with high-level (psychological) features such as conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system evaluated on the NTIMIT database employing the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. The vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper is an extension of the work done by Wildermoth and Paliwal [11], who reported on an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only system. Results presented in this paper showed an improvement from 82.74% to 85.32% for the fused system, a relative improvement of over 3% in the identification rate. This result corroborates Wildermoth and Paliwal’s [11] performance (an increase from 78.4% to 86.8%) and supports the literature on improved recognition systems due to the fusion of high-level and low-level features. The increase in performance on a popular, state-of-the-art feature vector like the MFCC further raises the expectation of promising results in future work on similar systems applied to more challenging databases. DA - 2004 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2004 T1 - Using high-level feature concatenation for speaker identification TI - Using high-level feature concatenation for speaker identification UR - http://hdl.handle.net/11427/24359 ER - | en_ZA |
| dc.identifier.uri | http://hdl.handle.net/11427/24359 | |
| dc.identifier.vancouvercitation | 2004. <i>Using high-level feature concatenation for speaker identification.</i> http://hdl.handle.net/11427/24359 | en_ZA |
| dc.language.iso | eng | |
| dc.publisher.department | Department of Electrical Engineering | en_ZA |
| dc.publisher.faculty | Faculty of Engineering and the Built Environment | |
| dc.publisher.institution | University of Cape Town | |
| dc.title | Using high-level feature concatenation for speaker identification | |
| dc.type | Other | |
| uct.type.filetype | Text | |
| uct.type.filetype | Image | |
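
The abstract describes the MACV feature only as the maximum autocorrelation values of a segmented frame of speech, concatenated with a low-level vector such as the MFCC. The sketch below is one plausible reading of that description, not the authors' implementation: it assumes the autocorrelation lag range is split into equal bands and the peak of each band is kept. The function names `macv` and `fuse_features`, the number of bands, the lag limits and the 8 kHz sampling rate in the usage lines are all hypothetical choices made for illustration.

```python
import numpy as np


def macv(frame, n_bands=5, min_lag=16, max_lag=160):
    """Return the maximum normalised autocorrelation value in each lag band.

    n_bands, min_lag and max_lag are illustrative assumptions, not values
    taken from the record above.
    """
    frame = np.asarray(frame, dtype=float)
    max_lag = min(max_lag, len(frame) - 1)
    if max_lag + 1 - min_lag < n_bands:
        raise ValueError("frame too short for the requested lag bands")
    frame = frame - frame.mean()              # remove DC offset
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return np.zeros(n_bands)              # silent frame: no periodicity
    # Autocorrelation at non-negative lags, scaled so that lag 0 equals 1.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:] / energy
    # Split the lag range [min_lag, max_lag] into n_bands contiguous bands.
    edges = np.linspace(min_lag, max_lag + 1, n_bands + 1).astype(int)
    return np.array([r[lo:hi].max() for lo, hi in zip(edges[:-1], edges[1:])])


def fuse_features(low_level, frame, **macv_kwargs):
    """Concatenate a low-level vector (e.g. MFCCs) with the MACV values."""
    low_level = np.asarray(low_level, dtype=float)
    return np.concatenate([low_level, macv(frame, **macv_kwargs)])


# Usage: a 30 ms frame of a 100 Hz tone at an assumed 8 kHz sampling rate,
# fused with a placeholder 12-dimensional low-level vector.
sample_rate = 8000
frame = np.sin(2 * np.pi * 100 * np.arange(240) / sample_rate)
fused = fuse_features(np.zeros(12), frame)
```

For a strongly voiced frame, the band containing the pitch period carries the largest value, which is how such a vector can encode both voicing (peak height) and pitch (peak position).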