Using high-level feature concentration for speaker identification

dc.contributor.author: Appana, Brodwyn L
dc.contributor.author: Skosan, Marshalleno
dc.contributor.author: Mashao, Daniel J
dc.date.accessioned: 2017-05-18T09:17:48Z
dc.date.available: 2017-05-18T09:17:48Z
dc.date.issued: 2004
dc.date.updated: 2016-01-07T08:12:05Z
dc.description.abstract: Traditional and current speaker recognition systems primarily use low-level (physiological) features of speech that model the physical dimensions of the vocal tract; the popular MFCC is such a feature vector. A growing body of literature, however, supports the idea that systems improve when low-level features are fused with high-level (psychological) features such as the conversational, lexical, phonemic and prosodic patterns found in speech. In this work we investigated the performance of a speaker ID system, evaluated on the NTIMIT database, that employs the popular MFCC feature vector concatenated with a high-level feature vector containing prosodic information, viz. voicing and pitch. This vector contains the maximum autocorrelation values of a segmented frame of speech and is accordingly named the MACV feature. This paper extends the work of Wildermoth and Paliwal [11], who reported an improved speaker ID system that used a fused LPCC-MACV feature set instead of an LPCC-only one. Results presented in this paper show an improvement in identification rate from 82.74% to 85.32% for the fused system, a relative improvement of just over 3%. This result is consistent with Wildermoth and Paliwal's [11] findings (an increase from 78.4% to 86.8%) and supports the literature reporting improved recognition through fusion of high-level and low-level features. That fusion improves on a popular, state-of-the-art feature vector such as the MFCC suggests that similar systems may yield promising results on more challenging databases in future work.
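The MACV feature and the MFCC-MACV concatenation described in the abstract can be illustrated with a short NumPy sketch. It is a hedged illustration under stated assumptions, not the authors' implementation: the pitch search range (50 to 400 Hz), the split of the lag range into five equal-width bands, the 8 kHz sampling rate, the 30 ms frame and the 12-dimensional MFCC placeholder are all assumptions, and `macv`, `concat_features` and `relative_improvement` are hypothetical names. The last function only checks the relative-improvement arithmetic from the reported identification rates.

```python
import numpy as np


def macv(frame, fs, n_bands=5, f_min=50.0, f_max=400.0):
    """Sketch of a maximum-autocorrelation-value (MACV) vector for one frame.

    Assumptions (not taken from the paper): the lag search range is set by an
    assumed pitch range f_min..f_max Hz and split into n_bands equal-width
    bands; each feature is the peak normalised autocorrelation in one band.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                              # remove DC offset
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:]   # keep lags 0..n-1
    if r[0] <= 0.0:                               # all-zero frame: nothing to measure
        return np.zeros(n_bands)
    r = r / r[0]                                  # normalise so that r[0] == 1
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), n - 1)
    if lag_max - lag_min + 1 < n_bands:
        raise ValueError("frame too short for the requested lag bands")
    # Band edges over lags lag_min..lag_max; every band holds at least one lag.
    edges = np.linspace(lag_min, lag_max + 1, n_bands + 1).astype(int)
    return np.array([r[a:b].max() for a, b in zip(edges[:-1], edges[1:])])


def concat_features(mfcc_vec, macv_vec):
    """Frame-level concatenation of a low-level and a high-level feature vector."""
    return np.concatenate([np.asarray(mfcc_vec, float), np.asarray(macv_vec, float)])


def relative_improvement(baseline, fused):
    """Relative gain in percent: 100 * (fused - baseline) / baseline."""
    return 100.0 * (fused - baseline) / baseline


if __name__ == "__main__":
    fs = 8000                                     # assumed sampling rate
    t = np.arange(int(0.03 * fs)) / fs            # one 30 ms frame (240 samples)
    voiced = np.sin(2 * np.pi * 200.0 * t)        # synthetic periodic "voiced" frame
    noise = np.random.default_rng(0).standard_normal(t.size)  # unvoiced-like frame
    print("MACV voiced:", np.round(macv(voiced, fs), 2))
    print("MACV noise: ", np.round(macv(noise, fs), 2))
    mfcc = np.zeros(12)                           # placeholder for a 12-dim MFCC vector
    print(concat_features(mfcc, macv(voiced, fs)).shape)     # (17,)
    print(round(relative_improvement(82.74, 85.32), 2))      # 3.12
    print(round(relative_improvement(78.4, 86.8), 2))        # 10.71
```

A periodic frame yields a large MACV value in the band containing its pitch lag, while a noise-like frame yields small values throughout, which is how such a vector can carry voicing and pitch cues. For the reported rates, the MFCC system's gain of 82.74% to 85.32% works out to about 3.1% relative, and the LPCC figures of Wildermoth and Paliwal (78.4% to 86.8%) to roughly 10.7% relative.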
dc.identifier.apacitation: 2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359 [en_ZA]
dc.identifier.chicagocitation: 2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359 [en_ZA]
dc.identifier.citation: Appanna, B. L., Skosan, M., & Mashao, D. J. (2004, November). Using high-level and low-level feature concatenation for speaker identification. In Fifteenth Annual Symposium of the Pattern Recognition Association of South Africa (p. 103).
dc.identifier.ris:
TY  -
AU  - Appana, Brodwyn L
AU  - Skosan, Marshalleno
AU  - Mashao, Daniel J
DA  - 2004
DB  - OpenUCT
DP  - University of Cape Town
LK  - https://open.uct.ac.za
PB  - University of Cape Town
PY  - 2004
T1  - Using high-level feature concentration for speaker identification
TI  - Using high-level feature concentration for speaker identification
UR  - http://hdl.handle.net/11427/24359
ER  -
[en_ZA]
dc.identifier.uri: http://hdl.handle.net/11427/24359
dc.identifier.vancouvercitation: 2004. Using high-level feature concentration for speaker identification. http://hdl.handle.net/11427/24359 [en_ZA]
dc.language.iso: eng
dc.publisher.department: Department of Electrical Engineering [en_ZA]
dc.publisher.faculty: Faculty of Engineering and the Built Environment
dc.publisher.institution: University of Cape Town
dc.title: Using high-level feature concentration for speaker identification
dc.type: Other
uct.type.filetype: Text
uct.type.filetype: Image
Files
Original bundle
Name: Appana_Using_high_level_feature_2004.pdf
Size: 5.79 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission