
Browsing by Author "Mashao, Daniel"

Now showing 1 - 15 of 15
  • Item
    Open Access
    An artificial Intelligence Approach to improving Speech Recognition
    (2009) Lopes, Luis Ramos dos Santos; Mashao, Daniel; Ventura, Neco
    Speech Recognition is a technology with promising applications. However, the performance of current speech recognizers greatly limits their widespread use. Approaches to reducing the word error rate have mainly been associated with statistical techniques. As a consequence, speech recognition results can still contain sentences that are nonsensical. The method proposed here is to analyze the output of any chosen speech recognition system in order to determine whether a sentence contains syntactic or semantic errors. This is done via a software agent that uses the information from its knowledge base to attempt to correct the errors found. A system was implemented with a small vocabulary speaker-independent continuous speech recognition system, with limited sentence structures. The achieved increase in speech recognition accuracy shows that there are benefits in using this approach.
  • Item
    Open Access
    A comparison of features for large population speaker identification
    (2000) Baloyi, Norman Tinyiko; Mashao, Daniel
    Speech recognition systems all have one criterion in common: they perform better in a controlled environment using clean speech. Though performance can be excellent, even exceeding human capabilities for clean speech, systems fail when presented with speech data from more realistic environments such as telephone channels. The differences between using a recognizer in clean and noisy environments are extreme, and this is one of the major obstacles to producing commercial recognition systems for use in normal environments. It is the lack of performance of speaker recognition systems with telephone channels that this work addresses. The human auditory system is a speech recognizer with excellent performance, especially in noisy environments. Since humans ignore noise better than any machine, auditory-based methods are promising approaches, since they attempt to model the working of the human auditory system. These methods have been shown to outperform more conventional signal processing schemes for speech recognition, speech coding, word-recognition and phone classification tasks. Since speaker identification has received a lot of attention in speech processing because of its waiting real-world applications, it is attractive to evaluate the performance using auditory models as features. Firstly, this study aims at improving the results for speaker identification. The improvements were made through the use of parameterized feature-sets together with the application of cepstral mean removal for channel equalization. The study is further extended to compare an auditory-based model, the Ensemble Interval Histogram (EIH), with mel-scale features, which was shown to perform almost error-free in clean speech. The previous studies showing EIH to be more robust to noise were conducted on speaker-dependent, small-population, isolated-word tasks and are now extended to speaker-independent, larger-population, continuous speech.
    This study investigates whether the EIH representation is more resistant to telephone noise than mel-cepstrum, as was shown in the previous studies, when, for the first time, it is applied to a speaker identification task using the state-of-the-art Gaussian mixture model system.
  • Item
    Open Access
    A comparison of the network speech recognition and distributed speech recognition systems and their effect on speech enabling mobile devices
    (2010) Isaacs, Dale; Mashao, Daniel
    Over the past 10 years there has been an exponential increase in the number of mobile subscribers worldwide. Market research has shown that the number of mobile subscribers rose to 4.3 billion towards the end of Q1 in 2009. The unprecedented development of the telecommunication industry over the last decade has brought about the need for ubiquitous access to a host of different information resources and services. Today, speech remains the best medium of communication between people and it is conceivable that speech enabling mobile devices will allow users who only have mobile devices to access all the information which is now available over the World Wide Web.
  • Item
    Open Access
    Design of an advanced and fluent Sesotho text-to-speech system through intonation
    (2006) Mohasi, Lehlohonolo; Mashao, Daniel
    Includes bibliographical references (leaves 97-101).
  • Item
    Open Access
    Design of an advanced Text-To-Speech system for Afrikaans
    (2006) Rousseau, Francois; Mashao, Daniel
    Afrikaans is the home language of approximately six million people in South Africa. The need for an Afrikaans TTS system comes with the growing interest in integrating speech technology in all eleven languages of the country. The ultimate goal here is to enable communication between man and machine using speech. This can be achieved with the use of speech technology by implementing multilingual technological systems that all the people in South Africa can understand and relate to. Understandability, flexibility, naturalness and pleasantness are the requirements of an advanced TTS system. The technique of concatenative speech synthesis has been the most successful in meeting all these requirements. The Festival speech synthesis system uses two popular concatenative techniques to design new TTS systems in different languages. The techniques are: diphone concatenative synthesis (DCS) and unit selection synthesis (USS).
  • Item
    Open Access
    Dynamic bandwidth allocation in ATM networks
    (2002) Ashibani, Majdi Ali Atoomi; Mashao, Daniel; Nleya, Bakhe
    This thesis investigates bandwidth allocation methodologies to transport new emerging bursty traffic types in ATM networks. However, existing ATM traffic management solutions are not readily able to handle the inevitable problem of congestion as a result of the bursty traffic from the new emerging services. This research addresses bandwidth allocation issues for bursty traffic by proposing and exploring the concept of dynamic bandwidth allocation and comparing it to the traditional static bandwidth allocation schemes.
  • Item
    Open Access
    Evaluating microphone arrays for a speaker identification task
    (2004) Zulu, Nicholas; Mashao, Daniel
    Microphone array systems have been an area of active research for several years. The potential for high quality hands-free speech acquisition in noisy and reflecting environments makes microphone arrays an attractive alternative to conventional close-talking microphones. The signal-enhancement and source-location capabilities of microphone arrays make them applicable to a variety of tasks including teleconferencing, speaker tracking, speaker recognition and speech recognition. In this paper we evaluate techniques for setting up microphone arrays for speaker identification. We propose the use of an active noise canceling beamformer based on the generalized sidelobe canceller (GSC) beamformer. Significant improvements in identification rate are achieved using this method compared to other beamforming techniques investigated in this paper.
  • Item
    Open Access
    Evaluation of speaker adaptation algorithms for public access IVR server
    (2006) Noah, O P; Mashao, Daniel
    IVR servers are telephony-based applications that allow a user to interact with a series of menus in order to retrieve information. For many years users have commonly used key pads to interact with IVR servers. As a result, users have experienced numerous benefits in obtaining information from these servers. However, speech recognition is fast becoming a popular way of interacting with IVR servers. ... This study is concerned with evaluating the suitability of speaker adaptation methods for improving the performance of the modern speech recognition using minimal time
  • Item
    Open Access
    Histogram equalization for robust text-independent speaker verification in telephone environments
    (2005) Skosan, Marshalleno; Mashao, Daniel
    Word processed copy. Includes bibliographical references.
  • Item
    Open Access
    Implementation and evaluation of a low complexity microphone array for speaker recognition
    (2005) Zulu, Peleira Nicholas; Mashao, Daniel
    This thesis discusses the application of a microphone array employing a noise canceling beamforming technique for improving the robustness of speaker recognition systems in a diffuse noise field.
  • Item
    Open Access
    Increased diphone recognition for an Afrikaans TTS system
    (2004) Rousseau, Francois; Mashao, Daniel
    In this paper we discuss the implementation of an Afrikaans TTS system that is based on diphones. Using diphones makes the system flexible but presents other challenges. A previous effort to design an Afrikaans TTS system was made by SUN. They implemented a TTS system based on full words. A full-word-based TTS system produces more natural sounding speech than a system designed using other techniques. The disadvantage of using full words is that it lacks flexibility. The baseline system was built using the Festival Speech Synthesis System. Problems occurred in the baseline due to the mislabeling of diphones and the diphone index. The system was improved by manually labeling the diphones using Wavesurfer, and by changing the diphone index. Wavelength comparison tests were done on the diphone index to show how many of the diphones are recognized during synthesis. For the diphones tested, results show an average improvement of 38% in the recognition of diphones compared to the baseline. These improvements raise the overall quality of the system.
  • Item
    Open Access
    A non-linear polynomial approximation filter for robust speaker verification
    (2003) Mothae, Limpho; Mashao, Daniel
  • Item
    Open Access
    Operation, administration and maintenance in all optical networks
    (2001) Baker, Carol; Mashao, Daniel
    The phenomenal growth rate of Internet traffic, as well as the increasing demand for high bandwidth services such as video-on-demand (VoD), high definition TV (HDTV) and video-conferencing, has created a new networking environment in which flexibility, scalability and high bandwidth capacity are of utmost importance. All Optical Networks have emerged as a promising technology to deliver the high capacity demands of current and envisaged future applications.
  • Item
    Open Access
    Supporting real time video over ATM networks
    (2003) Shija, Benedict; Mashao, Daniel; Nleya, Bakhe
    In this project, we propose and evaluate an approach to delimit and tag such independent video slices at the ATM layer for early discard. This involves the use of a tag cell, differentiated from the rest of the data by its PTI value, and a modified tag switch to facilitate the selective discarding of affected cells within each video slice, as opposed to dropping cells at random from multiple video frames.
  • Item
    Open Access
    Usability engineering of interactive voice responsive (IVR) systems in oral users of Southern Africa
    (2011) Ndwe, Tembalethu Jama; Barnard, Etienne; Dlodlo, Mqhele E; Mashao, Daniel
    This research study focuses on the feasibility of using the telephone as a tool for information access in the oral communities of Southern Africa. The OpenPhone and BGR systems are used as case studies and their designs have been influenced by field studies with the targeted users. The OpenPhone project aims to design an Interactive Voice Response (IVR) health information system that enables caregivers of HIV/AIDS infected children to access relevant care-giving information by telephone in their native language of Setswana in Botswana, Southern Africa. The BGR system allows soccer fans to access results of recently played matches in the Premier Soccer League (PSL) of South Africa.
Contact us

Jill Claassen

Manager: Scholarly Communication & Publishing

Email: openuct@uct.ac.za

+27 (0)21 650 1263

DSpace software copyright © 2002-2026 LYRASIS