  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
371

MODELING REALITY IN THE VIRTUAL: USABILITY INSIGHTS INTO VOXEL MODELING IN A VR ENVIRONMENT

Yuchao Wang (18431805) 28 April 2024 (has links)
This thesis explores a novel voxel-based 3D modeling tool in virtual reality (VR), assessing its usability with and without Automatic Speech Recognition (ASR). Despite VR's potential for immersive modeling, existing software often lacks functionality or is user-unfriendly. Through participant testing, analysed with the Post-Study System Usability Questionnaire (PSSUQ) and qualitative questions, this study aims to bridge the gap in VR modeling tools, catering to the needs of both laymen and professional modelers.
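The PSSUQ analysis mentioned above reduces to simple subscale means. Below is a minimal scoring sketch, assuming the 16-item PSSUQ Version 3, where each item is rated from 1 (strongly agree) to 7 (strongly disagree) and lower scores indicate better perceived usability; the item-to-subscale groupings follow the standard published questionnaire, and the sample ratings are invented.

```python
# A minimal PSSUQ scoring sketch, assuming the 16-item Version 3
# questionnaire: items rated 1 (strongly agree) to 7 (strongly disagree),
# so lower means better perceived usability. Sample ratings are invented.
from statistics import mean

def pssuq_scores(responses: list[float]) -> dict[str, float]:
    """Overall and subscale means for one participant's 16 item ratings."""
    if len(responses) != 16:
        raise ValueError("expected 16 PSSUQ item ratings")
    return {
        "overall":  mean(responses),         # items 1-16
        "sysuse":   mean(responses[0:6]),    # system usefulness, items 1-6
        "infoqual": mean(responses[6:12]),   # information quality, items 7-12
        "intqual":  mean(responses[12:15]),  # interface quality, items 13-15
    }

# Compare the two study conditions (with and without ASR) per participant.
with_asr    = pssuq_scores([2, 3, 2, 1, 2, 3, 4, 3, 2, 3, 2, 4, 2, 3, 2, 3])
without_asr = pssuq_scores([3, 4, 3, 2, 3, 4, 5, 4, 3, 4, 3, 5, 3, 4, 3, 4])
print(with_asr["overall"], without_asr["overall"])
```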
372

Multi-signal processing for voice recognition in noisy environments

Nayfeh, Taysir H. 22 October 2009 (has links)
Voice recognition systems (VRS) are among the main input devices for computerized systems, and are best suited to applications in which job functions would otherwise require more than two hands. The performance of a VRS is highly dependent on environmental noise: the higher the noise level, the lower the recognition capability. Automatic lip reading through vision systems has been utilized to improve recognition capability in noisy environments; however, this approach is costly and time-consuming. The objective of this thesis was to investigate the use of an infrared sensor for automatic lip reading to improve the recognition performance of a VRS. The developed system is cheaper and faster than other automatic lip readers. Test results on fifty words and eleven digits indicated that the method has good repeatability and good character recognition, while being neither dependent on nor sensitive to the ambient light level. Although speaker independence was tested, the results are inconclusive. / Master of Science
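The abstract does not detail how the infrared lip signal and the acoustic signal were combined, so the following is only a hypothetical illustration of feature-level fusion for isolated-word recognition, with invented function names and a deliberately simplified template matcher.

```python
# A hypothetical illustration only (not the thesis's actual method):
# feature-level fusion of acoustic features with an infrared lip-aperture
# signal, followed by nearest-template matching for isolated words.
# Utterances are assumed pre-aligned to a common frame count.
import numpy as np

def fuse(audio_feats: np.ndarray, lip_signal: np.ndarray) -> np.ndarray:
    """audio_feats: (frames, d); lip_signal: (frames,). -> (frames, d + 1)"""
    return np.hstack([audio_feats, lip_signal[:, None]])

def recognize(fused: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    """Return the word whose stored fused-feature template is closest (L2)."""
    return min(templates, key=lambda w: np.linalg.norm(fused - templates[w]))
```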
373

Spectro-Temporal and Linguistic Processing of Speech in Artificial and Biological Neural Networks

Keshishian, Menoua January 2024 (has links)
Humans possess the fascinating ability to communicate the most complex of ideas through spoken language, without requiring any external tools. This process has two sides—a speaker producing speech, and a listener comprehending it. While the two actions are intertwined in many ways, they entail differential activation of neural circuits in the brains of the speaker and the listener. Both processes are the active subject of artificial intelligence research, under the names of speech synthesis and automatic speech recognition, respectively. While the capabilities of these artificial models are approaching human levels, there are still many unanswered questions about how our brains do this task effortlessly. But the advances in these artificial models allow us the opportunity to study human speech recognition through a computational lens that we did not have before. This dissertation explores the intricate processes of speech perception and comprehension by drawing parallels between artificial and biological neural networks, through the use of computational frameworks that attempt to model either the brain circuits involved in speech recognition, or the process of speech recognition itself. There are two general types of analyses in this dissertation. The first type involves studying neural responses recorded directly through invasive electrophysiology from human participants listening to speech excerpts. The second type involves analyzing artificial neural networks trained to perform the same task of speech recognition, as a potential model for our brains. The first study introduces a novel framework leveraging deep neural networks (DNNs) for interpretable modeling of nonlinear sensory receptive fields, offering an enhanced understanding of auditory neural responses in humans. This approach not only predicts auditory neural responses with increased accuracy but also deciphers distinct nonlinear encoding properties, revealing new insights into the computational principles underlying sensory processing in the auditory cortex. The second study delves into the dynamics of temporal processing of speech in automatic speech recognition networks, elucidating how these systems learn to integrate information across various timescales, mirroring certain aspects of biological temporal processing. The third study presents a rigorous examination of the neural encoding of linguistic information of speech in the auditory cortex during speech comprehension. By analyzing neural responses to natural speech, we identify explicit, distributed neural encoding across multiple levels of linguistic processing, from phonetic features to semantic meaning. This multilevel linguistic analysis contributes to our understanding of the hierarchical and distributed nature of speech processing in the human brain. The final chapter of this dissertation compares linguistic encoding between an automatic speech recognition system and the human brain, elucidating their computational and representational similarities and differences. This comparison underscores the nuanced understanding of how linguistic information is processed and encoded across different systems, offering insights into both biological perception and artificial intelligence mechanisms in speech processing. 
Through this comprehensive examination, the dissertation advances our understanding of the computational and representational foundations of speech perception, demonstrating the potential of interdisciplinary approaches that bridge neuroscience and artificial intelligence to uncover the underlying mechanisms of speech processing in both artificial and biological systems.
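For readers unfamiliar with receptive-field modeling, the conventional linear baseline that the dissertation's DNN framework generalizes is the spectro-temporal receptive field (STRF): a regularized linear map from a lagged spectrogram to a neuron's response. A minimal ridge-regression sketch follows; the lag count and regularization strength are illustrative, not values from the dissertation.

```python
# A sketch of the linear STRF baseline, with illustrative lag and
# regularization values (not taken from the dissertation). The model maps
# a lagged spectrogram to one electrode's response via ridge regression.
import numpy as np

def lagged_design(spec: np.ndarray, n_lags: int) -> np.ndarray:
    """spec: (T, F) spectrogram -> (T, F * n_lags) design matrix, where row t
    stacks frames t - n_lags + 1 .. t (zero-padded before the first frame)."""
    T, F = spec.shape
    padded = np.vstack([np.zeros((n_lags - 1, F)), spec])
    return np.hstack([padded[lag:lag + T] for lag in range(n_lags)])

def fit_strf(spec: np.ndarray, response: np.ndarray,
             n_lags: int = 40, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge: w = (X'X + alpha I)^-1 X'y, reshaped to (lag, freq)."""
    X = lagged_design(spec, n_lags)
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ response)
    return w.reshape(n_lags, spec.shape[1])
```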
374

Accurate speaker identification employing redundant waveform and model based speech signal representations

Premakanthan, Pravinkumar 01 October 2002 (has links)
No description available.
375

Real-time approach to achieve separation of dissimilar air traffic control phraseologies

Vennerstrand, Daniel 01 October 2001 (has links)
No description available.
376

Implementing a distributed approach for speech resource and system development / Nkadimeng Raymond Molapo

Molapo, Nkadimeng Raymond January 2014 (has links)
The range of applications for high-quality automatic speech recognition (ASR) systems has grown dramatically with the advent of smartphones, in which speech recognition can greatly enhance the user experience. Currently, the languages with extensive ASR support on these devices are those for which thousands of hours of transcribed speech have already been collected. Developing a speech system for such a language is made simpler because extensive resources already exist. However, for languages that are not as prominent, the process is more difficult. Many obstacles, such as reliability and cost, have hampered progress, and various separate tools for every stage of the development process have been developed to overcome these difficulties. Developing a system that is able to combine these partial solutions involves customising existing tools and developing new ones to interface the overall end-to-end process. This work documents the integration of several tools to enable the end-to-end development of an automatic speech recognition system in a typical under-resourced language. Google App Engine is employed as the core environment for data verification, storage and distribution, and is used in conjunction with existing tools for gathering text data and for recording speech data. We analyse the data acquired by each of the tools and develop an ASR system in Shona, an important under-resourced language of Southern Africa. Although unexpected logistical problems complicated the process, we were able to collect a usable Shona speech corpus and develop the first automatic speech recognition system in that language. / MIng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus, 2014
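The verification logic run on Google App Engine is not shown in the abstract; as a rough illustration, a data-verification step in such a pipeline might perform checks like the following, with a hypothetical manifest format and invented duration thresholds.

```python
# A rough sketch of a corpus data-verification step; the checks and
# thresholds are hypothetical, not those of the actual App Engine system.
import wave
from pathlib import Path

MIN_SECONDS, MAX_SECONDS = 0.5, 30.0  # invented limits for one prompt

def verify_recording(wav_path: str, transcript: str) -> list[str]:
    """Return a list of problems; an empty list means the pair is accepted."""
    problems = []
    if not transcript.strip():
        problems.append("empty transcript")
    if not Path(wav_path).exists():
        return problems + ["missing audio file"]
    with wave.open(wav_path, "rb") as w:
        seconds = w.getnframes() / w.getframerate()
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.2f}s out of range")
    return problems
```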
378

Efficient Decoding of High-order Hidden Markov Models

Engelbrecht, Herman A. 12 1900 (has links)
Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2007. / Most speech recognition and language identification engines are based on hidden Markov models (HMMs). Higher-order HMMs are known to be more powerful than first-order HMMs, but have not been widely used because of their complexity and computational demands. The main objective of this dissertation was to develop a more time-efficient method of decoding high-order HMMs than the standard Viterbi decoding algorithm currently in use. We proposed, implemented and evaluated two decoders based on the Forward-Backward Search (FBS) paradigm, which incorporate information obtained from low-order HMMs. The first decoder is based on time-synchronous Viterbi-beam decoding, where we wish to base our state pruning on the complete observation sequence. The second decoder is based on time-asynchronous A* search. The choice of heuristic is critical to the A* search algorithm, and a novel, task-independent heuristic function is presented. The experimental results show that both proposed decoders result in more time-efficient decoding of the fully-connected, high-order HMMs that were investigated. Three significant facts have been uncovered. The first is that conventional forward Viterbi-beam decoding of high-order HMMs is not as computationally expensive as is commonly thought. The second (and somewhat surprising) fact is that backward decoding of conventional, high-order left-context HMMs is significantly more expensive than conventional forward decoding. By developing the right-context HMM, we showed that backward decoding of a mathematically equivalent right-context HMM is as expensive as forward decoding of the left-context HMM. The third fact is that the use of information obtained from low-order HMMs significantly reduces the computational expense of decoding high-order HMMs. The comparison of the two new decoders indicates that the FBS-Viterbi-beam decoder is more time-efficient than the A* decoder. The FBS-Viterbi-beam decoder is not only simpler to implement, it also requires less memory than the A* decoder. We suspect that the broader research community regards the Viterbi-beam algorithm as the most efficient method of decoding HMMs. We hope that the research presented in this dissertation will result in renewed investigation into decoding algorithms that are applicable to high-order HMMs.
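As context for the abstract above, the baseline it improves on is time-synchronous Viterbi decoding; a high-order HMM can be decoded with the same first-order recursion once each state history is expanded into a composite state. A minimal log-domain sketch (toy shapes, no beam pruning) follows; the FBS decoders described in the thesis additionally use scores from low-order HMMs to prune this search.

```python
# A minimal log-domain sketch of the first-order Viterbi recursion the
# thesis takes as its baseline (toy discrete emissions, no beam pruning).
# A high-order HMM can be decoded with this same recursion once each
# length-(n-1) state history is expanded into a composite first-order state.
import numpy as np

def viterbi(log_init, log_trans, log_emit, observations):
    """log_init: (S,); log_trans: (S, S) from->to; log_emit: (S, V).
    observations: sequence of symbol indices. Returns the best state path."""
    T, S = len(observations), log_init.shape[0]
    delta = np.empty((T, S))            # best log-prob ending in state s at t
    back = np.zeros((T, S), dtype=int)  # argmax predecessor for backtracking
    delta[0] = log_init + log_emit[:, observations[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (from, to)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, observations[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```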
379

Fast accurate diphone-based phoneme recognition

Du Preez, Marianne 03 1900 (has links)
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2009. / Statistical speech recognition systems typically utilise a set of statistical models of subword units based on the set of phonemes in a target language. However, in continuous speech it is important to consider co-articulation effects and the interactions between neighbouring sounds, as over-generalisation of the phonetic models can negatively affect system accuracy. Traditionally co-articulation in continuous speech is handled by incorporating contextual information into the subword model by means of context-dependent models, which exponentially increase the number of subword models. In contrast, transitional models aim to handle co-articulation by modelling the interphone dynamics found in the transitions between phonemes. This research aimed to perform an objective analysis of diphones as subword units for use in hidden Markov model-based continuous-speech recognition systems, with special emphasis on a direct comparison to a context-dependent biphone-based system in terms of complexity, accuracy and computational efficiency in similar parametric conditions. To simulate practical conditions, the experiments were designed to evaluate these systems in a low-resource environment (limited supply of training data, computing power and system memory) while still attempting fast, accurate phoneme recognition. Adaptation techniques designed to exploit characteristics inherent in diphones, as well as techniques used for effective parameter estimation and state-level tying, were used to reduce resource requirements while simultaneously increasing parameter reliability. These techniques include diphthong splitting, utilisation of a basic diphone grammar, diphone set completion, maximum a posteriori estimation and decision-tree based state clustering algorithms. The experiments were designed to evaluate the contribution of each adaptation technique individually and subsequently compare the optimised diphone-based recognition system to a biphone-based recognition system that received similar treatment. Results showed that diphone-based recognition systems perform better than both traditional phoneme-based systems and context-dependent biphone-based systems when evaluated in similar parametric conditions. Therefore, diphones are effective subword units, which carry suprasegmental knowledge of speech signals and provide an excellent compromise between detailed co-articulation modelling and acceptable system performance.
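As a concrete illustration of the transitional units the thesis evaluates, a diphone inventory can be read directly off a phoneme sequence: each unit models the transition from one phone into the next. The label conventions below are illustrative, not the thesis's exact inventory.

```python
# Diphone units read off a phoneme sequence: each unit models the
# transition from one phone into the next. Label conventions here are
# illustrative, not the thesis's exact inventory.
def to_diphones(phones: list[str]) -> list[str]:
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]

print(to_diphones(["sil", "h", "e", "l", "ou", "sil"]))
# ['sil-h', 'h-e', 'e-l', 'l-ou', 'ou-sil']
```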
380

USB telephony interface device for speech recognition applications

Muller, J. J. 12 1900 (has links)
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2005. / Automatic speech recognition (ASR) systems are an attractive means for companies to deliver value-added services with which to improve customer satisfaction. Such ASR systems require a telephony interface to connect the speech recognition application to the telephone system. Commercially available telephony interfaces are usually operating-system specific, and therefore hardware device-driver issues complicate the development of software applications for different platforms that require telephony access. The drivers and application programming interface (API) for telephony interfaces are often available only for the Microsoft Windows operating systems. This poses a problem, as many of the software tools used for speech recognition research and development operate only on Linux-based computers. These interfaces are also typically in PCI/ISA card format, which hinders physical portability of the device to another computer. A simpler, cheaper and easier-to-use USB telephony interface device offering cross-platform portability was developed and is presented, together with the necessary API.
