371 |
MODELING REALITY IN THE VIRTUAL: USABILITY INSIGHTS INTO VOXEL MODELING IN A VR ENVIRONMENT
Yuchao Wang (18431805) 28 April 2024 (has links)
This thesis explores a novel voxel-based 3D modeling tool in virtual reality (VR), assessing its usability with and without Automatic Speech Recognition (ASR). Despite VR's potential for immersive modeling, existing software often lacks functionality or is difficult to use. Through participant testing and analysis based on the Post-Study System Usability Questionnaire (PSSUQ) and qualitative questions, this study aims to bridge the gap in VR modeling tools, catering to the needs of both laymen and professional modelers.
|
372 |
Multi-signal processing for voice recognition in noisy environments
Nayfeh, Taysir H. 22 October 2009 (has links)
Voice Recognition Systems (VRS) are among the main input devices for computerized systems. A VRS is best suited to applications in which the job function requires more than two hands to perform. The performance of a VRS is highly dependent on environmental noise: the higher the noise level, the lower the recognition capability. Automatic lip reading through vision systems has been used to improve recognition capability in noisy environments, but this approach is costly and time-consuming.
The objective of this thesis was to investigate the use of an infrared sensor for automatic lip reading to improve the recognition performance of a VRS. The developed system is cheaper and faster than other automatic lip readers. Test results on fifty words and eleven digits indicated that the method has good repeatability and good character recognition, and is neither dependent on nor sensitive to the ambient light level. Speaker independence was tested, but the results were inconclusive. / Master of Science
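The abstract does not say how the lip-reading channel and the acoustic recogniser are combined, so the sketch below is only a hypothetical illustration of score-level fusion between two such channels; the weighting scheme, scores and SNR mapping are invented for the example.

```python
# Hypothetical illustration of late (score-level) fusion of an acoustic
# recogniser and a lip-reading channel; not the thesis's method.

def fuse_scores(acoustic_scores, lip_scores, snr_db):
    """Combine per-word scores from two channels.

    The acoustic channel is trusted more in quiet conditions (high SNR)
    and the lip channel more as noise increases.
    """
    # Map SNR (roughly 0-30 dB) to an acoustic weight in [0.2, 0.9].
    w_acoustic = min(0.9, max(0.2, snr_db / 30.0))
    w_lip = 1.0 - w_acoustic
    return {
        word: w_acoustic * acoustic_scores[word] + w_lip * lip_scores.get(word, 0.0)
        for word in acoustic_scores
    }

if __name__ == "__main__":
    acoustic = {"start": 0.40, "stop": 0.55}   # degraded by noise
    lip = {"start": 0.80, "stop": 0.30}        # unaffected by acoustic noise
    fused = fuse_scores(acoustic, lip, snr_db=5.0)
    print(max(fused, key=fused.get))           # "start" wins once the lip channel is weighted in
```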
|
373 |
Spectro-Temporal and Linguistic Processing of Speech in Artificial and Biological Neural Networks
Keshishian, Menoua January 2024 (has links)
Humans possess the fascinating ability to communicate the most complex of ideas through spoken language, without requiring any external tools. This process has two sides—a speaker producing speech, and a listener comprehending it. While the two actions are intertwined in many ways, they entail differential activation of neural circuits in the brains of the speaker and the listener. Both processes are the active subject of artificial intelligence research, under the names of speech synthesis and automatic speech recognition, respectively. While the capabilities of these artificial models are approaching human levels, there are still many unanswered questions about how our brains do this task effortlessly. But the advances in these artificial models give us the opportunity to study human speech recognition through a computational lens that we did not have before. This dissertation explores the intricate processes of speech perception and comprehension by drawing parallels between artificial and biological neural networks, through the use of computational frameworks that attempt to model either the brain circuits involved in speech recognition, or the process of speech recognition itself.
There are two general types of analyses in this dissertation. The first type involves studying neural responses recorded directly through invasive electrophysiology from human participants listening to speech excerpts. The second type involves analyzing artificial neural networks trained to perform the same task of speech recognition, as a potential model for our brains. The first study introduces a novel framework leveraging deep neural networks (DNNs) for interpretable modeling of nonlinear sensory receptive fields, offering an enhanced understanding of auditory neural responses in humans. This approach not only predicts auditory neural responses with increased accuracy but also deciphers distinct nonlinear encoding properties, revealing new insights into the computational principles underlying sensory processing in the auditory cortex. The second study delves into the dynamics of temporal processing of speech in automatic speech recognition networks, elucidating how these systems learn to integrate information across various timescales, mirroring certain aspects of biological temporal processing.
The third study presents a rigorous examination of the neural encoding of linguistic information of speech in the auditory cortex during speech comprehension. By analyzing neural responses to natural speech, we identify explicit, distributed neural encoding across multiple levels of linguistic processing, from phonetic features to semantic meaning. This multilevel linguistic analysis contributes to our understanding of the hierarchical and distributed nature of speech processing in the human brain. The final chapter of this dissertation compares linguistic encoding between an automatic speech recognition system and the human brain, elucidating their computational and representational similarities and differences. This comparison underscores the nuanced understanding of how linguistic information is processed and encoded across different systems, offering insights into both biological perception and artificial intelligence mechanisms in speech processing.
Through this comprehensive examination, the dissertation advances our understanding of the computational and representational foundations of speech perception, demonstrating the potential of interdisciplinary approaches that bridge neuroscience and artificial intelligence to uncover the underlying mechanisms of speech processing in both artificial and biological systems.
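As a point of reference for the encoding models discussed above, the snippet below sketches the classical linear spectro-temporal receptive field (STRF) fit by ridge regression, the baseline that DNN-based encoding frameworks generalise; all data, dimensions and the regularisation value are synthetic placeholders, not the dissertation's recordings or models.

```python
# A minimal sketch of a linear STRF encoding model fit with ridge regression.
# Stimulus, response and dimensions are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_t, n_freq, n_lag = 2000, 16, 10          # time bins, frequency bands, history length

spec = rng.standard_normal((n_t, n_freq))  # synthetic log-spectrogram

# Lagged design matrix: the response at time t depends on the last n_lag frames.
X = np.hstack([np.roll(spec, lag, axis=0) for lag in range(n_lag)])
true_strf = rng.standard_normal(X.shape[1]) * (rng.random(X.shape[1]) < 0.1)
y = X @ true_strf + 0.5 * rng.standard_normal(n_t)   # simulated neural response

# Ridge regression estimate of the receptive field.
lam = 10.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

pred = X @ w
print(f"prediction correlation: {np.corrcoef(pred, y)[0, 1]:.2f}")
```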
|
374 |
Accurate speaker identification employing redundant waveform and model based speech signal representations
Premakanthan, Pravinkumar 01 October 2002 (links)
No description available.
|
375 |
Real-time approach to achieve separation of dissimilar air traffic control phraseologies
Vennerstrand, Daniel 01 October 2001 (has links)
No description available.
|
376 |
Implementing a distributed approach for speech resource and system development / Nkadimeng Raymond Molapo
Molapo, Nkadimeng Raymond January 2014 (has links)
The range of applications for high-quality automatic speech recognition (ASR) systems has grown dramatically with the advent of smartphones, in which speech recognition can greatly enhance the user experience. Currently, the languages with extensive ASR support on these devices are languages for which thousands of hours of transcribed speech have already been collected. Developing a speech system for such a language is simpler because extensive resources already exist. However, for less prominent languages the process is more difficult. Many obstacles, such as reliability and cost, have hampered progress, and various separate tools have been developed to overcome these difficulties at every stage of the development process.
Developing a system that combines these partial solutions involves customising existing tools and developing new ones to tie the overall end-to-end process together. This work documents the integration of several tools to enable the end-to-end development of an automatic speech recognition system in a typical under-resourced language. Google App Engine is employed as the core environment for data verification, storage and distribution, and is used in conjunction with existing tools for gathering text data and for recording speech data. We analyse the data acquired by each of the tools and develop an ASR system in Shona, an important under-resourced language of Southern Africa. Although unexpected logistical problems complicated the process, we were able to collect a usable Shona speech corpus and develop the first automatic speech recognition system in that language. / MIng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus, 2014
|
378 |
Efficient Decoding of High-order Hidden Markov Models
Engelbrecht, Herman A. 12 1900 (has links)
Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2007. / Most speech recognition and language identification engines are based on hidden Markov
models (HMMs). Higher-order HMMs are known to be more powerful than first-order
HMMs, but have not been widely used because of their complexity and computational
demands. The main objective of this dissertation was to develop a more time-efficient
method of decoding high-order HMMs than the standard Viterbi decoding algorithm
currently in use.
We proposed, implemented and evaluated two decoders based on the Forward-Backward
Search (FBS) paradigm, which incorporate information obtained from low-order HMMs.
The first decoder is based on time-synchronous Viterbi-beam decoding where we wish
to base our state pruning on the complete observation sequence. The second decoder is
based on time-asynchronous A* search. The choice of heuristic is critical to the A* search
algorithm, and a novel, task-independent heuristic function is presented. The experimental
results show that both these proposed decoders result in more time-efficient decoding
of the fully-connected, high-order HMMs that were investigated.
Three significant facts have been uncovered. The first is that conventional forward
Viterbi-beam decoding of high-order HMMs is not as computationally expensive as is
commonly thought.
The second (and somewhat surprising) fact is that backward decoding of conventional,
high-order left-context HMMs is significantly more expensive than the conventional forward
decoding. By developing the right-context HMM, we showed that the backward
decoding of a mathematically equivalent right-context HMM is as expensive as the forward
decoding of the left-context HMM.
The third fact is that the use of information obtained from low-order HMMs significantly
reduces the computational expense of decoding high-order HMMs. The comparison
of the two new decoders indicates that the FBS-Viterbi-beam decoder is more time-efficient
than the A* decoder. The FBS-Viterbi-beam decoder is not only simpler to implement;
it also requires less memory than the A* decoder.
We suspect that the broader research community regards the Viterbi-beam algorithm
as the most efficient method of decoding HMMs. We hope that the research presented
in this dissertation will result in renewed investigation into decoding algorithms that are
applicable to high-order HMMs.
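For readers unfamiliar with the baseline being improved upon, the following sketch shows plain time-synchronous Viterbi decoding with beam pruning on a toy first-order HMM; the model values are invented, and this is not the FBS or A* decoder proposed in the thesis. A high-order HMM could be decoded with the same loop once its state space is expanded into an equivalent first-order form.

```python
# Toy time-synchronous Viterbi-beam decoder for a first-order HMM.
# All probabilities are placeholder values for illustration.
import math

states = ["s1", "s2"]
log_init = {"s1": math.log(0.6), "s2": math.log(0.4)}
log_trans = {("s1", "s1"): math.log(0.7), ("s1", "s2"): math.log(0.3),
             ("s2", "s1"): math.log(0.4), ("s2", "s2"): math.log(0.6)}
log_emit = {("s1", "a"): math.log(0.9), ("s1", "b"): math.log(0.1),
            ("s2", "a"): math.log(0.2), ("s2", "b"): math.log(0.8)}

def viterbi_beam(observations, beam=5.0):
    """Time-synchronous Viterbi search with a relative log-probability beam."""
    hyps = {s: (log_init[s] + log_emit[(s, observations[0])], [s]) for s in states}
    for obs in observations[1:]:
        new_hyps = {}
        for prev, (score, path) in hyps.items():
            for s in states:
                cand = score + log_trans[(prev, s)] + log_emit[(s, obs)]
                if s not in new_hyps or cand > new_hyps[s][0]:
                    new_hyps[s] = (cand, path + [s])
        best = max(sc for sc, _ in new_hyps.values())
        hyps = {s: v for s, v in new_hyps.items() if v[0] >= best - beam}   # beam pruning
    return max(hyps.values(), key=lambda v: v[0])

print(viterbi_beam(list("aabba")))   # (best log score, best state path)
```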
|
379 |
Fast accurate diphone-based phoneme recognition
Du Preez, Marianne 03 1900 (has links)
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2009. / Statistical speech recognition systems typically utilise a set of statistical models of subword units based on the set of phonemes in a target language. However, in continuous speech it is important to consider co-articulation effects and the interactions between neighbouring sounds, as over-generalisation of the phonetic models can negatively affect system accuracy. Traditionally co-articulation in continuous speech is handled by incorporating contextual information into the subword model by means of context-dependent models, which exponentially increase the number of subword models. In contrast, transitional models aim to handle co-articulation by modelling the interphone dynamics found in the transitions between phonemes.
This research aimed to perform an objective analysis of diphones as subword units for use in hidden Markov model-based continuous-speech recognition systems, with special emphasis on a direct comparison to a context-dependent biphone-based system in terms of complexity, accuracy and computational efficiency in similar parametric conditions. To simulate practical conditions, the experiments were designed to evaluate these systems in a low-resource environment (limited supply of training data, computing power and system memory) while still attempting fast, accurate phoneme recognition.
Adaptation techniques designed to exploit characteristics inherent in diphones, as well as techniques used for effective parameter estimation and state-level tying, were used to reduce resource requirements while simultaneously increasing parameter reliability. These techniques include diphthong splitting, utilisation of a basic diphone grammar, diphone set completion, maximum a posteriori estimation and decision-tree based state clustering algorithms. The experiments were designed to evaluate the contribution of each adaptation technique individually and subsequently compare the optimised diphone-based recognition system to a biphone-based recognition system that received similar treatment.
Results showed that diphone-based recognition systems perform better than both traditional phoneme-based systems and context-dependent biphone-based systems when evaluated in similar parametric conditions. Therefore, diphones are effective subword units, which carry suprasegmental knowledge of speech signals and provide an excellent compromise between detailed co-articulation modelling and acceptable system performance.
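To make the notion of a diphone concrete, the sketch below converts a phoneme sequence into transition-spanning units; the phone set, padding symbol and counts are illustrative assumptions, not the thesis's inventory or its set-completion procedure.

```python
# Hypothetical illustration of diphone units: each modelled unit spans the
# transition between two neighbouring phonemes rather than a single phoneme.
def to_diphones(phonemes):
    """Convert a phoneme sequence into diphone units, with silence padding."""
    padded = ["sil"] + list(phonemes) + ["sil"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

word = ["s", "p", "iy", "ch"]            # "speech" in a toy phone set
print(to_diphones(word))
# ['sil-s', 's-p', 'p-iy', 'iy-ch', 'ch-sil']

# A complete diphone inventory for N phonemes has roughly N*N units, the same
# order as the one-sided context-dependent (biphone) inventory it is compared to.
n = 40
print(n * n)   # 1600 candidate units before tying
```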
|
380 |
USB telephony interface device for speech recognition applications
Muller, J. J. 12 1900 (has links)
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2005. / Automatic speech recognition (ASR) systems are an attractive means for companies to deliver value added
services with which to improve customer satisfaction. Such ASR systems require a telephony interface to
connect the speech recognition application to the telephone system. Commercially available telephony
interfaces are usually operating system specific, and therefore hardware device driver issues complicate the
development of software applications for different platforms that require telephony access. The drivers and
application programming interface (API) for telephony interfaces are often available only for the Microsoft
Windows operating systems. This poses a problem, as many of the software tools used for speech recognition
research and development operate only on Linux-based computers. These interfaces are also typically in
PCI/ISA card format, which hinders physical portability of the device to another computer. A simpler, cheaper
and easier-to-use USB telephony interface device, offering cross-platform portability, was developed and
presented, together with the necessary API.
|