Global ETD Search

231	The effects of recognition accuracy and vocabulary size of a speech recognition system on task performance and user acceptance Casali, Sherry P. 22 June 2010 (has links) Automatic speech recognition systems have at last advanced to the state that they are now a feasible alternative for human-machine communication in selected applications. As such, research efforts are now beginning to focus on characteristics of the human, the recognition device, and the interface which optimize the system performance, rather than the previous trend of determining factors affecting recognizer performance alone. This study investigated two characteristics of the recognition device, the accuracy level at which it recognizes speech, and the vocabulary size of the recognizer as a percent of task vocabulary size to determine their effects on system performance. In addition, the study considered one characteristic of the user, age. Briefly, subjects performed a data entry task under each of the treatment conditions. Task completion time and the number of errors remaining at the end of each session were recorded. After each session, subjects rated the recognition device used as to its acceptability for the task. The accuracy level at which the recognizer was performing significantly influenced the task completion time as well as the user's acceptability ratings, but had only a small effect on the number of errors left uncorrected. The available vocabulary size also significantly affected the task completion time; however, its effect on the final error rate and on the acceptability ratings was negligible. The age of the subject was also found to influence both objective and subjective measures. Older subjects in general required longer times to complete the tasks; however, they consistently rated the speech input systems more favorably than the younger subjects. / Master of Science LD5655.V855 1988.C382 Automatic speech recognition Speech processing systems
232	Improving the quality of speech in noisy environments Parikh, Devangi Nikunj 06 November 2012 (has links) In this thesis, we are interested in processing noisy speech signals that are meant to be heard by humans, and hence we approach the noise-suppression problem from a perceptual perspective. We develop a noise-suppression paradigm that is based on a model of the human auditory system, where we process signals in a way that is natural to the human ear. Under this paradigm, we transform an audio signal into a perceptual domain and process the signal in this perceptual domain. This approach allows us to reduce the background noise and the audible artifacts that are seen in traditional noise-suppression algorithms, while preserving the quality of the processed speech. We develop a single- and dual-microphone algorithm based on this perceptual paradigm, and conduct subjective tests to show that this approach outperforms traditional noise-suppression techniques. Moreover, we investigate the cause of audible artifacts that are generated as a result of suppressing the noise in noisy signals, and introduce constraints on the noise-suppression gain such that these artifacts are reduced. Blind source separation Perceptual processing Noise suppression Speech processing Ambient sounds Speech processing systems Speech Noise control Acoustical engineering
233	An Analog Architecture for Auditory Feature Extraction and Recognition Smith, Paul Devon 22 November 2004 (has links) Speech recognition systems have been implemented using a wide range of signal processing techniques, including neuromorphic/biologically inspired and digital signal processing (DSP) techniques. Neuromorphic/biologically inspired techniques, such as silicon cochlea models, are based on fairly simple yet highly parallel computation and/or computational units, while DSP is based on block transforms and statistical or error minimization methods. Essential to each of these techniques is the first stage of extracting meaningful information from the speech signal, which is known as feature extraction. This can be done using biologically inspired techniques such as silicon cochlea models, or techniques beginning with a model of speech production and then trying to separate the vocal tract response from an excitation signal. Even within each of these approaches, there are multiple techniques, including cepstrum filtering, which falls under the class of homomorphic signal processing, or techniques using FFT-based predictive approaches. The underlying reality is that multiple techniques have attacked the problem of speech recognition, but the problem is still far from being solved. The techniques that have shown the best recognition rates involve cepstrum coefficients for the feature extraction and hidden Markov models to perform the pattern recognition. The presented research develops an analog system based on programmable analog array technology that can perform the initial stages of auditory feature extraction and recognition before passing information to a digital signal processor. The goal is a low-power system that can be fully contained on one or more integrated circuit chips.
Results show that it is possible to realize advanced filtering techniques such as Cepstrum Filtering and Vector Quantization in analog circuitry. Prior to this work, previous applications of analog signal processing have focused on vision, cochlea models, anti-aliasing filters and other single component uses. Furthermore, classic designs have looked heavily at utilizing op-amps as a basic core building block for these designs. This research also shows a novel design for a Hidden Markov Model (HMM) decoder utilizing circuits that take advantage of the inherent properties of subthreshold transistors and floating-gate technology to create low-power computational blocks. Neuromorphic cochlea Hidden Markov Model (HMM) Cepstrum Speech processing systems Cochlea Models Speech processing systems Markov processes Signal processing Digital techniques Neural networks (Computer science)
234	Evaluation of two tactile speech displays Clements, Mark Andrew. January 1978 (has links) Thesis: Elec. E., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 1978 / Bibliography: leaves 57-59. / by Mark Andrew Clements. Deaf Means of communication. Speech processing systems. Vibrators. Touch.
235	Independent formant and pitch control applied to singing voice Calitz, Wietsche Roets 12 1900 (has links) Thesis (MScIng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: A singing voice can be manipulated artificially by means of a digital computer for the purposes of creating new melodies or to correct existing ones. When the fundamental frequency of an audio signal that represents a human voice is changed by simple algorithms, the formants of the voice tend to move to new frequency locations, making it sound unnatural. The main purpose is to design a technique by which the pitch and formants of a singing voice can be controlled independently. / AFRIKAANSE OPSOMMING: Onafhanklike formant- en toonhoogte beheer toegepas op ’n sangstem: ’n Sangstem kan deur ’n digitale rekenaar gemanipuleer word om nuwe melodieë te skep, of om bestaandes te verbeter. Wanneer die fundamentele frekwensie van ’n klanksein (wat ’n menslike stem voorstel) deur ’n eenvoudige algoritme verander word, skuif die oorspronklike formante na nuwe frekwensie gebiede. Dit veroorsaak dat die resultaat onnatuurlik klink. Die hoof oogmerk is om ’n tegniek te ontwerp wat die toonhoogte en die formante van ’n sangstem apart kan beheer. Speech processing systems Vocoder Signal processing Singing -- Data processing Theses -- Electrical engineering Dissertations -- Electrical engineering
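The formant problem described in record 235 can be illustrated with a small numeric sketch. It assumes that the "simple algorithm" is plain resampling, which scales every frequency in the signal by the same ratio; the abstract does not name the algorithm, and the function names and the example formant values below are hypothetical.

```python
def resample_ratio(semitones: float) -> float:
    """Frequency scaling factor for a shift of `semitones` (equal temperament)."""
    return 2.0 ** (semitones / 12.0)


def naive_shift(f0_hz, formants_hz, semitones):
    """Model of pitch shifting by plain resampling (an assumption).

    Every frequency, fundamental and formants alike, is multiplied by the
    same ratio, so the spectral envelope moves along with the pitch.
    """
    r = resample_ratio(semitones)
    return f0_hz * r, [f * r for f in formants_hz]


def independent_shift(f0_hz, formants_hz, semitones):
    """Target behaviour: only the fundamental moves, the formants stay put."""
    return f0_hz * resample_ratio(semitones), list(formants_hz)


# Illustrative values only: a 220 Hz note with three made-up formant positions.
f0, formants = 220.0, [700.0, 1200.0, 2600.0]
print(naive_shift(f0, formants, 12))        # pitch and formants both doubled
print(independent_shift(f0, formants, 12))  # pitch doubled, formants unchanged
```

The sketch only models where the frequencies end up; it does not process audio, and it is not the technique developed in the thesis.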
236	Non-acoustic speaker recognition Du Toit, Ilze 12 1900 (has links) Thesis (MScIng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: In this study the phoneme labels derived from a phoneme recogniser are used for phonetic speaker recognition. The time-dependencies among phonemes are modelled by using hidden Markov models (HMMs) for the speaker models. Experiments are done using first-order and second-order HMMs and various smoothing techniques are examined to address the problem of data scarcity. The use of word labels for lexical speaker recognition is also investigated. Single word frequencies are counted and the use of various word selections as feature sets is investigated. During April 2004, the University of Stellenbosch, in collaboration with Spescom DataVoice, participated in an international speaker verification competition presented by the National Institute of Standards and Technology (NIST). The University of Stellenbosch submitted phonetic and lexical (non-acoustic) speaker recognition systems and a fused system (the primary system) that fuses the acoustic system of Spescom DataVoice with the non-acoustic systems of the University of Stellenbosch. The results were evaluated by means of a cost model. Based on the cost model, the primary system obtained second and third position in the two categories that were submitted. / AFRIKAANSE OPSOMMING: Hierdie projek maak gebruik van foneem-etikette wat geklassifiseer word deur ’n foneemherkenner en daarna gebruik word vir fonetiese sprekerherkenning. Die tyd-afhanklikhede tussen foneme word gemodelleer deur gebruik te maak van verskuilde Markov modelle (HMMs) as sprekermodelle. Daar word geëksperimenteer met eerste-orde en tweede-orde HMMs en verskeie vergladdingstegnieke word ondersoek om dataskaarsheid aan te spreek. Die gebruik van woord-etikette vir sprekerherkenning word ook ondersoek.
Enkelwoordfrekwensies word getel en daar word geëksperimenteer met verskeie woordseleksies as kenmerke vir sprekerherkenning. Gedurende April 2004 het die Universiteit van Stellenbosch in samewerking met Spescom DataVoice deelgeneem aan ’n internasionale sprekerverifikasie kompetisie wat deur die National Institute of Standards and Technology (NIST) aangebied is. Die Universiteit van Stellenbosch het ingeskryf vir ’n fonetiese en ’n woordgebaseerde (nie-akoestiese) sprekerherkenningstelsel, asook ’n saamgesmelte stelsel wat as primêre stelsel dien. Die saamgesmelte stelsel is ’n kombinasie van Spescom DataVoice se akoestiese stelsel en die twee nie-akoestiese stelsels van die Universiteit van Stellenbosch. Die resultate is geëvalueer deur gebruik te maak van ’n koste-model. Op grond van die koste-model het die primêre stelsel tweede en derde plek behaal in die twee kategorieë waaraan deelgeneem is. Automatic speech recognition Speech processing systems Speech perception Theses -- Electronic engineering Dissertations -- Electronic engineering
237	Wavelet-based speech enhancement : a statistical approach Harmse, Wynand 12 1900 (has links) Thesis (MScIng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: Speech enhancement is the process of removing background noise from speech signals. The equivalent process for images is known as image denoising. While the Fourier transform is widely used for speech enhancement, image denoising typically uses the wavelet transform. Research on wavelet-based speech enhancement has only recently emerged, yet it shows promising results compared to Fourier-based methods. This research is enhanced by the availability of new wavelet denoising algorithms based on the statistical modelling of wavelet coefficients, such as the hidden Markov tree. The aim of this research project is to investigate wavelet-based speech enhancement from a statistical perspective. Current Fourier-based speech enhancement and its evaluation process are described, and a framework is created for wavelet-based speech enhancement. Several wavelet denoising algorithms are investigated, and it is found that the algorithms based on the statistical properties of speech in the wavelet domain outperform the classical and more heuristic denoising techniques. The choice of wavelet influences the quality of the enhanced speech and the effect of this choice is therefore examined. The introduction of a noise floor parameter also improves the perceptual quality of the wavelet-based enhanced speech, by masking annoying residual artifacts. The performance of wavelet-based speech enhancement is similar to that of the more widely used Fourier methods at low noise levels, with a slight difference in the residual artifact. At high noise levels, however, the Fourier methods are superior. / AFRIKAANSE OPSOMMING: Spraaksuiwering is die proses waardeur agtergrondgeraas uit spraakseine verwyder word. Die ekwivalente proses vir beelde word beeldsuiwering genoem. 
Terwyl spraaksuiwering in die algemeen in die Fourier-domein gedoen word, gebruik beeldsuiwering tipies die golfietransform. Navorsing oor golfie-gebaseerde spraaksuiwering het eers onlangs verskyn, en dit toon reeds belowende resultate in vergelyking met Fourier-gebaseerde metodes. Hierdie navorsingsveld word aangehelp deur die beskikbaarheid van nuwe golfie-gebaseerde suiweringstegnieke wat die golfie-koëffisiënte statisties modelleer, soos die verskuilde Markovboom. Die doel van hierdie navorsingsprojek is om golfie-gebaseerde spraaksuiwering vanuit ’n statistiese oogpunt te bestudeer. Huidige Fourier-gebaseerde spraaksuiweringsmetodes asook die evalueringsproses vir sulke algoritmes word bespreek, en ’n raamwerk word geskep vir golfie-gebaseerde spraaksuiwering. Verskeie golfie-gebaseerde algoritmes word ondersoek, en daar word gevind dat die metodes wat die statistiese eienskappe van spraak in die golfie-gebied gebruik, beter vaar as die klassieke en meer heuristiese metodes. Die keuse van golfie beïnvloed die kwaliteit van die gesuiwerde spraak, en die effek van hierdie keuse word dus ondersoek. Die gebruik van ’n ruisvloer parameter verhoog ook die kwaliteit van die golfie-gesuiwerde spraak, deur steurende residuele artifakte te verberg. Die golfie-metodes vaar omtrent dieselfde as die klassieke Fourier-metodes by lae ruisvlakke, met ’n klein verskil in residuele artifakte. By hoë ruisvlakke vaar die Fourier-metodes egter steeds beter. Speech synthesis Speech processing systems Theses -- Electronic engineering Dissertations -- Electronic engineering
238	Tree-based Gaussian mixture models for speaker verification Cilliers, Francois Dirk 12 1900 (has links) Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2005. / The Gaussian mixture model (GMM) performs very effectively in applications such as speech and speaker recognition. However, evaluation speed is greatly reduced when the GMM has a large number of mixture components. Various techniques improve the evaluation speed by reducing the number of required Gaussian evaluations. Dissertations -- Electronic engineering Theses -- Electronic engineering Automatic speech recognition Speech processing systems Electrical and Electronic Engineering
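The cost that record 238 refers to can be sketched as follows. This is a minimal baseline evaluation of a GMM log-likelihood, assuming diagonal covariances (the abstract does not state the covariance type); it is not the tree-based method of the thesis, and all names are illustrative. The point is that a full evaluation needs one Gaussian evaluation per mixture component, which is the count that the speed-up techniques try to reduce.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at the point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) = log sum_k w_k N(x; mean_k, var_k).

    Performs one Gaussian evaluation per component, so the cost grows
    linearly with the number of components. The sum is done with the
    log-sum-exp trick to avoid underflow.
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Two-component toy model in two dimensions (values chosen for illustration).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_log_likelihood([0.0, 0.0], weights, means, variances))
```

Because the largest term dominates the sum, a method that can identify the few highest-scoring components without evaluating the rest loses little accuracy, which is the general idea behind reducing the number of Gaussian evaluations.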
239	Speech generation in a spoken dialogue system Visagie, Albertus Sybrand 12 1900 (has links) Thesis (MScIng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: Spoken dialogue systems accessed over the telephone network are rapidly becoming more popular as a means to reduce call-centre costs and improve customer experience. It is now technologically feasible to delegate repetitive and relatively simple tasks conducted in most telephone calls to automatic systems. Such a system uses speech recognition to take input from users. This work focuses on the speech generation component that a specific prototype system uses to convey audible speech output back to the user. Many commercial systems contain general text-to-speech synthesisers. Text-to-speech synthesis is a very active branch of speech processing. It aims to build machines that read text aloud. In some languages this has been a reality for almost two decades. While these synthesisers are often very understandable, they almost never sound natural. The output quality of synthetic speech is considered to be a very important factor in the user’s perception of the quality and usability of spoken dialogue systems. The static nature of the spoken dialogue system is exploited to produce a custom speech synthesis component that provides very high quality output speech for the particular application. To this end the current state of the art in speech synthesis is surveyed and summarised. A unit-selection synthesiser is produced that functions in Afrikaans, English and Xhosa. The unit-selection synthesiser selects short waveforms from a recorded speech corpus, and concatenates them to produce the required utterances. Techniques are developed for designing a compact corpus and processing it to produce a unit-selection database. Speech modification methods were researched to build a framework for natural-sounding speech concatenation. 
This framework also provides pitch and duration modification capabilities that will enable research in languages such as Afrikaans and Xhosa where text-to-speech capabilities are relatively immature. / AFRIKAANSE OPSOMMING: Telefoniese, spraakgebaseerde dialoogstelsels word steeds meer algemeen, en is ’n doeltreffende metode om oproepsentrumkostes te verlaag. Dit is tans tegnologies moontlik om ’n groot aantal eenvoudige transaksies met automatiese stelsels te hanteer. Sulke stelsels gebruik spraakherkenning om intree van die gebruiker te ontvang. Hierdie werk fokus op die spraakgenerasiekomponent wat ’n spesifieke prototipestelsel gebruik om afvoer aan die gebruiker terug te speel. Vele kommersiële stelsels gebruik generiese teks-na-spraak sintetiseerders. Sulke teks-na-spraak sintetiseerders is steeds ’n baie aktiewe veld in spraaknavorsing. In die algemeen poog navorsing om teks te kan lees en om te sit in verstaanbare spraak. Sulke stelsels bestaan nou al vir ten minste twee dekades. Alhoewel heeltemal verstaanbaar, klink hierdie stelsels onnatuurlik. In telefoniese spraakgebaseerde dialoogstelsels is kwaliteit van die sintetiese spraak belangrik vir die gebruiker se persepsie van die stelsel se kwaliteit en bruikbaarheid. Die dialoog is meestal staties van aard en hierdie eienskap word benut om hoë kwaliteit spraak in ’n bepaalde toepassing te sintetiseer. Om dit reg te kry is die huidige stand van sake in hierdie veld bestudeer en opgesom. ’n Knip-en-plak sintetiseerder is gebou wat werk in Afrikaans, Engels en Xhosa. Die sintetiseerder selekteer kort stukkies spraakgolfvorms vanuit ’n spraakkorpus, en las dit aanmekaar om die vereiste spraak te produseer. Outomatiese tegnieke is ontwikkel om ’n kompakte korpus te ontwerp wat steeds alles bevat wat die sintetiseerder sal nodig hê om sy taak te verrig. Verdere tegnieke prosesseer die korpus tot ’n bruikbare vorm vir sintese.
Metodes van spraakmodifikasie is ondersoek ten einde die aanmekaargelaste stukkies spraak meer natuurlik te laat klink en die intonasie en tempo daarvan te korrigeer. Dit verskaf infrastruktuur vir navorsing in tale soos Afrikaans en Xhosa waar teks-na-spraak vermoëns nog onvolwasse is. Speech processing systems Speech synthesis Theses -- Electronic engineering Dissertations -- Electronic engineering
240	The design of a high-performance, floating-point embedded system for speech recognition and audio research purposes Duckitt, William 03 1900 (has links) Thesis (MScEng (Electrical and Electronic Engineering))--Stellenbosch University, 2008. / This thesis describes the design of a high-performance, floating-point, standalone embedded system that is appropriate for speech and audio processing purposes. The system successfully employs the Analog Devices TigerSHARC TS201 600MHz floating point digital signal processor as a CPU, and includes 512MB RAM, a Compact Flash storage card interface as non-volatile memory, a multi-channel audio input and output system with two programmable microphone preamplifiers offering up to 65dB gain, a USB interface, an LCD display and a push-button user interface. An Altera Cyclone II FPGA is used to interface the CPU with the various peripheral components. The FIFO buffers within the FPGA allow bulk DMA transfers of audio data for minimal processor delays. Similar approaches are taken for communication with the USB interface, the Compact Flash storage card and the LCD display. A logic analyzer interface allows system debugging via the FPGA. This interface can also in future be used to interface to additional components. The power distribution required a total of 11 different supplies to be provided with a total consumption of 16.8W. A 6-layer PCB incorporating 4 signal layers, a power plane and ground plane was designed for the final prototype. All system components were verified to be operating correctly by means of appropriate testing software, and the computational performance was measured by repeated calculation of a multi-dimensional Gaussian log-probability and found to be comparable with an Intel 1.8GHz Core2Duo processor. The design can therefore be considered a success, and the prototype is ready for development of suitable speech or audio processing software.
Automatic speech recognition Speech processing systems Dissertations -- Electronic engineering Theses -- Electronic engineering Electrical and Electronic Engineering
