Global ETD Search

131	Evaluation of two tactile speech displays Clements, Mark Andrew. January 1978 (has links) Thesis: Elec. E., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 1978 / Bibliography: leaves 57-59. / by Mark Andrew Clements. / Deaf -- Means of communication. Speech processing systems. Vibrators. Touch.
132	Non-acoustic speaker recognition Du Toit, Ilze 12 1900 (has links) Thesis (MScIng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: In this study the phoneme labels derived from a phoneme recogniser are used for phonetic speaker recognition. The time-dependencies among phonemes are modelled by using hidden Markov models (HMMs) for the speaker models. Experiments are done using first-order and second-order HMMs and various smoothing techniques are examined to address the problem of data scarcity. The use of word labels for lexical speaker recognition is also investigated. Single word frequencies are counted and the use of various word selections as feature sets is investigated. During April 2004, the University of Stellenbosch, in collaboration with Spescom DataVoice, participated in an international speaker verification competition presented by the National Institute of Standards and Technology (NIST). The University of Stellenbosch submitted phonetic and lexical (non-acoustic) speaker recognition systems and a fused system (the primary system) that fuses the acoustic system of Spescom DataVoice with the non-acoustic systems of the University of Stellenbosch. The results were evaluated by means of a cost model. Based on the cost model, the primary system obtained second and third position in the two categories that were submitted. / AFRIKAANSE OPSOMMING: Hierdie projek maak gebruik van foneem-etikette wat geklassifiseer word deur ’n foneemherkenner en daarna gebruik word vir fonetiese sprekerherkenning. Die tyd-afhanklikhede tussen foneme word gemodelleer deur gebruik te maak van verskuilde Markov modelle (HMMs) as sprekermodelle. Daar word geëksperimenteer met eerste-orde en tweede-orde HMMs en verskeie vergladdingstegnieke word ondersoek om dataskaarsheid aan te spreek. Die gebruik van woord-etikette vir sprekerherkenning word ook ondersoek. 
Enkelwoordfrekwensies word getel en daar word geëksperimenteer met verskeie woordseleksies as kenmerke vir sprekerherkenning. Gedurende April 2004 het die Universiteit van Stellenbosch in samewerking met Spescom DataVoice deelgeneem aan ’n internasionale sprekerverifikasie kompetisie wat deur die National Institute of Standards and Technology (NIST) aangebied is. Die Universiteit van Stellenbosch het ingeskryf vir ’n fonetiese en ’n woordgebaseerde (nie-akoestiese) sprekerherkenningstelsel, asook ’n saamgesmelte stelsel wat as primêre stelsel dien. Die saamgesmelte stelsel is ’n kombinasie van Spescom DataVoice se akoestiese stelsel en die twee nie-akoestiese stelsels van die Universiteit van Stellenbosch. Die resultate is geëvalueer deur gebruik te maak van ’n koste-model. Op grond van die koste-model het die primêre stelsel tweede en derde plek behaal in die twee kategorieë waaraan deelgeneem is. Automatic speech recognition Speech processing systems Speech perception Theses -- Electronic engineering Dissertations -- Electronic engineering
133	Wavelet-based speech enhancement : a statistical approach Harmse, Wynand 12 1900 (has links) Thesis (MScIng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: Speech enhancement is the process of removing background noise from speech signals. The equivalent process for images is known as image denoising. While the Fourier transform is widely used for speech enhancement, image denoising typically uses the wavelet transform. Research on wavelet-based speech enhancement has only recently emerged, yet it shows promising results compared to Fourier-based methods. This research is enhanced by the availability of new wavelet denoising algorithms based on the statistical modelling of wavelet coefficients, such as the hidden Markov tree. The aim of this research project is to investigate wavelet-based speech enhancement from a statistical perspective. Current Fourier-based speech enhancement and its evaluation process are described, and a framework is created for wavelet-based speech enhancement. Several wavelet denoising algorithms are investigated, and it is found that the algorithms based on the statistical properties of speech in the wavelet domain outperform the classical and more heuristic denoising techniques. The choice of wavelet influences the quality of the enhanced speech and the effect of this choice is therefore examined. The introduction of a noise floor parameter also improves the perceptual quality of the wavelet-based enhanced speech, by masking annoying residual artifacts. The performance of wavelet-based speech enhancement is similar to that of the more widely used Fourier methods at low noise levels, with a slight difference in the residual artifact. At high noise levels, however, the Fourier methods are superior. / AFRIKAANSE OPSOMMING: Spraaksuiwering is die proses waardeur agtergrondgeraas uit spraakseine verwyder word. Die ekwivalente proses vir beelde word beeldsuiwering genoem. 
Terwyl spraaksuiwering in die algemeen in die Fourier-domein gedoen word, gebruik beeldsuiwering tipies die golfietransform. Navorsing oor golfie-gebaseerde spraaksuiwering het eers onlangs verskyn, en dit toon reeds belowende resultate in vergelyking met Fourier-gebaseerde metodes. Hierdie navorsingsveld word aangehelp deur die beskikbaarheid van nuwe golfie-gebaseerde suiweringstegnieke wat die golfie-koëffisiënte statisties modelleer, soos die verskuilde Markovboom. Die doel van hierdie navorsingsprojek is om golfie-gebaseerde spraaksuiwering vanuit ’n statistiese oogpunt te bestudeer. Huidige Fourier-gebaseerde spraaksuiweringsmetodes asook die evalueringsproses vir sulke algoritmes word bespreek, en ’n raamwerk word geskep vir golfie-gebaseerde spraaksuiwering. Verskeie golfie-gebaseerde algoritmes word ondersoek, en daar word gevind dat die metodes wat die statistiese eienskappe van spraak in die golfie-gebied gebruik, beter vaar as die klassieke en meer heuristiese metodes. Die keuse van golfie beïnvloed die kwaliteit van die gesuiwerde spraak, en die effek van hierdie keuse word dus ondersoek. Die gebruik van ’n ruisvloer parameter verhoog ook die kwaliteit van die golfie-gesuiwerde spraak, deur steurende residuele artifakte te verberg. Die golfie-metodes vaar omtrent dieselfde as die klassieke Fourier-metodes by lae ruisvlakke, met ’n klein verskil in residuele artifakte. By hoë ruisvlakke vaar die Fouriermetodes egter steeds beter. Speech synthesis Speech processing systems Theses -- Electronic engineering Dissertations -- Electronic engineering
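Note on record 133: the abstract describes denoising in the wavelet domain and a noise floor parameter that masks residual artifacts. The sketch below illustrates that general idea only and is not the thesis's algorithm. It assumes a single-level Haar transform and soft thresholding, and the names `haar_forward`, `haar_inverse`, `shrink`, `denoise` and `noise_floor` are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_forward(x):
    """One-level Haar transform of an even-length sequence."""
    approx = [(x[i] + x[i + 1]) * SCALE for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * SCALE for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SCALE)
        out.append((a - d) * SCALE)
    return out


def shrink(d, threshold, noise_floor=0.0):
    """Soft-threshold d, but keep at least noise_floor * |d| of it."""
    magnitude = max(abs(d) - threshold, noise_floor * abs(d))
    return math.copysign(magnitude, d)


def denoise(x, threshold, noise_floor=0.0):
    """Shrink only the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(x)
    detail = [shrink(d, threshold, noise_floor) for d in detail]
    return haar_inverse(approx, detail)
```

With `noise_floor=0.0` this is plain soft thresholding. A value between 0 and 1 leaves a scaled copy of every detail coefficient in place, which is one simple way to trade residual noise against artifacts.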
134	Tree-based Gaussian mixture models for speaker verification Cilliers, Francois Dirk 12 1900 (has links) Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2005. / The Gaussian mixture model (GMM) performs very effectively in applications such as speech and speaker recognition. However, evaluation speed is greatly reduced when the GMM has a large number of mixture components. Various techniques improve the evaluation speed by reducing the number of required Gaussian evaluations. Dissertations -- Electronic engineering Theses -- Electronic engineering Automatic speech recognition Speech processing systems Electrical and Electronic Engineering
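Note on record 134: the abstract says that GMM evaluation slows down as the number of mixture components grows and that speed-ups work by reducing the number of Gaussians evaluated. The record does not describe the tree structure itself, so the sketch below shows only the underlying cost trade-off: a diagonal-covariance GMM log-likelihood that can optionally be restricted to a shortlist of components. The shortlist mechanism and all function names are assumptions made for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, weights, means, variances, shortlist=None):
    """Log-likelihood of x under a GMM.

    If shortlist is given, only those component indices are evaluated,
    which yields a lower bound on the full log-likelihood.
    """
    indices = range(len(weights)) if shortlist is None else shortlist
    terms = [
        math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
        for k in indices
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

When the omitted components lie far from `x`, the shortlist result is nearly identical to the full evaluation while costing far fewer Gaussian evaluations.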
135	The design of a high-performance, floating-point embedded system for speech recognition and audio research purposes Duckitt, William 03 1900 (has links) Thesis (MScEng (Electrical and Electronic Engineering))--Stellenbosch University, 2008. / This thesis describes the design of a high-performance, floating-point, standalone embedded system that is appropriate for speech and audio processing purposes. The system successfully employs the Analog Devices TigerSHARC TS201 600MHz floating point digital signal processor as a CPU, and includes 512MB RAM, a Compact Flash storage card interface as non-volatile memory, a multi-channel audio input and output system with two programmable microphone preamplifiers offering up to 65dB gain, a USB interface, an LCD display and a push-button user interface. An Altera Cyclone II FPGA is used to interface the CPU with the various peripheral components. The FIFO buffers within the FPGA allow bulk DMA transfers of audio data for minimal processor delays. Similar approaches are taken for communication with the USB interface, the Compact Flash storage card and the LCD display. A logic analyzer interface allows system debugging via the FPGA. This interface can also in future be used to interface to additional components. The power distribution required a total of 11 different supplies to be provided with a total consumption of 16.8W. A 6 layer PCB incorporating 4 signal layers, a power plane and ground plane was designed for the final prototype. All system components were verified to be operating correctly by means of appropriate testing software, and the computational performance was measured by repeated calculation of a multi-dimensional Gaussian log-probability and found to be comparable with an Intel 1.8GHz Core2Duo processor. The design can therefore be considered a success, and the prototype is ready for development of suitable speech or audio processing software. 
Automatic speech recognition Speech processing systems Dissertations -- Electronic engineering Theses -- Electronic engineering Electrical and Electronic Engineering
136	Low bit rate speech coding Kritzinger, Carl 03 1900 (has links) Thesis (MScIng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006. / Despite enormous advances in digital communication, the voice is still the primary tool with which people exchange ideas. However, uncompressed digital speech tends to require prohibitively high data rates (upward of 64kbps), making it impractical for many applications. Speech coding is the process of reducing the data rate of digital voice to manageable levels. Parametric speech coders or vocoders utilise a-priori information about the mechanism by which speech is produced in order to achieve extremely efficient compression of speech signals (as low as 1 kbps). The greater part of this thesis comprises an investigation into parametric speech coding. This consisted of a review of the mathematical and heuristic tools used in parametric speech coding, as well as the implementation of an accepted standard algorithm for parametric voice coding. In order to examine avenues of improvement for the existing vocoders, we examined some of the mathematical structure underlying parametric speech coding. Following on from this, we developed a novel approach to parametric speech coding which obtained promising results under both objective and subjective evaluation. An additional contribution by this thesis was the comparative subjective evaluation of the effect of parametric speech coding on English and Xhosa speech. We investigated the performance of two different encoding algorithms on the two languages. Dissertations -- Electronic engineering Theses -- Electronic engineering Speech processing systems Coding theory
137	Automatic alignment and error detection for phonetic transcriptions in the African speech technology project databases De Villiers, Edward 03 1900 (has links) Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006. / The African Speech Technology (AST) project ran from 2000 to 2004 and involved collecting speech data for five South African languages, transcribing the data and building automatic speech recognition systems in these languages. The work described here forms part of this project and involved implementing methods for automatic boundary placement in manually labelled files and for determining errors made by transcribers during the labelling process. Automatic speech recognition Speech processing systems Dissertations -- Electronic engineering Theses -- Electronic engineering Electrical and Electronic Engineering
138	Nonlinear Acoustic Echo Cancellation for Mobile Phones: A Practical Approach Fhager, Anders, Hussien, Jemal Mohammed January 2010 (has links) Acoustic echo cancellation (AEC) is a fundamental part of speech processing that makes a telecommunication conversation pleasant. Without it, the talker would hear an annoying echo of his own voice along with the speech from the other party, which would make a conversation through any telecommunication device an unpleasant experience. AEC has been a subject of interest in the telecom industry since the 1950s, and very efficient solutions have been devised to cancel linear echo. With the advent of low-cost hands-free communication devices, nonlinear echo became prominent: these devices use cheap loudspeakers that produce artifacts in addition to the desired sound, causing nonlinear echo that cannot be cancelled by linear echo cancellers. In this thesis a Harmonic Distortion Residual Echo Cancellation algorithm (HDRES) was chosen for further investigation. HDRES has many of the features that are desirable in an algorithm for nonlinear acoustic echo cancellation, such as low computational complexity and fast convergence. The algorithm was first implemented in Matlab, where it was tested and modified. The modified algorithm was then implemented in C and integrated with a complete AEC system. Before the implementation, a number of measurements were made to characterise the nonlinearities caused by the mobile phone loudspeaker. The measurements were performed on three different mobile phones that were documented to have problems with nonlinear acoustic echo. The results of this thesis show that it might be possible to use an adaptive filter that has both low complexity and fast convergence in an operating AEC system. However, for such a system to work, a doubletalk detector must be implemented along with the adaptive algorithm. That way the doubletalk situation can be detected and the adaptation of the algorithm stopped, so that the major part of the speech is preserved. nonlinear echo cancellation mobile phones speech processing acoustic echo RES Volterra Hammerstein
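Note on record 138: the abstract concludes that the adaptive filter needs a doubletalk detector so that adaptation can be stopped while both parties speak. The sketch below illustrates that interaction only. It is not HDRES: it assumes a plain linear NLMS filter, and the doubletalk decisions are supplied by the caller as a list of booleans rather than detected. The function name and parameters are hypothetical.

```python
def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8, doubletalk=None):
    """Cancel far-end echo from the microphone signal with NLMS.

    doubletalk: optional list of booleans, one per sample; where True,
    the filter still cancels echo but its weights are not updated.
    Returns (error_signal, final_weights).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    error_signal = []
    for n, (x, d) in enumerate(zip(far_end, mic)):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        error_signal.append(e)
        frozen = doubletalk is not None and doubletalk[n]
        if not frozen:
            energy = eps + sum(h * h for h in history)
            weights = [w + mu * e * h / energy for w, h in zip(weights, history)]
    return error_signal, weights
```

Freezing the weights during doubletalk keeps the near-end speech, which is uncorrelated with the far-end signal, from pulling the filter away from the echo path it has already learned.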
139	Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems Gritzman, Ashley Daniel January 2016 (has links) A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016 / Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR, and focuses on three contributions to colour-based lip segmentation. The first contribution concerns the colour transform to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date by measuring the overlap between lip and skin histograms for 33 different colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves percentage overlap (OL) of 92.23% and segmentation error (SE) of 7.39%. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The first stage of ATO incorporates -SVR to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp or relative improvement of 15.1%. 
While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications. / MT2017 Automatic speech recognition Speech processing systems Lipreading--Computer simulation Speech synthesis
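Note on record 139: the abstract compares colour transforms by the overlap between lip and skin histograms, with HSV hue giving the lowest overlap. The record does not state how overlap is computed, so the sketch below assumes histogram intersection (the sum of bin-wise minima of two normalised histograms). The function names and the bin count are hypothetical; `colorsys.rgb_to_hsv` is from the Python standard library.

```python
import colorsys


def hue_histogram(pixels, bins=16):
    """Normalised histogram of HSV hue for a list of (r, g, b) pixels in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        hue, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        counts[min(int(hue * bins), bins - 1)] += 1
    total = float(len(pixels))
    return [c / total for c in counts]


def histogram_overlap(p, q):
    """Histogram intersection: 0.0 for disjoint, 1.0 for identical histograms."""
    return sum(min(a, b) for a, b in zip(p, q))
```

A lower overlap between the lip and skin histograms means a single threshold on that colour component separates the two classes more cleanly.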
140	The use of subword-based audio indexing in Chinese spoken document retrieval. January 2001 (has links) Li Yuk Chi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves [112]-119). / Abstracts in English and Chinese. / Abstract / List of Figures / List of Tables / Chapter 1 --- Introduction / Chapter 1.1 --- Information Retrieval / Chapter 1.1.1 --- Information Retrieval Models / Chapter 1.1.2 --- Information Retrieval in English / Chapter 1.1.3 --- Information Retrieval in Chinese / Chapter 1.2 --- Spoken Document Retrieval / Chapter 1.2.1 --- Spoken Document Retrieval in English / Chapter 1.2.2 --- Spoken Document Retrieval in Chinese / Chapter 1.3 --- Previous Work / Chapter 1.4 --- Motivation / Chapter 1.5 --- Goals / Chapter 1.6 --- Thesis Organization / Chapter 2 --- Investigation Framework / Chapter 2.1 --- Indexing the Spoken Document Collection / Chapter 2.2 --- Query Processing / Chapter 2.3 --- Subword Indexing / Chapter 2.4 --- Robustness in Chinese Spoken Document Retrieval / Chapter 2.5 --- Retrieval / Chapter 2.6 --- Evaluation / Chapter 2.6.1 --- Average Inverse Rank / Chapter 2.6.2 --- Mean Average Precision / Chapter 3 --- Subword-based Chinese Spoken Document Retrieval / Chapter 3.1 --- The Cantonese Corpus / Chapter 3.2 --- Known-Item Retrieval / Chapter 3.3 --- Subword Formulation for Cantonese Spoken Document Retrieval / Chapter 3.4 --- Audio Indexing by Cantonese Speech Recognition / Chapter 3.4.1 --- Seed Models from Adapted Data / Chapter 3.4.2 --- Retraining Acoustic Models / Chapter 3.5 --- The Retrieval Model / Chapter 3.6 --- Experiments / Chapter 3.6.1 --- Setup and Observations / Chapter 3.6.2 --- Results Analysis / Chapter 3.7 --- Chapter Summary / Chapter 4 --- Robust Indexing and Retrieval Methods / Chapter 4.1 --- Query Expansion using Phonetic Confusion / Chapter 4.1.1 --- Syllable-Syllable Confusions from Recognition / Chapter 4.1.2 --- Experimental Setup and Observation / Chapter 4.2 --- Document Expansion / Chapter 4.2.1 --- The Side Collection for Expansion / Chapter 4.2.2 --- Detailed Procedures in Document Expansion / Chapter 4.2.3 --- Improvements due to Document Expansion / Chapter 4.3 --- Using both Query and Document Expansion / Chapter 4.4 --- Chapter Summary / Chapter 5 --- Cross-Language Spoken Document Retrieval / Chapter 5.1 --- The Topic Detection and Tracking Collection / Chapter 5.1.1 --- The Spoken Document Collection / Chapter 5.1.2 --- The Translingual Query / Chapter 5.1.3 --- The Side Collection / Chapter 5.1.4 --- Subword-based Indexing / Chapter 5.2 --- The Translingual Retrieval Task / Chapter 5.3 --- Machine Translated Query / Chapter 5.3.1 --- The Unbalanced Query / Chapter 5.3.2 --- The Balanced Query / Chapter 5.3.3 --- Results on the Weight Balancing Scheme / Chapter 5.4 --- Document Expansion from a Side Collection / Chapter 5.5 --- Performance Evaluation and Analysis / Chapter 5.6 --- Chapter Summary / Chapter 6 --- Summary and Future Work / Chapter 6.1 --- Future Directions / Chapter A --- Input format for the IR engine / Chapter B --- Preliminary Results on the Two Normalization Schemes / Chapter C --- Significance Tests / Chapter C.1 --- Query Expansions for Cantonese Spoken Document Retrieval / Chapter C.2 --- Document Expansion for Cantonese Spoken Document Retrieval / Chapter C.3 --- Balanced Query for Cross-Language Spoken Document Retrieval / Chapter C.4 --- Document Expansion for Cross-Language Spoken Document Retrieval / Chapter D --- The Use of an Unrelated Source for Expanding Spoken Documents in Cantonese / Bibliography Automatic speech recognition Speech processing systems Cantonese dialects--Data processing Cantonese dialects--Phonology Information retrieval
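Note on record 140: the table of contents names two evaluation measures, Average Inverse Rank and Mean Average Precision, without defining them. The sketch below uses the common textbook definitions, which may differ in detail from the thesis: average inverse rank as the mean of 1/rank of the single known item per query, and mean average precision as the mean over queries of the average precision at each relevant hit. All function names are hypothetical.

```python
def average_inverse_rank(ranks):
    """Mean of 1/rank, where ranks[i] is the 1-based rank of query i's known item."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def average_precision(ranked_docs, relevant):
    """Average of the precision values at each rank where a relevant doc appears."""
    hits = 0
    precision_sum = 0.0
    for position, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_docs, relevant_set) pairs, one per query."""
    return sum(average_precision(docs, rel) for docs, rel in runs) / len(runs)
```

Average inverse rank suits known-item retrieval, where each query has exactly one correct document, while mean average precision handles queries with several relevant documents.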

Search results