131 |
Evaluation of two tactile speech displays
Clements, Mark Andrew. January 1978
Thesis (Elec. E.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 1978. / Bibliography: leaves 57-59. / by Mark Andrew Clements.
|
132 |
Non-acoustic speaker recognition
Du Toit, Ilze. 12 1900
Thesis (MScIng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: In this study the phoneme labels derived from a phoneme recogniser are used for phonetic
speaker recognition. The time-dependencies among phonemes are modelled by using
hidden Markov models (HMMs) for the speaker models. Experiments are done using first-order
and second-order HMMs, and various smoothing techniques are examined to address
the problem of data scarcity. The use of word labels for lexical speaker recognition is also
investigated. Single-word frequencies are counted, and the use of various word selections
as feature sets is investigated. During April 2004, the University of Stellenbosch, in collaboration
with Spescom DataVoice, participated in an international speaker verification
competition presented by the National Institute of Standards and Technology (NIST). The
University of Stellenbosch submitted phonetic and lexical (non-acoustic) speaker recognition
systems and a fused system (the primary system) that fuses the acoustic system of
Spescom DataVoice with the non-acoustic systems of the University of Stellenbosch. The
results were evaluated by means of a cost model. Based on the cost model, the primary
system obtained second and third positions in the two categories that were submitted. / AFRIKAANS SUMMARY (English translation): This project uses phoneme labels that are classified by a phoneme recogniser
and then used for phonetic speaker recognition. The time-dependencies
between phonemes are modelled using hidden Markov models
(HMMs) as speaker models. Experiments are performed with first-order and second-order
HMMs, and various smoothing techniques are investigated to address data scarcity.
The use of word labels for speaker recognition is also investigated. Single-word frequencies
are counted, and experiments are performed with various word selections as features
for speaker recognition. During April 2004, the University of Stellenbosch,
in collaboration with Spescom DataVoice, participated in an international speaker verification
competition presented by the National Institute of Standards and Technology (NIST).
The University of Stellenbosch entered a phonetic and a word-based
(non-acoustic) speaker recognition system, as well as a fused system that serves as the
primary system. The fused system is a combination of Spescom DataVoice's
acoustic system and the two non-acoustic systems of the University of Stellenbosch.
The results were evaluated using a cost model. On the basis of the
cost model, the primary system achieved second and third place in the two categories
that were entered.
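The abstract does not give the model equations. As a rough, hypothetical illustration, the sketch below treats the recogniser's phoneme labels as observed, so a first-order model reduces to a phone-bigram Markov chain. It scores a test utterance by the log-likelihood ratio between a speaker model and a background model, and uses linear interpolation with an add-one unigram (one common smoothing choice, assumed here) so that bigrams never seen in the scarce training data still get non-zero probability. The names `train`, `prob`, `llr_score` and the weight `lam` are illustrative, not taken from the thesis.

```python
import math
from collections import Counter
from typing import NamedTuple


class PhoneBigramModel(NamedTuple):
    unigrams: Counter  # phone counts
    history: Counter   # counts of phones that have a successor
    bigrams: Counter   # (previous, current) pair counts
    total: int         # number of phone tokens
    vocab_size: int    # size of the phone inventory


def train(phones, vocab_size):
    """Estimate a first-order (bigram) phone model from one recognised label stream."""
    return PhoneBigramModel(Counter(phones), Counter(phones[:-1]),
                            Counter(zip(phones, phones[1:])), len(phones), vocab_size)


def prob(model, prev, cur, lam=0.7):
    """Interpolated P(cur | prev) = lam * ML bigram + (1 - lam) * add-one unigram."""
    p_uni = (model.unigrams[cur] + 1) / (model.total + model.vocab_size)
    h = model.history[prev]
    p_bi = model.bigrams[(prev, cur)] / h if h else 0.0
    return lam * p_bi + (1 - lam) * p_uni


def log_likelihood(model, phones, lam=0.7):
    return sum(math.log(prob(model, a, b, lam)) for a, b in zip(phones, phones[1:]))


def llr_score(speaker, background, phones, lam=0.7):
    """Per-transition log-likelihood ratio; positive values favour the claimed speaker."""
    n = max(len(phones) - 1, 1)
    return (log_likelihood(speaker, phones, lam)
            - log_likelihood(background, phones, lam)) / n
```

In a verification setting the score would be compared with a threshold tuned on development data; a second-order model would condition on the two preceding phones instead of one.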
|
133 |
Wavelet-based speech enhancement: a statistical approach
Harmse, Wynand. 12 1900
Thesis (MScIng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: Speech enhancement is the process of removing background noise from speech signals. The
equivalent process for images is known as image denoising. While the Fourier transform is
widely used for speech enhancement, image denoising typically uses the wavelet transform.
Research on wavelet-based speech enhancement has only recently emerged, yet it shows
promising results compared to Fourier-based methods. This research is aided by the
availability of new wavelet denoising algorithms based on the statistical modelling of
wavelet coefficients, such as the hidden Markov tree.
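For reference, here is a minimal sketch of the classical thresholding approach that the statistical methods are compared against: an orthonormal Haar transform, a median-absolute-deviation noise estimate from the finest scale, the universal threshold and soft thresholding. The Haar wavelet and these parameter choices are assumptions made for brevity; the hidden Markov tree method and the noise floor parameter described in the abstract are not reproduced here.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_forward(x, levels):
    """Orthonormal Haar DWT. Returns (approximation, details), details[0] = finest scale."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    x = list(approx)
    for d in reversed(details):  # rebuild from the coarsest scale upwards
        out = []
        for a, c in zip(x, d):
            out.extend(((a + c) / SQRT2, (a - c) / SQRT2))
        x = out
    return x


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t (zero if |c| <= t)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(x, levels=3):
    approx, details = haar_forward(x, levels)
    # Robust noise estimate from the finest detail scale: median(|d|) / 0.6745.
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    # Universal threshold sigma * sqrt(2 ln N); approximation coefficients are kept as-is.
    t = sigma * math.sqrt(2.0 * math.log(len(x)))
    return haar_inverse(approx, [[soft_threshold(c, t) for c in d] for d in details])
```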
The aim of this research project is to investigate wavelet-based speech enhancement from
a statistical perspective. Current Fourier-based speech enhancement and its evaluation
process are described, and a framework is created for wavelet-based speech enhancement.
Several wavelet denoising algorithms are investigated, and it is found that the algorithms
based on the statistical properties of speech in the wavelet domain outperform the classical
and more heuristic denoising techniques. The choice of wavelet influences the quality of the
enhanced speech and the effect of this choice is therefore examined. The introduction of a
noise floor parameter also improves the perceptual quality of the wavelet-based enhanced
speech, by masking annoying residual artifacts. The performance of wavelet-based speech
enhancement is similar to that of the more widely used Fourier methods at low noise
levels, with a slight difference in the residual artifacts. At high noise levels, however, the
Fourier methods are superior. / AFRIKAANS SUMMARY (English translation): Speech enhancement is the process by which background noise is removed from speech signals.
The equivalent process for images is called image denoising. While speech enhancement is
generally performed in the Fourier domain, image denoising typically uses the wavelet transform.
Research on wavelet-based speech enhancement has appeared only recently, and
it already shows promising results compared to Fourier-based methods. This
research field is aided by the availability of new wavelet-based denoising techniques
that model the wavelet coefficients statistically, such as the hidden Markov tree.
The aim of this research project is to study wavelet-based speech enhancement from a
statistical point of view. Current Fourier-based speech enhancement methods,
as well as the evaluation process for such algorithms, are discussed, and a framework is
created for wavelet-based speech enhancement. Several wavelet-based algorithms are
investigated, and it is found that the methods that use the statistical properties of speech
in the wavelet domain perform better than the classical and more heuristic methods. The
choice of wavelet influences the quality of the enhanced speech, and the effect of this
choice is therefore investigated. The use of a noise floor parameter also increases
the quality of the wavelet-enhanced speech by masking annoying residual artifacts.
The wavelet methods perform about the same as the classical Fourier methods at low
noise levels, with a small difference in residual artifacts. At high noise levels, however, the Fourier methods
still perform better.
|
134 |
Tree-based Gaussian mixture models for speaker verification
Cilliers, Francois Dirk. 12 1900
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2005. / The Gaussian mixture model (GMM) performs very effectively in applications
such as speech and speaker recognition. However, evaluation speed is greatly
reduced when the GMM has a large number of mixture components. Various
techniques improve the evaluation speed by reducing the number of required
Gaussian evaluations.
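The abstract does not describe how the tree is built. Assuming a precomputed tree whose internal nodes are Gaussians summarising their children and whose leaves are the mixture components, a beam search can skip most leaf evaluations. The sketch below keeps only the `beam` best-scoring internal nodes per level and approximates the log-likelihood from the leaves it reaches. `Node`, `tree_loglik` and `full_loglik` are hypothetical names, and this is one generic Gaussian-selection scheme, not necessarily the thesis's.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    mean: list
    var: list             # diagonal covariance
    weight: float = 0.0   # mixture weight, used only at leaves
    children: list = field(default_factory=list)


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def tree_loglik(root, x, beam=1):
    """Descend only into the `beam` best-scoring internal nodes per level.

    Returns (approximate GMM log-likelihood, number of Gaussian evaluations)."""
    frontier, leaf_scores, evals = [root], [], 0
    while frontier:
        scored = []
        for node in frontier:
            for child in node.children:
                evals += 1
                s = log_gauss(x, child.mean, child.var)
                if child.children:
                    scored.append((s, child))
                else:
                    leaf_scores.append(math.log(child.weight) + s)
        scored.sort(key=lambda item: item[0], reverse=True)
        frontier = [node for _, node in scored[:beam]]
    return logsumexp(leaf_scores), evals


def full_loglik(leaves, x):
    """Exact GMM log-likelihood, evaluating every component."""
    return logsumexp([math.log(l.weight) + log_gauss(x, l.mean, l.var) for l in leaves])
```

With two clusters of four leaves and `beam=1`, one frame costs six Gaussian evaluations instead of eight; the saving grows with the number of components, at the cost of an approximation error whenever a relevant leaf sits under a pruned node.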
|
135 |
The design of a high-performance, floating-point embedded system for speech recognition and audio research purposes
Duckitt, William. 03 1900
Thesis (MScEng (Electrical and Electronic Engineering))--Stellenbosch University, 2008. / This thesis describes the design of a high-performance, floating-point, standalone embedded
system that is appropriate for speech and audio processing purposes.
The system successfully employs the Analog Devices TigerSHARC TS201 600 MHz floating-point
digital signal processor as a CPU, and includes 512 MB RAM, a Compact Flash storage card
interface as non-volatile memory, a multi-channel audio input and output system with two
programmable microphone preamplifiers offering up to 65 dB gain, a USB interface, an LCD display
and a push-button user interface.
An Altera Cyclone II FPGA is used to interface the CPU with the various peripheral
components. The FIFO buffers within the FPGA allow bulk DMA transfers of audio data for minimal
processor delays. Similar approaches are taken for communication with the USB interface, the
Compact Flash storage card and the LCD display.
A logic analyzer interface allows system debugging via the FPGA. In future, this interface can also
be used to connect additional components. The power distribution required 11
different supplies, with a total consumption of 16.8 W. A six-layer PCB incorporating four
signal layers, a power plane and a ground plane was designed for the final prototype.
All system components were verified to be operating correctly by means of appropriate
testing software, and the computational performance was measured by repeated calculation of a
multi-dimensional Gaussian log-probability and found to be comparable to that of an Intel 1.8 GHz
Core 2 Duo processor.
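The benchmark kernel can be sketched as follows, assuming a diagonal covariance and a 39-dimensional feature vector (a common MFCC-plus-derivatives size); the thesis states neither. Precomputing the normalising constant and the inverse variances reduces each evaluation to a subtraction, two multiplications and an addition per dimension, an inner loop well suited to a floating-point DSP. Timings from this Python sketch are not comparable with the figures reported in the thesis.

```python
import math
import time


def make_diag_gaussian(mean, var):
    """Precompute the log normaliser and inverse variances of a diagonal Gaussian."""
    const = -0.5 * (len(mean) * math.log(2 * math.pi) + sum(math.log(v) for v in var))
    inv_var = [1.0 / v for v in var]
    return mean, inv_var, const


def log_prob(x, gauss):
    """log N(x; mean, diag(var)) using the precomputed terms."""
    mean, inv_var, const = gauss
    acc = 0.0
    for xi, m, iv in zip(x, mean, inv_var):
        d = xi - m
        acc += d * d * iv
    return const - 0.5 * acc


def benchmark(dim=39, repeats=10000):
    """Return Gaussian log-probability evaluations per second for one fixed input."""
    g = make_diag_gaussian([0.0] * dim, [1.0] * dim)
    x = [0.1 * i for i in range(dim)]  # arbitrary fixed feature vector
    start = time.perf_counter()
    for _ in range(repeats):
        log_prob(x, g)
    return repeats / (time.perf_counter() - start)
```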
The design can therefore be considered a success, and the prototype is ready for
development of suitable speech or audio processing software.
|
136 |
Low bit rate speech coding
Kritzinger, Carl. 03 1900
Thesis (MScIng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006. / Despite enormous advances in digital communication, the voice is still the primary tool
with which people exchange ideas. However, uncompressed digital speech tends to require
prohibitively high data rates (upwards of 64 kbps), making it impractical for many applications.
Speech coding is the process of reducing the data rate of digital voice to manageable
levels. Parametric speech coders, or vocoders, use a priori information about the mechanism
by which speech is produced in order to achieve extremely efficient compression of
speech signals (at rates as low as 1 kbps).
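The 64 kbps figure is consistent with narrowband telephone speech sampled at 8 kHz with 8 bits per sample, as in G.711 PCM. Assuming those parameters, the raw rate and the compression ratio implied by a 1 kbps vocoder are:

```latex
\[
R_{\mathrm{PCM}} = f_s \, b
  = 8000~\text{samples/s} \times 8~\text{bits/sample}
  = 64\,000~\text{bit/s} = 64~\text{kbps},
\qquad
\frac{R_{\mathrm{PCM}}}{R_{\mathrm{vocoder}}} = \frac{64~\text{kbps}}{1~\text{kbps}} = 64.
\]
```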
The greater part of this thesis comprises an investigation into parametric speech coding.
This consisted of a review of the mathematical and heuristic tools used in parametric
speech coding, as well as the implementation of an accepted standard algorithm for parametric
voice coding.
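The abstract does not name the standard algorithm that was implemented. Widely used vocoders such as LPC-10 and MELP are built on linear prediction, which models the vocal tract as an all-pole filter. As an illustration of that core tool only, the sketch below estimates predictor coefficients for one frame from its autocorrelation using the Levinson-Durbin recursion.

```python
def autocorrelation(frame, order):
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..order."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    x[n] is predicted as sum_{k=1..order} a[k-1] * x[n-k].
    Returns (a, final prediction-error energy)."""
    alpha = [0.0] * (order + 1)  # alpha[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))) / err
        updated = alpha[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = alpha[j] - k * alpha[i - j]
        alpha = updated
        err *= 1.0 - k * k
    return alpha[1:], err
```

For the decaying sequence `0.9 ** n`, the first-order coefficient comes out close to 0.9 and higher-order coefficients close to zero. In a vocoder, such coefficients are computed per (typically windowed) frame and transmitted in a quantised, transformed form such as reflection coefficients or line spectral frequencies, together with pitch, voicing and gain parameters.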
To identify possible improvements to existing vocoders, we examined
some of the mathematical structure underlying parametric speech coding. Following on
from this, we developed a novel approach to parametric speech coding which obtained
promising results under both objective and subjective evaluation.
An additional contribution by this thesis was the comparative subjective evaluation of
the effect of parametric speech coding on English and Xhosa speech. We investigated the
performance of two different encoding algorithms on the two languages.
|
137 |
Automatic alignment and error detection for phonetic transcriptions in the African speech technology project databases
De Villiers, Edward. 03 1900
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006. / The African Speech Technology (AST) project ran from 2000 to 2004 and involved collecting speech data for five South African languages, transcribing the data and building automatic speech recognition systems in these languages. The work described here forms part of this project and involved implementing methods for automatic boundary placement in manually labelled files and for determining errors made by transcribers during the labelling process.
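The abstract does not specify the alignment method. A common approach to automatic boundary placement is forced alignment: given per-frame acoustic scores for the phones of the known transcription, dynamic programming finds the left-to-right segmentation with the highest total score. The sketch below assumes such a score matrix is already available (for example from HMM acoustic models) and uses one state per phone; `force_align` and the error-flagging helper `segment_means` are hypothetical.

```python
def force_align(scores):
    """Viterbi forced alignment with one state per phone.

    scores[t][n] is the log-likelihood of frame t under the n-th phone of the
    known transcription. Returns the start frame of each phone in the best
    left-to-right segmentation in which every phone gets at least one frame."""
    T, N = len(scores), len(scores[0])
    if T < N:
        raise ValueError("need at least one frame per phone")
    NEG = float("-inf")
    best = [[NEG] * N for _ in range(T)]
    entered = [[False] * N for _ in range(T)]  # True if phone n starts at frame t
    best[0][0] = scores[0][0]
    for t in range(1, T):
        for n in range(N):
            stay = best[t - 1][n]
            enter = best[t - 1][n - 1] if n > 0 else NEG
            entered[t][n] = enter > stay
            best[t][n] = max(stay, enter) + scores[t][n]
    # Trace back from the last phone at the last frame.
    starts, n = [0] * N, N - 1
    for t in range(T - 1, 0, -1):
        if entered[t][n]:
            starts[n] = t
            n -= 1
    return starts


def segment_means(scores, starts):
    """Average frame score of each aligned phone; low values flag suspect labels."""
    ends = starts[1:] + [len(scores)]
    return [sum(scores[t][n] for t in range(s, e)) / (e - s)
            for n, (s, e) in enumerate(zip(starts, ends))]
```

A phone whose aligned segment has an unusually low mean score is one plausible signal that the transcriber's label is wrong; the thesis's actual error-detection criteria are not given in the abstract.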
|
138 |
Nonlinear Acoustic Echo Cancellation for Mobile Phones: A Practical Approach
Fhager, Anders; Hussien, Jemal Mohammed. January 2010
Acoustic echo cancelation (AEC) is a fundamental part of speech processing that enables pleasant telecommunication conversations. Without it, a speaker would hear an annoying echo of his or her own voice along with the speech from the other party, which would make a conversation through any telecommunication device an unpleasant experience.
AEC has been a subject of interest in the telecom industry since the 1950s, and very efficient solutions have been devised to cancel linear echo. With the advent of low-cost hands-free communication devices, nonlinear echo became a prominent issue: these devices use cheap loudspeakers that produce artifacts in addition to the desired sound, and these artifacts cause nonlinear echo that linear echo cancellers cannot cancel.
In this thesis, a Harmonic Distortion Residual Echo Cancelation (HDRES) algorithm was chosen for further investigation. HDRES has many of the features that are desirable in an algorithm for nonlinear acoustic echo cancelation, such as low computational complexity and fast convergence. The algorithm was first implemented in Matlab, where it was tested and modified. The modified algorithm was then implemented in C and integrated with a complete AEC system. Before the implementation, a number of measurements were made to identify the nonlinearities caused by the mobile phone loudspeaker. The measurements were performed on three different mobile phones that were documented to have problems with nonlinear acoustic echo.
The results of this thesis show that it might be possible to use an adaptive filter with both low complexity and fast convergence in an operating AEC system. However, such a system would require a doubletalk detector to be implemented along with the adaptive algorithm. That way, doubletalk situations could be detected and the adaptation of the algorithm stopped, so that most of the speech would be preserved.
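HDRES itself is not reproduced here. As a hedged illustration of the conclusion above, the sketch below is a generic linear NLMS echo canceller that stops adapting whenever a Geigel doubletalk detector fires, so that near-end speech does not corrupt the filter. The tap count, step size `mu` and Geigel threshold are assumed values, not ones from the thesis.

```python
def nlms_echo_canceller(far, mic, taps=64, mu=0.5, eps=1e-6, geigel_threshold=0.5):
    """Linear NLMS acoustic echo canceller with a Geigel doubletalk detector.

    far: far-end (loudspeaker) samples; mic: microphone samples, same length.
    Returns the echo-cancelled error signal e[n] = mic[n] - estimated echo.
    """
    w = [0.0] * taps    # adaptive estimate of the echo path
    buf = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for x, d in zip(far, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # echo estimate
        e = d - y
        out.append(e)
        # Geigel test: declare doubletalk when the microphone level exceeds a
        # fraction of the recent far-end peak, and freeze adaptation.
        doubletalk = abs(d) > geigel_threshold * max(abs(v) for v in buf)
        if not doubletalk:
            step = mu * e / (eps + sum(v * v for v in buf))  # normalised step size
            w = [wi + step * xi for wi, xi in zip(w, buf)]
    return out
```

A threshold of 0.5 (about 6 dB) is the classic choice for network echo; an acoustic path with a loud handset speaker may need a different value.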
|
139 |
Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems
Gritzman, Ashley Daniel. January 2016
A thesis submitted to the Faculty of Engineering and the Built Environment,
University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for
the degree of Doctor of Philosophy.
Johannesburg, September 2016 / Having survived the ordeal of a laryngectomy, the patient must come to terms with
the resulting loss of speech. With recent advances in portable computing power,
automatic lip-reading (ALR) may become a viable approach to voice restoration. This
thesis addresses the image processing aspect of ALR and focuses on three contributions
to colour-based lip segmentation.
The first contribution concerns the choice of colour transform used to enhance the contrast
between the lips and skin. This thesis presents the most comprehensive study to
date by measuring the overlap between lip and skin histograms for 33 different
colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%,
and results show that selecting the correct transform can increase the segmentation
accuracy by up to three times.
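A sketch of the kind of measurement described, assuming histograms of the HSV hue of labelled lip and skin pixels and histogram intersection (the sum of bin-wise minima) as the overlap measure; the thesis's exact overlap definition and bin count are not given in the abstract. Python's standard `colorsys.rgb_to_hsv` supplies the transform.

```python
import colorsys


def hue_histogram(pixels, bins=32):
    """Normalised histogram of the HSV hue of RGB pixels with components in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # h in [0, 1)
        counts[min(int(h * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_overlap(p, q):
    """Histogram intersection: 0 for disjoint histograms, 1 for identical ones."""
    return sum(min(a, b) for a, b in zip(p, q))
```

Because hue is circular, reddish lip and skin tones can fall on either side of the wrap-around at 0/1; a practical implementation may need to rotate the hue axis before binning.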
The second contribution is the development of a new lip segmentation algorithm
that utilises the best colour transforms from the comparative study. The algorithm
is tested on 895 images and achieves a percentage overlap (OL) of 92.23% and a segmentation
error (SE) of 7.39%.
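For reference, the definitions of OL and SE commonly used in lip-segmentation work are sketched below, where OLE counts non-lip pixels labelled as lip, ILE counts lip pixels labelled as non-lip, and TL is the number of ground-truth lip pixels. It is an assumption that the thesis uses exactly these definitions.

```python
def overlap_and_error(pred, truth):
    """Percentage overlap (OL) and segmentation error (SE) of a binary lip mask.

    pred, truth: equal-length sequences of booleans (True = lip pixel).
    OL = 2 |A intersect B| / (|A| + |B|) * 100
    SE = (OLE + ILE) / (2 * TL) * 100
    """
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    ole = sum(1 for p, t in zip(pred, truth) if p and not t)  # non-lip labelled lip
    ile = sum(1 for p, t in zip(pred, truth) if t and not p)  # lip labelled non-lip
    n_pred = sum(1 for p in pred if p)
    tl = sum(1 for t in truth if t)
    ol = 200.0 * inter / (n_pred + tl)
    se = 100.0 * (ole + ile) / (2 * tl)
    return ol, se
```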
The third contribution focuses on the impact of the histogram threshold on the
segmentation accuracy, and introduces a novel technique called Adaptive Threshold
Optimisation (ATO) to select a better threshold value. The first stage of ATO
incorporates -SVR to train the lip shape model. ATO then uses feedback of shape
information to validate and optimise the threshold. After applying ATO, the SE
decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp
or relative improvement of 15.1%. While this thesis concerns lip segmentation in
particular, ATO is a threshold selection technique that can be used in various
segmentation applications. / MT2017
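The abstract outlines ATO only at a high level. The hypothetical sketch below illustrates the feedback idea: starting from a histogram-based threshold, it nudges the threshold while a shape-validation score improves. In the thesis that feedback comes from a lip shape model trained with SVR; here `shape_score` is a stand-in supplied by the caller, and the hill-climbing search is an assumed choice rather than the thesis's procedure.

```python
from typing import Callable, List, Sequence


def optimise_threshold(values: Sequence[float], t0: float,
                       shape_score: Callable[[List[bool]], float],
                       step: float = 0.01, max_iter: int = 100) -> float:
    """Hill-climb a segmentation threshold from an initial value t0.

    shape_score(mask) is a hypothetical stand-in for shape-model feedback:
    higher means the thresholded mask looks more like a plausible lip shape."""
    def score(t: float) -> float:
        return shape_score([v > t for v in values])

    t, best = t0, score(t0)
    for _ in range(max_iter):
        cand_score, cand_t = max((score(t + d), t + d) for d in (-step, step))
        if cand_score <= best:  # no neighbouring threshold improves the shape score
            break
        t, best = cand_t, cand_score
    return t
```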
|
140 |
The use of subword-based audio indexing in Chinese spoken document retrieval.
January 2001
Li Yuk Chi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves [112]-119). / Abstracts in English and Chinese.
Contents:
Abstract
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Information Retrieval
    1.1.1 Information Retrieval Models
    1.1.2 Information Retrieval in English
    1.1.3 Information Retrieval in Chinese
  1.2 Spoken Document Retrieval
    1.2.1 Spoken Document Retrieval in English
    1.2.2 Spoken Document Retrieval in Chinese
  1.3 Previous Work
  1.4 Motivation
  1.5 Goals
  1.6 Thesis Organization
Chapter 2. Investigation Framework
  2.1 Indexing the Spoken Document Collection
  2.2 Query Processing
  2.3 Subword Indexing
  2.4 Robustness in Chinese Spoken Document Retrieval
  2.5 Retrieval
  2.6 Evaluation
    2.6.1 Average Inverse Rank
    2.6.2 Mean Average Precision
Chapter 3. Subword-based Chinese Spoken Document Retrieval
  3.1 The Cantonese Corpus
  3.2 Known-Item Retrieval
  3.3 Subword Formulation for Cantonese Spoken Document Retrieval
  3.4 Audio Indexing by Cantonese Speech Recognition
    3.4.1 Seed Models from Adapted Data
    3.4.2 Retraining Acoustic Models
  3.5 The Retrieval Model
  3.6 Experiments
    3.6.1 Setup and Observations
    3.6.2 Results Analysis
  3.7 Chapter Summary
Chapter 4. Robust Indexing and Retrieval Methods
  4.1 Query Expansion using Phonetic Confusion
    4.1.1 Syllable-Syllable Confusions from Recognition
    4.1.2 Experimental Setup and Observation
  4.2 Document Expansion
    4.2.1 The Side Collection for Expansion
    4.2.2 Detailed Procedures in Document Expansion
    4.2.3 Improvements due to Document Expansion
  4.3 Using both Query and Document Expansion
  4.4 Chapter Summary
Chapter 5. Cross-Language Spoken Document Retrieval
  5.1 The Topic Detection and Tracking Collection
    5.1.1 The Spoken Document Collection
    5.1.2 The Translingual Query
    5.1.3 The Side Collection
    5.1.4 Subword-based Indexing
  5.2 The Translingual Retrieval Task
  5.3 Machine Translated Query
    5.3.1 The Unbalanced Query
    5.3.2 The Balanced Query
    5.3.3 Results on the Weight Balancing Scheme
  5.4 Document Expansion from a Side Collection
  5.5 Performance Evaluation and Analysis
  5.6 Chapter Summary
Chapter 6. Summary and Future Work
  6.1 Future Directions
Chapter A. Input format for the IR engine
Chapter B. Preliminary Results on the Two Normalization Schemes
Chapter C. Significance Tests
  C.1 Query Expansions for Cantonese Spoken Document Retrieval
  C.2 Document Expansion for Cantonese Spoken Document Retrieval
  C.3 Balanced Query for Cross-Language Spoken Document Retrieval
  C.4 Document Expansion for Cross-Language Spoken Document Retrieval
Chapter D. The Use of an Unrelated Source for Expanding Spoken Documents in Cantonese
Bibliography
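The outline names subword indexing and two evaluation measures. Assuming overlapping syllable n-grams as the subword index terms and the standard definitions of the measures, they can be sketched as follows. Average inverse rank suits known-item retrieval, where each query has exactly one target document; the function names are illustrative.

```python
def syllable_ngrams(syllables, n=2):
    """Overlapping syllable n-grams used as subword index terms."""
    return [" ".join(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def average_inverse_rank(ranks):
    """Known-item retrieval: mean of 1/rank of each query's single target document."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def average_precision(ranked_ids, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: one (ranked_ids, relevant_set) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, `syllable_ngrams(["hoeng1", "gong2", "daai6"])` returns `["hoeng1 gong2", "gong2 daai6"]`; the romanised syllables are only illustrative input.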
|