221 |
USB telephony interface device for speech recognition applications / Müller, J. J. January 2005
Thesis (MSc)--University of Stellenbosch, 2005. / Bibliography. Also available via the Internet.
|
222 |
Non [sic] linear adaptive filters for echo cancellation of speech coded signals / Kulakcherla, Sudheer. January 2004
Thesis (M.S.)--University of Missouri-Columbia, 2004. / Typescript. Includes bibliographical references (leaves 116-117). Also available on the Internet.
|
223 |
Speech enhancement using microphone array / Cho, Jaeyoun. January 2005
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Includes bibliographical references (p. 114-117).
|
224 |
Acoustic characteristics of stop consonants: a controlled study / Zue, V. W. (Victor Waito). January 1976
Thesis (Sc. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1976. / Includes bibliographical references (p. 146-149). / This electronic version was scanned from a copy of the thesis on file at the Speech Communication Group. The certified thesis is available in the Institute Archives and Special Collections.
|
225 |
Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods / Kondacs, Attila. 28 January 2005
In this thesis I will be concerned with linking the observed speech signal to the configuration of articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: (1) short pitch-period resonances and other spatio-temporal patterns; (2) articulator configuration trajectories; (3) syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because, by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8%, and 86.1% of the b, d, and g to ae transitions, with false positive rates of 2.9%, 8.7%, and 2.6% respectively.
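The thesis's actual per-period descriptors and classifier are its own; purely as an illustration of the idea of classifying stop releases from short time-domain measurements inside single pitch periods, here is a minimal Python sketch. The three features (relative peak position, zero-crossing rate, half-energy ratio) and the nearest-template vote are assumptions, not taken from the thesis.

```python
# Illustrative only: the features and classifier below are assumptions,
# not the descriptors or decision rule used in the thesis.
import numpy as np

def period_features(period: np.ndarray) -> np.ndarray:
    """A few short time-domain features of one vowel pitch period."""
    period = period / (np.max(np.abs(period)) + 1e-12)    # amplitude-normalize
    peak_pos = np.argmax(np.abs(period)) / len(period)    # relative position of main peak
    signs = np.signbit(period).astype(np.int8)
    zcr = np.count_nonzero(np.diff(signs)) / len(period)  # zero-crossing rate
    half = len(period) // 2
    energy_ratio = np.sum(period[:half] ** 2) / (np.sum(period[half:] ** 2) + 1e-12)
    return np.array([peak_pos, zcr, energy_ratio])

def classify_release(periods, templates):
    """Vote over the pitch periods following a stop release.

    `templates` maps 'b'/'d'/'g' to lists of training feature vectors.
    """
    votes = {"b": 0, "d": 0, "g": 0}
    for p in periods:
        f = period_features(p)
        best = min(templates,
                   key=lambda c: min(np.linalg.norm(f - t) for t in templates[c]))
        votes[best] += 1
    return max(votes, key=votes.get)
```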
|
226 |
Automatic syllabification of untranscribed speech / Nel, Pieter Willem. 03 1900
Thesis (MScEng)--Stellenbosch University, 2005. / ENGLISH ABSTRACT: The syllable has been proposed as a unit of automatic speech recognition due to its
strong links with human speech production and perception. Recently, it has been proved
that incorporating information from syllable-length time-scales into automatic speech
recognition improves results in large vocabulary recognition tasks. It was also shown to
aid in various language recognition tasks and in foreign accent identification. Therefore,
the ability to automatically segment speech into syllables is an important research tool.
Where most previous studies employed knowledge-based methods, this study presents a
purely statistical method for the automatic syllabification of speech.
We introduce the concept of hierarchical hidden Markov model structures and show
how these can be used to implement a purely acoustical syllable segmenter based on
general sonority theory, combined with some of the phonotactic constraints found in the
English language.
The accurate reporting of syllabification results is a problem in the existing literature.
We present a well-defined dynamic time warping (DTW) distance measure used for
reporting syllabification results (see the sketch following this abstract).
We achieve a token error rate of 20.3% with a 42 ms average boundary error on a relatively large set of data. This compares well with previous knowledge-based and statistically based methods. / AFRIKAANSE OPSOMMING (translated): The syllable has previously been proposed as a basic unit for automatic speech recognition because of its strong relationship with speech production and perception. Recently it has been shown that using information from syllable-length time-scales improves results in large-vocabulary recognition tasks. It has also been shown that the use of syllables facilitates automatic language recognition and foreign-accent recognition. For research purposes it is therefore important to be able to segment speech into syllables automatically. Previous studies used knowledge-based methods to accomplish this segmentation. This study uses a purely statistical method for the automatic syllabification of speech. We use the concept of hierarchical hidden Markov model structures and show how it can be used to implement a purely acoustic syllable segmenter. The model is built on the theory of sonority together with the phonotactic constraints present in the English language. The accurate reporting of syllabification results is problematic in the existing literature. We fully define a DTW (Dynamic Time Warping) distance function with which we report our syllabification results. We achieve a TER (Token Error Rate) of 20.3% with a 42 ms average boundary error on a relatively large data set. This compares well with previous knowledge-based and statistically based methods.
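The thesis defines its own DTW distance measure for scoring syllabification output; the abstract only names it. As a generic illustration of the idea (the local cost, step pattern, and normalization below are all assumptions, not the thesis's definition), a minimal Python sketch that aligns hypothesized syllable-boundary times against reference times and reports an average timing error:

```python
# Generic DTW over boundary times; cost and normalization are illustrative
# assumptions, not the measure defined in the thesis.
import numpy as np

def dtw_boundary_error(ref: np.ndarray, hyp: np.ndarray) -> float:
    """Align two monotone sequences of boundary times (seconds) and
    return an average absolute timing error over the optimal warp."""
    n, m = len(ref), len(hyp)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - hyp[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # advance in ref only
                                 D[i, j - 1],      # advance in hyp only
                                 D[i - 1, j - 1])  # match a boundary pair
    return D[n, m] / max(n, m)  # crude per-boundary normalization

# Hypothetical reference and hypothesized syllable boundaries (seconds):
ref = np.array([0.12, 0.31, 0.55, 0.80])
hyp = np.array([0.10, 0.34, 0.57, 0.79, 0.95])
print(f"average boundary error: {dtw_boundary_error(ref, hyp) * 1000:.1f} ms")
```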
|
227 |
Enkelsybanddemodulasie met behulp van syferseinverwerking [Single-sideband demodulation by means of digital signal processing] / Kruger, Johannes Petrus. 12 June 2014
M.Ing. (Electrical and Electronic Engineering) / The feasibility of modulating and demodulating speech signals within a microprocessor is investigated in this study. Existing modulation and demodulation techniques are reviewed, and new techniques suitable for microprocessor implementation are described. Finally, a single-sideband demodulator was built using the TMS32010 microprocessor, with results comparable to or better than those of existing analog techniques.
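The thesis's demodulator ran as fixed-point code on a TMS32010; as a floating-point sketch of one standard digital SSB scheme, the phasing (Hilbert-transform) method for the upper sideband, here is a short NumPy/SciPy illustration. The carrier frequency and sample rate are assumed inputs, and a real DSP implementation would use an FIR Hilbert transformer rather than SciPy's FFT-based analytic signal.

```python
# Phasing-method SSB demodulation sketch; parameters are illustrative.
import numpy as np
from scipy.signal import hilbert

def ssb_demodulate_usb(x: np.ndarray, fc: float, fs: float) -> np.ndarray:
    """Demodulate an upper-sideband SSB signal with carrier fc (Hz), sampled at fs."""
    t = np.arange(len(x)) / fs
    analytic = hilbert(x)                               # x + j * Hilbert{x}
    baseband = analytic * np.exp(-2j * np.pi * fc * t)  # shift the sideband down to DC
    return baseband.real                                # recovered speech signal
```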
|
228 |
A rule-based system to automatically segment and label continuous speech of known text / Boissonneault, Paul G. January 1984
No description available.
|
229 |
Vector quantization in residual-encoded linear prediction of speech / Abramson, Mark. January 1983
No description available.
|
230 |
A Study in Speaker Dependent Medium Vocabulary Word Recognition: Application to Human/Computer Interface / Abdallah, Moatassem Mahmoud. 05 February 2000
Human interfaces to computers continue to be an active area of research. The keyboard is considered the basic interface for editing control as well as text input. Problems with typing accuracy and typing speed have prompted research into alternative means of replacing the keyboard, or at least of "resizing" its monopoly. Pointing devices (e.g. the mouse) have been developed, and supporting software with icons is now widely used. Two other means are being developed and operationally tested: the pen, for handwriting text, commands, and drawings; and the spoken-language interface, which is the subject of this thesis.
A speech-based human/computer interface is an interactive man-machine communication facility that enjoys the following advantages.
• High input speed: some experiments reveal that the rate of information input by speech is three times faster than keyboard input and eight times faster than inputting characters by hand.
• No training needed: because the generation of speech is a very natural human action, it requires no special training.
• Parallel processing with other information: speaking works quite well in conjunction with hand and foot gestures and with the visual perception of information.
• Simple and economical input sensor: microphones are inexpensive and are readily available.
• Coping with handicaps: these interfaces can be used in unusual circumstances of darkness, blindness, or other visual handicap.
This dissertation presents the design of a Human Computer Interface (HCI) system that can be trained to work with an individual speaker. A new approach to extracting key voice features, called Median Linear Predictive Coding (MLPC), is introduced. MLPC reduces the HCI calculation time and gives an improved recognition rate. This design eliminated the typical Multi-layer Perceptron (MLP) problems of complexity growth with vocabulary size, the large training times required, and the need for complete re-training whenever the vocabulary is extended. A novel modular neural network architecture, called a Pyramidal Modular Neural Network (PMNN), is introduced for recursive speech identification. In addition, many other system algorithms/components, such as speech endpoint detection and automatic noise thresholding, must be tailored correctly in order to achieve high recognition accuracy. / Ph. D.
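The abstract names Median Linear Predictive Coding but does not define it, so the sketch below must not be read as the dissertation's method. It shows ordinary autocorrelation-method LPC computed per frame, followed by a hypothetical median smoothing of each coefficient track across frames, one plausible reading of the name; the frame sizes and kernel width are likewise assumptions.

```python
# Ordinary LPC plus a guessed median-smoothing step; MLPC as defined in
# the dissertation may differ substantially.
import numpy as np
from scipy.signal import medfilt

def lpc_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12                     # guard against silent frames
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient
        new_a = a.copy()
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a

def median_smoothed_lpc(signal: np.ndarray, fs: float, order: int = 10,
                        frame_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """LPC per Hamming-windowed frame, then a 3-frame median filter on each
    coefficient track (the 'median' step is our assumption)."""
    flen, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    frames = [signal[s:s + flen] * np.hamming(flen)
              for s in range(0, len(signal) - flen + 1, hop)]
    A = np.array([lpc_coefficients(f, order) for f in frames])  # (n_frames, order+1)
    return np.apply_along_axis(lambda track: medfilt(track, 3), 0, A)
```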
|