This thesis addresses the problem of automatic speech recognition: specifically, how
to transform an acoustic waveform into a string of words or phonemes. A preliminary
chapter gives linguistic information potentially useful in automatic speech
recognition. This is followed by a description of the Wave Analysis Laboratory
(WAL), a rule-based system which detects features in speech and was designed as
the acoustic front end of a speech recognition system. Temporal reasoning as used
in WAL rules is examined. The use of WAL in recognizing one particular class of
speech sounds, the nasal consonants, is described in detail.
The remainder of the thesis looks at the statistical analysis of samples of spontaneous
speech. An orthographic transcription of a large sample of spontaneous
speech is automatically translated into phonemes. Tables of the frequencies of
word initial and word final phoneme clusters are constructed to illustrate some
of the phonotactic constraints of the language. Statistical data is used to assign
phonemes to phonotactic classes. These classes are unlike the acoustic classes,
although there is a general distinction between the vowels, the consonants and the
word boundary.
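
As an illustration of how such cluster tables might be built, the following is a minimal sketch rather than the procedure used in the thesis. It assumes each word is already available as a list of phoneme symbols. The vowel set, the function names and the toy transcriptions are all hypothetical.

```python
from collections import Counter

# Hypothetical vowel symbols; the phoneme inventory used in the thesis is not given here.
VOWELS = {"a", "e", "i", "o", "u", "@"}

def initial_cluster(phonemes):
    """Return the consonants that precede the first vowel of a word."""
    cluster = []
    for p in phonemes:
        if p in VOWELS:
            break
        cluster.append(p)
    return tuple(cluster)

def final_cluster(phonemes):
    """Return the consonants that follow the last vowel of a word."""
    reversed_word = list(reversed(phonemes))
    return tuple(reversed(initial_cluster(reversed_word)))

def cluster_frequencies(words):
    """Count word-initial and word-final consonant clusters over a word list."""
    initial = Counter(initial_cluster(w) for w in words)
    final = Counter(final_cluster(w) for w in words)
    return initial, final

# Toy transcriptions for illustration only; not data from the thesis.
words = [
    ["s", "t", "o", "p"],
    ["s", "t", "a", "n", "d"],
    ["a", "n", "d"],
    ["t", "o"],
]
initial, final = cluster_frequencies(words)
```

In this sketch the empty tuple counts words that begin or end with a vowel, so each table sums to the number of words.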
A way of measuring the phonetic balance of a sample of speech is described. This
can be used as a means of ranking potential test samples in terms of how well they
represent the language.
A phoneme n-gram model is used to measure the entropy of the language. The
broad acoustic encoding output from WAL is used with this language model to
reconstruct a small test sample.
"Branching", a simpler alternative to perplexity, is introduced and found to give
similar results to perplexity. Finally, the drop in branching is calculated as knowledge
of various sets of acoustic classes is taken into account.
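
The entropy and perplexity mentioned above can be illustrated with a small sketch. It assumes a bigram model estimated by unsmoothed relative frequencies. The abstract does not state the n-gram order or estimation method used in the thesis, so both are assumptions here, as are the function name and the use of "#" as a word-boundary symbol. Branching is not defined in this abstract and is not reproduced.

```python
import math
from collections import Counter

def bigram_conditional_entropy(sequence):
    """Estimate H(X_n | X_{n-1}) in bits per symbol from one symbol sequence."""
    pairs = list(zip(sequence, sequence[1:]))
    pair_counts = Counter(pairs)
    context_counts = Counter(a for a, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (a, b), n in pair_counts.items():
        p_pair = n / total              # joint probability of the bigram (a, b)
        p_cond = n / context_counts[a]  # probability of b given the context a
        entropy -= p_pair * math.log2(p_cond)
    return entropy

# Toy phoneme string for illustration only; "#" is a hypothetical word-boundary symbol.
phonemes = list("#kat#tak#akt#")
h = bigram_conditional_entropy(phonemes)
perplexity = 2 ** h  # perplexity is 2 raised to the entropy in bits
```

Estimating the entropy on the same sample that supplied the counts understates the true value; a held-out sample with a smoothed model would give a more realistic figure.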
The main contributions of this thesis to automatic speech recognition and the
study of speech are the development of the Wave Analysis Laboratory and the
analysis of speech from a phonotactic point of view.
The phoneme cluster frequencies provide new information on spoken language,
as do the phonotactic classes. The measures of phonetic balance and branching
provide additional tools for use in the development of speech recognition systems.
Identifier | oai:union.ndltd.org:ADTP/219168 |
Date | January 1998 |
Creators | Nulsen, Susan, n/a |
Publisher | University of Canberra. Information Sciences & Engineering |
Source Sets | Australasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Copyright Susan Nulsen |