
Combining acoustic analysis and phonotactic analysis to improve automatic speech recognition

Nulsen, Susan. January 1998
This thesis addresses the problem of automatic speech recognition, specifically, how to transform an acoustic waveform into a string of words or phonemes. A preliminary chapter gives linguistic information potentially useful in automatic speech recognition. This is followed by a description of the Wave Analysis Laboratory (WAL), a rule-based system which detects features in speech and was designed as the acoustic front end of a speech recognition system. Temporal reasoning as used in WAL rules is examined. The use of WAL in recognizing one particular class of speech sounds, the nasal consonants, is described in detail. The remainder of the thesis looks at the statistical analysis of samples of spontaneous speech. An orthographic transcription of a large sample of spontaneous speech is automatically translated into phonemes. Tables of the frequencies of word-initial and word-final phoneme clusters are constructed to illustrate some of the phonotactic constraints of the language. Statistical data is used to assign phonemes to phonotactic classes. These classes are unlike the acoustic classes, although there is a general distinction between the vowels, the consonants and the word boundary. A way of measuring the phonetic balance of a sample of speech is described. This can be used as a means of ranking potential test samples in terms of how well they represent the language. A phoneme n-gram model is used to measure the entropy of the language. The broad acoustic encoding output from WAL is used with this language model to reconstruct a small test sample. "Branching", a simpler alternative to perplexity, is introduced and found to give similar results to perplexity. Finally, the drop in branching is calculated as knowledge of various sets of acoustic classes is considered.
The main contributions of this thesis to automatic speech recognition and the study of speech are the development of the Wave Analysis Laboratory and the analysis of speech from a phonotactic point of view. The phoneme cluster frequencies provide new information on spoken language, as do the phonotactic classes. The measures of phonetic balance and branching provide additional tools for use in the development of speech recognition systems.
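
As a rough illustration of how a phoneme n-gram model can be used to measure entropy and perplexity, the sketch below trains a bigram model on a toy phoneme transcription. It is not the thesis's method: the function names, the toy phoneme symbols, the `#` word-boundary marker, and the add-one smoothing are all assumptions made for this example. The abstract does not define "branching", so it is not implemented here.

```python
import math
from collections import Counter

BOUNDARY = "#"  # assumed word-boundary symbol, treated as one more phoneme


def train_bigram(words):
    """Count phoneme bigrams over words given as lists of phoneme symbols."""
    bigrams = Counter()
    contexts = Counter()
    vocab = {BOUNDARY}
    for word in words:
        seq = [BOUNDARY] + list(word) + [BOUNDARY]
        vocab.update(word)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def prob(model, prev, cur):
    """P(cur | prev) with add-one smoothing (an assumption of this sketch)."""
    bigrams, contexts, vocab = model
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))


def cross_entropy(model, words):
    """Average number of bits per phoneme transition under the model."""
    total_bits = 0.0
    n = 0
    for word in words:
        seq = [BOUNDARY] + list(word) + [BOUNDARY]
        for prev, cur in zip(seq, seq[1:]):
            total_bits -= math.log2(prob(model, prev, cur))
            n += 1
    return total_bits / n


def perplexity(model, words):
    """Perplexity is 2 raised to the per-phoneme cross-entropy in bits."""
    return 2 ** cross_entropy(model, words)


if __name__ == "__main__":
    # Toy transcription with hypothetical phoneme symbols.
    sample = [["k", "ae", "t"], ["t", "ae", "k"], ["ae", "t"]]
    model = train_bigram(sample)
    print("entropy (bits/phoneme):", cross_entropy(model, sample))
    print("perplexity:", perplexity(model, sample))
```

Perplexity can be read as the effective number of equally likely phonemes the model must choose between at each step, so a lower value means the phonotactic constraints captured by the model narrow the choice more.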
