  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Production Knowledge in the Recognition of Dysarthric Speech

Rudzicz, Frank 31 August 2011
Millions of individuals have acquired or have been born with neuro-motor conditions that limit the control of their muscles, including those that manipulate the articulators of the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand, despite being generally syntactically and semantically correct. This difficulty is not limited to human listeners, but also adversely affects the performance of traditional automatic speech recognition (ASR) systems, which in some cases can be completely unusable by the affected individual. This dissertation describes research into improving ASR for speakers with dysarthria by incorporating knowledge of their speech production. The document first introduces theoretical aspects of dysarthria and of speech production and outlines related work in these combined areas within ASR. It then describes the acquisition and analysis of the TORGO database of dysarthric articulatory motion and demonstrates several consistent behaviours among speakers in this database, including predictable pronunciation errors. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between vocal tract configurations and their acoustic consequences. I show that dynamic Bayesian networks augmented with instantaneous theoretical or empirical articulatory variables outperform even discriminative alternatives. This leads to work that incorporates a more rigid theory of speech production, i.e., task-dynamics, that models the high-level and long-term aspects of speech production. For this task, I devised an algorithm for estimating articulatory positions given only acoustics that significantly outperforms the state of the art. Finally, I present ongoing work into the transformation and re-synthesis of dysarthric speech in order to make it more intelligible to human listeners.
This research represents definitive progress towards the accommodation of dysarthric speech within modern speech recognition systems. However, there is much more research that remains to be undertaken and I conclude with some thoughts as to which paths we might now take.
2

Visualising articulation : real-time ultrasound visual biofeedback and visual articulatory models and their use in treating speech sound disorders associated with submucous cleft palate

Roxburgh, Zoe January 2018
Background: Ultrasound Tongue Imaging (UTI) is increasingly popular for assessing and treating Speech Sound Disorders (SSDs) and has more recently been used to qualitatively investigate compensatory articulations in speakers with cleft palate (CP). However, its therapeutic application for speakers with CP remains to be tested. A different set of developments, Visual Articulatory Models (VAMs), provide an offline dynamic model with context for lingual patterns. However, unlike UTI, they do not provide real-time biofeedback. Commercial VAMs, such as Speech Trainer 3D, are available on iDevices, yet their clinical application remains to be tested. Aims: This thesis aims to test the diagnostic use of ultrasound, and to investigate the effectiveness of both UTI and VAMs for the treatment of SSDs associated with submucous cleft palate (SMCP). Method: Using a single-subject multiple baseline design, two males with repaired SMCP, Andrew (aged 9;2) and Craig (aged 6;2), received six assessment sessions and two blocks of therapy, following a motor-based therapy approach, using VAMs and UTI. Three methods were used to measure therapy outcomes. Firstly, percent target consonant correct scores, derived from phonetic transcriptions, provide outcomes comparable to those used in typical practice. Secondly, a perceptual evaluation by multiple phonetically trained listeners, using a two-alternative multiple forced choice design to measure listener agreement, provides a more objective measure. Thirdly, articulatory analysis, using qualitative and quantitative measures, provides an additional perspective able to reveal covert errors. Results and Conclusions: There was overall improvement in the speech for both speakers, with a greater rate of change in therapy block one (VAMs) and listener agreement in the perceptual evaluation. Articulatory analysis supplemented phonetic transcriptions and detected covert articulations and covert contrast as well as supporting the improvements in auditory outcome scores. Both VAMs and UTI show promise as clinical tools for the treatment of SSDs associated with CP.
