311

Characterizing Dysarthric Speech with Transfer Learning

January 2020
Speech is known to serve as an early indicator of neurological decline, particularly in motor diseases. There is significant interest in developing automated, objective signal analytics that detect clinically relevant changes and in evaluating these algorithms against the existing gold standard: perceptual evaluation by trained speech and language pathologists. Hypernasality, the result of poor control of the velopharyngeal flap (the soft palate regulating airflow between the oral and nasal cavities), is one such speech symptom of interest, as precise velopharyngeal control is difficult to achieve under neuromuscular disorders. However, a host of co-modulating variables give hypernasal speech a complex and highly variable acoustic signature, making it difficult for skilled clinicians to assess and for automated systems to evaluate. Previous work in rating hypernasality from speech relies on either engineered features based on statistical signal processing or machine learning models trained end-to-end on clinical ratings of disordered speech examples. Engineered features often fail to capture the complex acoustic patterns associated with hypernasality, while end-to-end methods tend to overfit to the small datasets on which they are trained. In this thesis, I present a set of acoustic features, models, and strategies for characterizing hypernasality in dysarthric speech that split the difference between these two approaches, with the aim of capturing the complex perceptual character of hypernasality without overfitting to the small datasets available. The features are based on acoustic models trained on a large corpus of healthy speech, integrating expert knowledge to capture known perceptual characteristics of hypernasal speech. They are then used in relatively simple linear models to predict clinician hypernasality scores. These simple models are robust, generalizing across diseases and outperforming a comprehensive set of baselines in accuracy and correlation. This approach represents a new state of the art in objective hypernasality assessment. / Masters Thesis, Electrical Engineering, 2020
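A minimal sketch of the "simple linear models" stage this abstract describes, evaluated leave-one-disease-out to mirror the cross-disease generalization claim. The feature matrix, clinician scores, and disease groups below are invented stand-ins; the thesis's actual acoustic features and data are not reproduced here.

```python
# Hypothetical illustration: predict clinician hypernasality ratings from
# precomputed acoustic features with a simple linear model.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))          # placeholder: 8 nasality-related features
y = rng.uniform(1, 7, size=120)        # placeholder: clinician ratings, 1-7 scale
groups = rng.integers(0, 4, size=120)  # placeholder: disease group per speaker

# Hold out one disease group at a time to probe cross-disease robustness.
preds = np.zeros_like(y)
for train, test in LeaveOneGroupOut().split(X, y, groups):
    model = Ridge(alpha=1.0).fit(X[train], y[train])
    preds[test] = model.predict(X[test])

print("Spearman correlation with clinician scores:", spearmanr(y, preds)[0])
```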
312

Speech Recognition Using a Synthesized Codebook

Smith, Lloyd A. (Lloyd Allen) 08 1900
Speech sounds generated by a simple waveform synthesizer were used to create a vector quantization codebook for use in speech recognition. Recognition was tested over the TI-20 isolated-word database using a conventional DTW matching algorithm. Input speech was band-limited to 300-3300 Hz, then passed through the Scott Instruments Corp. Coretechs process, implemented on a VET3 speech terminal, to create the speech representation for matching. Synthesized sounds were processed in software by a VET3 signal processing emulation program. Emulation and recognition were performed on a DEC VAX 11/750. The experiments were organized in two series. A preliminary experiment, using no vector quantization, provided a baseline for comparison. The original codebook contained 109 vectors, all derived from two-formant synthesized sounds. This codebook was decimated over the course of the first series of experiments, based on the number of times each vector was used in quantizing the training data for the previous experiment, in order to determine the smallest subset of vectors suitable for coding the speech database. The second series of experiments altered several test conditions in order to evaluate the applicability of the minimal synthesized codebook to conventional codebook training. The baseline recognition rate was 97%. The recognition rate for synthesized codebooks was approximately 92% for sizes ranging from 109 down to 16 vectors; accuracy for smaller codebooks was slightly below 90%. Error analysis showed that the primary loss in dropping below 16 vectors was in the coding of voiced sounds with high-frequency second formants. The 16-vector synthesized codebook was chosen as the seed for the second series of experiments. After one training iteration, and using a normalized distortion score, trained codebooks performed with an accuracy of 95.1%. When codebooks were trained and tested on different sets of speakers, accuracy was 94.9%, indicating that very little speaker dependence was introduced by the training.
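The core pipeline lends itself to a compact sketch: quantize feature frames against a small codebook, then match utterances with DTW. The arrays below are synthetic stand-ins, not the Coretechs representation or the actual 109-vector codebook.

```python
# Illustrative vector quantization + DTW matching, with invented data.
import numpy as np

def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codebook vector."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def dtw_distance(a, b):
    """Classic dynamic-programming DTW over two sequences of vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(1)
codebook = rng.normal(size=(16, 12))   # placeholder 16-vector codebook
utterance = rng.normal(size=(40, 12))  # placeholder input, 40 frames
template = rng.normal(size=(35, 12))   # placeholder stored reference word

# Replace each frame with its codebook vector, then compare by DTW.
coded_utt = codebook[quantize(utterance, codebook)]
coded_tpl = codebook[quantize(template, codebook)]
print("DTW distance:", dtw_distance(coded_utt, coded_tpl))
```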
313

Odhad formantových kmitočtů pomocí strojového učení / Estimation of formant frequencies using machine learning

Káčerová, Erika January 2019
This Master's thesis deals with the problem of formant extraction. A system of Matlab scripts is created to generate values of the first three formant frequencies from speech recordings with the use of Praat and Snack (WaveSurfer). Mel-frequency cepstral coefficients and linear predictive coefficients are extracted from the audio files and added to the database. This database is then used to train a neural network. Finally, the designed neural network is tested.
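A hedged Python approximation of the pipeline described (the thesis itself used Matlab with Praat and Snack): extract MFCC and LPC features, then fit a small neural network to reference formant values. All data below is synthetic, and librosa/scikit-learn are substitutions, not the thesis's tools.

```python
# Illustrative sketch: MFCC + LPC features regressed onto F1-F3 targets.
import numpy as np
import librosa
from sklearn.neural_network import MLPRegressor

def utterance_features(y, sr, order=12, n_mfcc=13):
    """Mean MFCCs plus LPC coefficients for one recording."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    lpc = librosa.lpc(y, order=order)[1:]  # drop the leading 1.0
    return np.concatenate([mfcc, lpc])

rng = np.random.default_rng(2)
sr = 16000
# Placeholder audio (noise) and placeholder reference formants, e.g. from Praat.
X = np.array([utterance_features(rng.normal(size=sr), sr) for _ in range(50)])
F = rng.uniform([300, 800, 2200], [900, 2300, 3200], size=(50, 3))

net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
net.fit(X, F)
print(net.predict(X[:1]))  # estimated [F1, F2, F3] in Hz
```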
314

Modul pro výuku výslovnosti cizích jazyků / Module for Pronunciation Training and Foreign Language Learning

Kudláč, Vladan January 2021
The goal of this thesis is to improve the implementation of a mobile-application module for pronunciation training, identify areas suitable for optimization, and carry out optimizations aimed at increasing accuracy, reducing processing time, and lowering the memory footprint of processing.
315

Webový prohlížeč audio/video záznamů přednášek: převod prohlížeče na MySQL databázi / Web Based Audio/Video Lecture Browser: Porting of the Browser to MySQL Database

Janovič, Jakub January 2010
This project deals with a web-based lecture browser whose goal is to simplify the acquisition of knowledge through multimedia. It presents an existing lecture browser created for a diploma thesis at FIT VUT Brno, along with the technologies that are used and that will be used to migrate the browser to a MySQL database and to develop a transcription module for speeches. The reader is acquainted with an analysis and model of the new application. Furthermore, implementation methods for development and subsequent testing are discussed. The project closes with a conclusion about the future development of web-based lecture browsers.
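As a rough illustration of such a MySQL port, a hypothetical schema for lectures and their transcript segments; the table and column names are invented, not taken from the thesis, and mysql-connector-python is an assumed dependency.

```python
# Hypothetical sketch of a minimal schema a lecture browser might use.
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(host="localhost", user="browser",
                               password="secret", database="lectures")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS lecture (
        id INT AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(255) NOT NULL,
        media_url VARCHAR(512) NOT NULL
    )""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS transcript_segment (
        id INT AUTO_INCREMENT PRIMARY KEY,
        lecture_id INT NOT NULL,
        start_ms INT NOT NULL,  -- offset into the recording
        text TEXT NOT NULL,     -- transcribed speech for this segment
        FOREIGN KEY (lecture_id) REFERENCES lecture(id)
    )""")
conn.commit()
```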
316

Phoneme-based Video Indexing Using Phonetic Disparity Search

Barth, Carlos Leon 01 January 2010
This dissertation presents and evaluates an approach to the video indexing problem by investigating a categorization method that transcribes audio content through Automatic Speech Recognition (ASR) combined with Dynamic Contextualization (DC), Phonetic Disparity Search (PDS), and Metaphone indexing. The suggested approach applies genome pattern matching algorithms with computational summarization to build a database infrastructure that provides an indexed summary of the original audio content. PDS complements the contextual phoneme indexing approach by optimizing topic-search performance and accuracy in large video content structures. A prototype was established to translate news broadcast video into text and phonemes automatically by using ASR utterance conversions. Each phonetic utterance extraction was then categorized, converted to Metaphones, and stored in a repository with contextual topical information attached and indexed for posterior search analysis. Following the original design strategy, a custom parallel interface was built to measure the capabilities of dissimilar phonetic queries and provide an interface for result analysis. The proposed solution provides evidence of superior topic matching when compared with traditional word and phoneme search methods. Experimental results demonstrate that PDS can be 3.7% better than the same phoneme query, while Metaphone search proved to be 154.6% better than the same phoneme search and 68.1% better than the equivalent word search.
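The Metaphone indexing step can be sketched as follows, assuming the third-party jellyfish library for Metaphone codes; the transcripts and query are invented, and this deliberately omits the DC and PDS stages.

```python
# Illustrative phonetic inverted index over ASR transcripts via Metaphone.
from collections import defaultdict
import jellyfish

transcripts = {  # placeholder ASR output per video clip
    "clip1": "senate passes health care bill",
    "clip2": "storm delays flights at the airport",
}

# Index each clip under the Metaphone code of every transcribed word, so
# phonetically similar misrecognitions still map to the same key.
index = defaultdict(set)
for clip_id, text in transcripts.items():
    for word in text.split():
        index[jellyfish.metaphone(word)].add(clip_id)

def phonetic_search(query):
    """Return clips whose transcripts phonetically match any query word."""
    hits = set()
    for word in query.split():
        hits |= index.get(jellyfish.metaphone(word), set())
    return hits

print(phonetic_search("helth kare"))  # misspelled query still finds clip1
```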
317

Electrophysiological indices of language processing in infants at risk for ASD

Seery, Anne 12 March 2016
Behavioral symptoms of autism spectrum disorder (ASD) begin to emerge around 12 months of age and are preceded by subtle differences in how infants process and interact with the world (Elsabbagh & Johnson, 2010). Similar atypical behavioral patterns and markers of brain organization ('endophenotypes') are present in infants at risk for ASD (HRA) due to their family history, regardless of whether they ultimately develop the disorder. Possible endophenotypes of ASD were investigated through four studies that examined event-related potentials (ERPs) to speech and language in HRA and low-risk control (LRC) infants as part of a larger, longitudinal project. Chapter 2 examined ERPs to language-specific phonemes at 6, 9, and 12 months (n=59 at 6mo, 77 at 9mo, and 70 at 12mo) and found that HRA infants were not delayed in phonemic perceptual narrowing, yet exhibited atypical hemispheric lateralization of ERPs at 9 and 12 months. Chapter 3 explored these findings further in a sample with known developmental outcome (n=60 at 6mo, 75 at 9mo, and 72 at 12mo) in order to understand how these ERPs differ between infants who ultimately develop ASD and infants who do not. Chapter 4 examined responses to repeated speech stimuli at 9 months (n=95). HRA infants exhibited atypically large ERPs to repeated speech, and this pattern was associated with better later language ability. Finally, Chapter 5 examined ERPs to words at 18 and 24 months (n=41 at 18mo, 52 at 24mo) and found evidence for atypical topography of responses to known versus unknown words, particularly at 18 months. These findings provide evidence that in HRA infants, even those who do not develop ASD, neural processing of linguistic stimuli is altered during infancy and toddlerhood. The results from Chapter 4 suggest that at least some of the differences seen in HRA infants who do not develop ASD may reflect beneficial, rather than disordered, processing. Overall, these results contribute to growing evidence that familial risk for ASD is associated with atypical processing of speech and language during infancy. Future work should continue to investigate more closely the implications of atypical neural processing for infants' later development.
318

Enabling Structured Navigation of Longform Spoken Dialog with Automatic Summarization

Li, Daniel January 2022
Longform spoken dialog is a rich source of information that is present in all facets of everyday life, taking the form of podcasts, debates, and interviews; these mediums contain important topics ranging from healthcare and diversity to current events, economics, and politics. Individuals need to digest informative content to know how to vote, decide how to stay safe from COVID-19, and learn how to increase diversity in the workplace. Unfortunately, compared to text, spoken dialog can be challenging to consume: it is slower than reading and difficult to skim or navigate. Although an individual may be interested in a given topic, they may be unwilling to commit the time necessary to consume longform auditory media given the uncertainty as to whether such content will live up to their expectations. Clearly, there exists a need to provide access to the information spoken dialog offers in a manner through which individuals can quickly and intuitively access areas of interest without investing large amounts of time. From Human Computer Interaction, we apply the idea of information foraging, which theorizes how people browse and navigate to satisfy an information need, to the longform spoken dialog domain. Information foraging states that people do not browse linearly. Rather, people “forage” for information much as animals sniff around for food, scanning from area to area and constantly deciding whether to keep investigating their current area or to move on to greener pastures. This is an instance of the classic breadth-vs.-depth dilemma. People rely on perceived structure and information cues to make these decisions. Unfortunately speech, whether spoken or transcribed, is unstructured and lacks information cues, making it difficult for users to browse and navigate. We create a longform spoken dialog browsing system that utilizes automatic summarization and speech modeling to structure longform dialog and present information in a manner that is both intuitive and flexible toward different user browsing needs. Leveraging summarization models to automatically and hierarchically structure spoken dialog, the system distills information into increasingly salient and abstract summaries, allowing for a tiered representation that interested users can progressively explore. Additionally, we address spoken dialog's own set of technical challenges to speech modeling that are not present in written text, such as disfluencies, improper punctuation, lack of annotated speech data, and an inherent lack of structure. Since summarization is a lossy compression of information, the system provides users with information cues that signal how much additional information is contained on a topic. This thesis makes the following contributions:
1. We applied the HCI concept of information foraging to longform speech, enabling people to browse and navigate information in podcasts, interviews, panels, and meetings.
2. We created a system that structures longform dialog into hierarchical summaries which help users to (1) skim (browse) audio and (2) navigate and drill down into interesting sections to read full details.
3. We created a human-annotated hierarchical dataset to quantitatively evaluate the effectiveness of our system's hierarchical text generation.
4. We developed a suite of dialog-oriented processing optimizations to improve the user experience of summaries: enhanced readability and fluency of short summaries through better topic chunking and pronoun imputation, and reliable indication of semantic coverage within short summaries to help direct navigation toward interesting information.
We discuss future research in extending the browsing and navigation system to more challenging domains such as lectures, which contain many external references, or workplace conversations, which contain uncontextualized background information and are far less structured than podcasts and interviews.
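The tiered summarization idea admits a brief sketch, assuming the Hugging Face transformers library and a public summarization checkpoint; fixed-size word chunking here is a simplification standing in for the system's topic-based chunking.

```python
# Illustrative two-tier summarization of a longform dialog transcript.
from transformers import pipeline

summarize = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def chunk(words, size=400):
    """Split a word list into fixed-size chunks (stand-in for topic chunking)."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def hierarchical_summary(transcript):
    # Tier 1: summarize each chunk of the raw dialog transcript.
    tier1 = [summarize(c, max_length=80, min_length=20)[0]["summary_text"]
             for c in chunk(transcript.split())]
    # Tier 2: summarize the concatenated tier-1 summaries into one overview,
    # giving users an abstract entry point they can drill down from.
    tier2 = summarize(" ".join(tier1), max_length=60, min_length=15)
    return {"sections": tier1, "overview": tier2[0]["summary_text"]}
```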
319

Self Regulatory Depletion Effects On Speed Within A Complex Speech Processing Task

Reif, Angela 05 August 2014
No description available.
320

Perceptual learning of systemic cross-category vowel variation

Weatherholtz, Kodi 28 May 2015
No description available.
