201 | Electrophysiological indices of language processing in infants at risk for ASD. Seery, Anne (12 March 2016)
Behavioral symptoms of autism spectrum disorder (ASD) begin to emerge around 12 months of age and are preceded by subtle differences in how infants process and interact with the world (Elsabbagh & Johnson, 2010). Similar atypical behavioral patterns and markers of brain organization ('endophenotypes') are present in infants at high risk for ASD (HRA) due to their family history, regardless of whether they ultimately develop the disorder. Possible endophenotypes of ASD were investigated through four studies that examined event-related potentials (ERPs) to speech and language in HRA and low-risk control (LRC) infants as part of a larger, longitudinal project.
Chapter 2 examined ERPs to language-specific phonemes at 6, 9, and 12 months (n=59 at 6mo, 77 at 9mo, and 70 at 12mo) and found that HRA infants were not delayed in phonemic perceptual narrowing yet exhibited atypical hemispheric lateralization of ERPs at 9 and 12 months. Chapter 3 explored these findings further in a sample with known developmental outcome (n=60 at 6mo, 75 at 9mo, and 72 at 12mo) in order to understand how these ERPs differ between infants who ultimately develop ASD and infants who do not. Chapter 4 examined responses to repeated speech stimuli at 9 months (n=95). HRA infants exhibited atypically large ERPs to repeated speech, and this pattern was associated with better later language ability. Finally, Chapter 5 examined ERPs to words at 18 and 24 months (n=41 at 18mo, 52 at 24mo) and found evidence for atypical topography of responses to known versus unknown words, particularly at 18 months.
These findings provide evidence that in HRA infants, even those who do not develop ASD, neural processing of linguistic stimuli is altered during infancy and toddlerhood. The results from Chapter 4 suggest that at least some of the differences seen in HRA infants who do not develop ASD may reflect beneficial, rather than disordered, processing. Overall, these results contribute to growing evidence that familial risk for ASD is associated with atypical processing of speech and language during infancy. Future work should continue to investigate more closely the implications of atypical neural processing for infants' later development.
202 | Enabling Structured Navigation of Longform Spoken Dialog with Automatic Summarization. Li, Daniel (January 2022)
Longform spoken dialog is a rich source of information present in all facets of everyday life, taking the form of podcasts, debates, and interviews; these media cover important topics ranging from healthcare and diversity to current events, economics, and politics. Individuals need to digest such informative content to know how to vote, how to stay safe from COVID-19, and how to increase diversity in the workplace.
Unfortunately, compared to text, spoken dialog can be challenging to consume: it is slower than reading and difficult to skim or navigate. Although an individual may be interested in a given topic, they may be unwilling to commit the time necessary to consume longform auditory media, given the uncertainty as to whether the content will live up to their expectations. Clearly, there is a need to provide access to the information in spoken dialog in a manner that lets individuals quickly and intuitively reach areas of interest without investing large amounts of time.
From human-computer interaction (HCI), we apply the idea of information foraging, which theorizes how people browse and navigate to satisfy an information need, to the longform spoken dialog domain. Information foraging holds that people do not browse linearly. Rather, people “forage” for information much as animals sniff around for food, scanning from area to area and constantly deciding whether to keep investigating the current area or to move on to greener pastures. This is an instance of the classic breadth-versus-depth dilemma. People rely on perceived structure and information cues to make these decisions. Unfortunately, speech, whether spoken or transcribed, is unstructured and lacks information cues, making it difficult for users to browse and navigate.
We create a longform spoken dialog browsing system that uses automatic summarization and speech modeling to structure longform dialog, presenting information in a manner that is both intuitive and flexible enough to accommodate different user browsing needs. Leveraging summarization models to automatically and hierarchically structure spoken dialog, the system distills information into increasingly salient and abstract summaries, yielding a tiered representation that users can progressively explore if interested. Additionally, we address spoken dialog’s own set of technical challenges to speech modeling that are not present in written text, such as disfluencies, improper punctuation, the lack of annotated speech data, and an inherent lack of structure.
Since summarization is a lossy compression of information, the system also provides users with information cues to signal how much additional information is contained on a topic.
This thesis makes the following contributions:
1. We applied the HCI concept of information foraging to longform speech, enabling people to browse and navigate information in podcasts, interviews, panels, and meetings.
2. We created a system that structures longform dialog into hierarchical summaries which help users to 1) skim (browse) audio and 2) navigate and drill down into interesting sections to read full details.
3. We created a human-annotated hierarchical dataset to quantitatively evaluate our system’s hierarchical text generation performance.
4. Lastly, we developed a suite of dialog-oriented processing optimizations to improve the user experience of summaries: enhanced readability and fluency of short summaries through better topic chunking and pronoun imputation, and reliable indication of semantic coverage within short summaries to help direct navigation towards interesting information.
We discuss future research on extending the browsing and navigation system to more challenging domains such as lectures, which contain many external references, or workplace conversations, which contain uncontextualized background information and are far less structured than podcasts and interviews.
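The hierarchical structuring idea can be sketched in a few lines. This is an illustration only: a trivial first-sentence extractor stands in for the thesis's learned summarization models, and the example transcript is invented.

```python
# Sketch of hierarchically structuring a transcript into tiers of summaries.
# A trivial extractive stand-in (first sentence of a chunk) marks where a
# learned abstractive summarizer would plug in.

def toy_summarize(text: str) -> str:
    """Stand-in for a learned summarizer: return the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."

def chunk(segments, size):
    """Group consecutive transcript segments into topic-sized chunks."""
    return [segments[i:i + size] for i in range(0, len(segments), size)]

def build_hierarchy(segments, size=2):
    """Return tiers of increasingly abstract summaries, coarsest tier first."""
    tiers = [segments]
    while len(tiers[-1]) > 1:
        tiers.append([toy_summarize(" ".join(c)) for c in chunk(tiers[-1], size)])
    return list(reversed(tiers))  # tiers[0] holds the single top-level summary

transcript = [
    "Hosts introduce the episode. Today we cover vaccines.",
    "Guest explains mRNA basics. The immune system is primed.",
    "Listener questions follow. Side effects are discussed.",
    "Closing remarks are made. Next week covers economics.",
]
tiers = build_hierarchy(transcript)
print(len(tiers))   # number of tiers, coarsest to finest
print(tiers[0][0])  # the single most-abstract summary
```

Each tier shrinks the number of units to scan by the chunk size, which is what lets a reader skim the top tier and drill down only where an information cue looks promising.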
203 | Self Regulatory Depletion Effects On Speed Within A Complex Speech Processing Task. Reif, Angela (05 August 2014)
No description available.
204 | Perceptual learning of systemic cross-category vowel variation. Weatherholtz, Kodi (28 May 2015)
No description available.
205 | Examining Pupillometric Measures of Cognitive Effort Associated with Speaker Variability During Spoken Word Recognition. Douds, Lillian R. (01 May 2017)
No description available.
206 | Polar Spectrum Coding. Chapman, Daniel Harris (01 January 1988)
Polar Spectrum Coding is a novel speech coding algorithm for narrowband voice communications. A polar Fourier transform of the signal is computed, and the magnitude and phase of the speech spectrum are encoded for transmission. The correlation between frames of speech signals is exploited to minimize the transmission rate required for intelligible speech. At the receiver, the encoded words are decoded and the spectrum is reconstructed. An inverse Fourier transform is performed, and the result is the reconstructed speech waveform. Polar Spectrum Coding theory is explained, the sensitivity of performance to various parameters is explored, and performance in the presence of channel noise is measured. Directions for future research in the realm of Polar Spectrum Coding are suggested.
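As a toy illustration of the pipeline the abstract describes (transform, encode magnitude and phase, decode, inverse transform), here is a single-frame sketch in pure Python. It is not the thesis's actual codec: the quantizer step sizes are invented for the example, and the inter-frame correlation coding is omitted.

```python
# Toy polar-spectrum codec for one frame: quantize each DFT bin's magnitude
# and phase (the "polar spectrum"), then reconstruct by inverse transform.
import cmath
import math

def dft(frame):
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(spectrum):
    N = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def encode(frame, mag_step=0.05, phase_levels=16):
    """Uniformly quantize magnitude and phase of each DFT bin."""
    codes = []
    for c in dft(frame):
        mag_idx = round(abs(c) / mag_step)
        ph_idx = round((cmath.phase(c) + math.pi)
                       / (2 * math.pi) * phase_levels) % phase_levels
        codes.append((mag_idx, ph_idx))
    return codes

def decode(codes, mag_step=0.05, phase_levels=16):
    """Rebuild the spectrum from quantized polar codes, then invert."""
    spectrum = [mag_idx * mag_step *
                cmath.exp(1j * (ph_idx * 2 * math.pi / phase_levels - math.pi))
                for mag_idx, ph_idx in codes]
    return idft(spectrum)

# A short synthetic "speech" frame: a single sinusoid.
frame = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
rebuilt = decode(encode(frame))
err = max(abs(a - b) for a, b in zip(frame, rebuilt))
print(err)  # small reconstruction error from quantization
```

For a frame dominated by a few spectral peaks, most bins quantize to zero magnitude, which is where the bit-rate savings of spectral coding come from.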
207 | Investigating Speaker Features From Very Short Speech Records. Berg, Brian LaRoy (11 September 2001)
A procedure is presented that is capable of extracting various speaker features, and is of particular value for analyzing records containing single words and shorter segments of speech.
By taking advantage of the fast convergence properties of adaptive filtering, the approach is capable of modeling the nonstationarities due to both the vocal tract and vocal cord dynamics.
Specifically, the procedure extracts the vocal tract estimate from within the closed glottis interval and uses it to obtain a time-domain glottal signal. The procedure is quite simple, requires minimal manual intervention (only in cases of inadequate pitch detection), and is particularly distinctive in that it derives both the vocal tract and glottal signal estimates directly from the time-varying filter coefficients rather than from the prediction error signal. Using this procedure, several glottal signals are derived from human and synthesized speech and analyzed to demonstrate the glottal waveform modeling performance and the kinds of glottal characteristics obtained with it. Finally, the procedure is evaluated using automatic speaker identity verification. / Ph. D.
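As a loose illustration of reading a model from time-varying filter coefficients (not the author's actual procedure), the sketch below runs a least-mean-squares (LMS) adaptive predictor over a synthetic second-order resonance and treats the coefficient trajectory, rather than the prediction error, as the estimate. All signal and step-size parameters are invented for the example.

```python
# LMS adaptive linear prediction over a synthetic damped resonance.
# The COEFFICIENT trajectory, not the residual, carries the model estimate.
import random

def lms_predict(signal, order=2, mu=0.05):
    """Run an LMS linear predictor; return the coefficient trajectory."""
    w = [0.0] * order
    history = []
    for n in range(order, len(signal)):
        past = signal[n - order:n][::-1]      # most recent sample first
        pred = sum(wi * xi for wi, xi in zip(w, past))
        err = signal[n] - pred                # prediction error drives adaptation
        w = [wi + mu * err * xi for wi, xi in zip(w, past)]
        history.append(list(w))
    return history

# Synthetic resonance: AR(2) process x[n] = a1*x[n-1] + a2*x[n-2] + noise.
random.seed(0)
a1, a2 = 1.6, -0.9
x = [0.0, 0.0]
for _ in range(6000):
    x.append(a1 * x[-1] + a2 * x[-2] + random.gauss(0.0, 0.1))

coeffs = lms_predict(x)
print(coeffs[-1])  # converges toward the true model [a1, a2] = [1.6, -0.9]
```

The point mirrored from the abstract: because the adaptive filter converges quickly, the slowly varying coefficients themselves form a usable time-varying model of the system generating the signal.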
208 | Voice input technology: learning style and attitude toward its use. Fournier, Randolph S. (19 June 2006)
This study was designed to investigate whether learning style and attitudes toward voice input technology were related to performance in using the technology. Three null hypotheses were tested: (a) No differences exist in the performance in dictating a paragraph using voice input for individuals with different learning styles; (b) No differences exist in attitude toward voice input for individuals with different learning styles; and (c) No interaction exists for the performance scores for individuals with different learning styles and different attitudes toward voice input technology. The statistical procedure used to examine the hypotheses was analysis of variance.
Participants were 50 students preparing to become vocational teachers enrolled in vocational education courses at Virginia Tech. Procedures involved having the participants complete three stages. First, they completed the Gregorc Style Delineator (GSD) learning style instrument. Due to a lack of individuals of one learning style category, abstract sequential (AS), only three learning style categories were used in the study. Second, they completed a background information sheet. Third, they participated in the voice-input training and dictation phase. Each student completed a one-hour session that included training, practice using voice input, and dictating a paragraph. Participants also completed the Attitude Toward Voice Input Scale developed by the researcher. It includes 21 attitude statements, 11 positively worded and 10 negatively worded.
The first hypothesis was not rejected. A student's learning style does not relate to the performance of the student when dictating a paragraph using voice input technology. The second hypothesis was not rejected either. A student's attitude toward voice input technology was not related to learning style. The third hypothesis was also not rejected. A student's learning style, regardless of whether the student had a "high" or "low" attitude toward voice input, was not significantly related to performance in using voice input technology. However, the mean performance scores of individuals with concrete sequential (CS) learning styles with "high" and "low" attitudes did appear to be different. Those with "high" attitudes toward voice input had better performance scores than those with "low" attitudes toward the technology. / Ph. D.
209 | A voice interface for VTLS. Mehta, Pranav (January 1989)
The objective of this study was to develop a voice interface for the on-line catalog of VTLS. Three major components of the system, namely, voice recognition system, text-to-speech synthesizer, and screen review program, were identified. These components were selected after a comparative study of several commercially available systems. Once the components were selected they were integrated to form a complete voice recognition and synthesis system. Using this system, a voice interface was realized to suit the operations of VTLS. A telephone interface for the system was investigated and recommendations were made for future research. / Master of Science
210 | Accurate speaker identification employing redundant waveform and model based speech signal representations. Premakanthan, Pravinkumar (01 October 2002)
No description available.