  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Music expert-novice differences in speech perception

Vassallo, Juan Sebastian 22 August 2019
It has been demonstrated that early, formal, and extensive musical training induces changes in the brain at both the structural and functional levels. Previous evidence suggests that musicians are particularly skilled in auditory analysis tasks. In this study, I aimed to find evidence that musical training affects the perception of acoustic cues in audiovisual speech processing for native English speakers. Using the McGurk paradigm (an experimental procedure based on the perceptual illusion that occurs when an auditory speech message is paired with incongruent visual facial gestures), participants were required to identify the auditory component of an audiovisual speech presentation in four conditions: (1) congruent auditory and visual modalities, (2) incongruent, (3) auditory only, and (4) visual only. Our data showed no significant differences in accuracy between groups differentiated by musical training. These findings have significant theoretical implications, suggesting that auditory cues for speech and music are processed by separable cognitive domains and that musical training might not have a positive effect on speech perception.
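To make the four-condition comparison concrete, here is a minimal sketch of how per-condition identification accuracy might be tabulated for the two groups; the trial list, field names ("group", "condition", "correct"), and values are hypothetical illustrations, not data or code from the thesis.

    # Illustrative sketch (not the thesis's analysis): per-condition accuracy
    # for a McGurk-style task, split by musician vs. non-musician group.
    from collections import defaultdict

    trials = [
        {"group": "musician", "condition": "incongruent", "correct": True},
        {"group": "non-musician", "condition": "incongruent", "correct": False},
        # ... one dict per trial, covering all four conditions ...
    ]

    counts = defaultdict(lambda: [0, 0])  # (n_correct, n_total) per (group, condition)
    for t in trials:
        key = (t["group"], t["condition"])
        counts[key][0] += int(t["correct"])
        counts[key][1] += 1

    for (group, condition), (n_correct, n_total) in sorted(counts.items()):
        print(f"{group:12s} {condition:12s} accuracy = {n_correct / n_total:.2f}")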
2

Gaze Strategies and Audiovisual Speech Enhancement

Yi, Astrid 31 December 2010
Quantitative relationships were established between speech intelligibility and gaze patterns when subjects listened to sentences spoken by a single talker at different auditory SNRs while viewing one or more talkers. When the auditory SNR was reduced and subjects moved their eyes freely, the main gaze strategy involved looking closer to the mouth. The natural tendency to move closer to the mouth was found to be consistent with a gaze strategy that helps subjects improve their speech intelligibility in environments that include multiple talkers. With a single talker and a fixed point of gaze, subjects' speech intelligibility was found to be optimal for fixations that were distributed within 10 degrees of the center of the mouth. Lower performance was observed at larger eccentricities, and this decrease in performance was investigated by mapping the reduced acuity in the peripheral region to various levels of spatial degradation.
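The 10-degree finding lends itself to a simple illustration; the sketch below, with hypothetical gaze coordinates, mouth position, and intelligibility scores (not data from this thesis), shows one way fixation eccentricity from the mouth centre could be computed and related to recognition scores.

    # Minimal sketch with invented numbers: relate word-recognition scores
    # to fixation eccentricity from the mouth centre.
    import numpy as np

    # Fixation positions and the mouth centre, in degrees of visual angle.
    gaze_xy = np.array([[1.2, -0.5], [8.0, 3.0], [14.0, 2.0], [0.3, 0.1]])
    mouth_xy = np.array([0.0, 0.0])
    scores = np.array([0.92, 0.85, 0.60, 0.95])   # proportion of words correct

    ecc = np.hypot(*(gaze_xy - mouth_xy).T)        # eccentricity of each fixation
    near = ecc <= 10.0                             # within 10 deg of the mouth centre
    print("mean score <= 10 deg:", scores[near].mean())
    print("mean score  > 10 deg:", scores[~near].mean())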
3

Perceptual plasticity in adverse listening conditions : factors affecting adaptation to accented and noise-vocoded speech

Banks, Briony January 2016
Adverse listening conditions can be a hindrance to communication, but humans are remarkably adept at overcoming them. Research has begun to uncover the cognitive and behavioural mechanisms behind this perceptual plasticity, but we still do not fully understand the reasons for variability in individual responses. The research reported in this thesis addressed several factors to further this understanding. Study 1 examined the role of cognitive ability in recognition of, and perceptual adaptation to, accented speech. A measure of executive function predicted greater and more rapid perceptual adaptation. Vocabulary knowledge predicted overall recognition of the accented speech, and mediated the relationship between working memory and recognition accuracy. Study 2 compared recognition of, and perceptual adaptation to, accented speech with and without audiovisual cues. The presence of audiovisual cues improved recognition of the accented speech in noise, but not perceptual adaptation. Study 3 investigated when perceivers make use of visual speech cues during recognition of, and perceptual adaptation to, audiovisual noise-vocoded speech. Listeners’ eye gaze was analysed over time and related to their performance; an illustrative sketch of the mediation analysis from Study 1 follows this abstract. The percentage and length of fixations on the speaker’s mouth increased during recognition of individual sentences, while the length of fixations on the mouth decreased as perceivers adapted to the noise-vocoded speech over the course of the experiment. Longer fixations on the speaker’s mouth were related to better speech recognition. Results demonstrate that perceptual plasticity of unfamiliar speech is driven by cognitive processes, but can also be modified by the modality of speech (audiovisual or audio-only). Behavioural responses, such as eye gaze, are also related to our ability to respond to adverse conditions. Speech recognition and perceptual adaptation were differentially related to the factors in each study and likely reflect different processes; both measures should therefore be considered in studies investigating listeners’ response to adverse conditions. Overall, the research adds to our understanding of the mechanisms and behaviours involved in perceptual plasticity in adverse listening conditions.
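As a rough illustration of the mediation result described for Study 1 (vocabulary knowledge mediating the link between working memory and recognition accuracy), the sketch below runs the standard pair of regressions on simulated data; the variable names and numbers are invented, and this is not the analysis pipeline used in the thesis.

    # Rough sketch of the mediation logic on simulated data.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 60
    wm = rng.normal(size=n)                                   # working-memory score
    vocab = 0.6 * wm + rng.normal(scale=0.8, size=n)          # vocabulary knowledge
    accuracy = 0.5 * vocab + rng.normal(scale=0.8, size=n)    # recognition accuracy

    def slopes(y, *predictors):
        # Ordinary least-squares coefficients, excluding the intercept.
        X = np.column_stack([np.ones(len(y)), *predictors])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta[1:]

    b_total = slopes(accuracy, wm)[0]          # WM -> accuracy, total effect
    b_direct = slopes(accuracy, wm, vocab)[0]  # WM -> accuracy, with vocabulary included
    print(f"total effect:  {b_total:.2f}")
    print(f"direct effect: {b_direct:.2f}")
    # A marked drop from the total to the direct effect is the classic
    # signature of (partial) mediation by vocabulary.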
4

Neural Correlates of Unimodal and Multimodal Speech Perception in Cochlear Implant Users and Normal-Hearing Listeners

Shatzer, Hannah Elizabeth January 2020
No description available.
5

Visual and Temporal Influences on Multimodal Speech Integration

Shatzer, Hannah Elizabeth 03 September 2015
No description available.
6

Towards an intelligent fuzzy based multimodal two stage speech enhancement system

Abel, Andrew January 2013
This thesis presents a novel two-stage multimodal speech enhancement system that makes use of both visual and audio information to filter speech, and explores the extension of this system with fuzzy logic to demonstrate proof of concept for an envisaged autonomous, adaptive, and context-aware multimodal system. The proposed cognitively inspired framework is scalable: the techniques used in individual parts of the system can be upgraded, and there is scope for the initial framework presented here to be expanded. In the proposed system, the concept of single-modality two-stage filtering is extended to include the visual modality. Noisy speech received by a microphone array is first pre-processed by visually derived Wiener filtering, employing a novel use of the Gaussian Mixture Regression (GMR) technique and drawing on associated visual speech information extracted with a state-of-the-art Semi Adaptive Appearance Models (SAAM) based lip-tracking approach. This pre-processed speech is then enhanced further by audio-only beamforming using a state-of-the-art Transfer Function Generalised Sidelobe Canceller (TFGSC) approach. The resulting system is designed to function in challenging noisy speech environments (evaluated using speech sentences from different speakers in the GRID corpus and a range of noise recordings), and both objective and subjective test results (employing the widely used Perceptual Evaluation of Speech Quality (PESQ) measure, a composite objective measure, and subjective listening tests) show that this initial system is capable of delivering very encouraging results with regard to filtering speech mixtures in difficult reverberant environments. Some limitations of this initial framework are identified, and the extension of the multimodal system is explored through the development of a fuzzy logic based framework and a proof-of-concept demonstration. Results show that the proposed autonomous, adaptive, and context-aware multimodal framework is capable of delivering very positive results in difficult noisy speech environments, with cognitively inspired use of audio and visual information depending on environmental conditions. Finally, some concluding remarks are made along with proposals for future work.
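To make the shape of the first (visually derived Wiener filtering) stage concrete, the following is a minimal conceptual sketch only: the visual prediction of the clean-speech spectrum is a placeholder rather than the GMR mapping from SAAM lip features used in the thesis, the second (TFGSC beamforming) stage is omitted entirely, and the input is a random stand-in signal rather than GRID sentences.

    # Conceptual sketch of a visually derived Wiener filter (not the thesis's system).
    import numpy as np
    from scipy.signal import stft, istft

    fs = 16000
    noisy = np.random.randn(fs)                # stand-in for one second of noisy speech

    f, t, Y = stft(noisy, fs=fs, nperseg=512)  # STFT of the noisy input

    # Placeholder for the visually predicted clean-speech power spectrum:
    # in the real system this would come from lip features via Gaussian
    # Mixture Regression, one estimate per time frame.
    S_hat = 0.5 * np.abs(Y) ** 2               # hypothetical visual estimate
    N_hat = np.abs(Y) ** 2 - S_hat             # implied noise power estimate

    G = S_hat / (S_hat + N_hat + 1e-12)        # Wiener gain per frequency bin and frame
    _, enhanced = istft(G * Y, fs=fs, nperseg=512)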
7

Cognitive resources in audiovisual speech perception

Buchan, Julie N. 11 October 2011
Most events that we encounter in everyday life provide our different senses with correlated information, and audiovisual speech perception is a familiar instance of multisensory integration. Several approaches are used to further examine the role of cognitive factors in audiovisual speech perception. The main focus of this thesis is on the influences of cognitive load and selective attention on audiovisual speech perception, as well as the integration of auditory and visual information from talking distractor faces. The influence of cognitive factors on the temporal integration of auditory and visual speech, and on gaze behaviour during audiovisual speech, is also addressed. The overall results of the experiments presented here suggest that the integration of auditory and visual speech information is quite robust to various attempts to modulate it. Adding a cognitive load task shows minimal disruption of the integration of auditory and visual speech information. Changing attentional instructions so that subjects selectively attend to either the auditory or the visual speech information also has a rather modest influence on the observed integration. Generally, the integration of temporally offset auditory and visual information seems rather insensitive to cognitive load or selective attentional manipulations. The processing of visual information from distractor faces appears to be limited. The language of the visually articulating distractors does not appear to provide information that helps match the auditory and visual speech streams. Audiovisual speech distractors are not much more distracting than auditory distractor speech paired with a still image, suggesting limited processing or integration of the visual and auditory distractor information. Gaze behaviour during audiovisual speech perception appears to be relatively unaffected by an increase in cognitive load, but is somewhat influenced by attentional instructions to selectively attend to the auditory or visual information. Additionally, both the congruency of the consonant and the temporal offset of the auditory and visual stimuli have small but rather robust influences on gaze. Thesis (Ph.D., Psychology), Queen's University, 2011.
8

Time is of the essence in speech perception! : Get it fast, or think about it / Lyssna nu! : Hör rätt direkt, eller klura på det!

Moradi, Shahram January 2014
The present thesis examined the extent to which background noise influences the isolation point (IP, the shortest time from the onset of a speech stimulus required for correct identification of that stimulus) and accuracy in the identification of different types of speech stimuli (consonants, words, and final words in high-predictability [HP] and low-predictability [LP] sentences). These speech stimuli were presented in different modalities (auditory, visual, and audiovisual) to young normal-hearing listeners (Papers 1, 2, and 5). In addition, the thesis studied under what conditions cognitive resources were explicitly demanded in the identification of different types of speech stimuli (Papers 1 and 2). Further, elderly hearing-aid (EHA) users and elderly normal-hearing (ENH) listeners were compared with regard to IPs, accuracy, and the conditions under which explicit cognitive resources were demanded in the identification of auditory speech stimuli in silence (Paper 3). The results showed that background noise resulted in later IPs and reduced accuracy for the identification of different types of speech stimuli in both modalities of speech presentation. Explicit cognitive resources were demanded in the identification of speech stimuli in the auditory-only modality, under the noisy condition, and in the absence of a prior semantic context. In addition, audiovisual presentation of speech stimuli resulted in earlier IPs and more accurate identification than auditory presentation. Furthermore, pre-exposure to audiovisual speech stimuli resulted in better auditory speech-in-noise identification than exposure to auditory-only speech stimuli (Papers 2 and 4). When comparing EHA users and ENH individuals, the EHA users showed inferior performance (in terms of IP) in the identification of consonants, words, and final words in LP sentences. In terms of accuracy, the EHA users demonstrated inferior performance only in the identification of consonants and words. Only the identification of consonants and words demanded explicit cognitive resources in the EHA users. Theoretical predictions and clinical implications were discussed.
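The isolation point defined above has a straightforward operationalisation in a gating-style task; the sketch below shows one common way to compute it, with hypothetical gate times and responses rather than the scoring code used in the thesis.

    # One common operationalisation of the isolation point (IP): the earliest
    # gate from stimulus onset at which the response is correct and remains
    # correct at every later gate. Gate durations and responses are invented.
    def isolation_point(gate_ms, correct):
        """gate_ms: gate offsets from onset in ms; correct: bool per gate."""
        for i, ok in enumerate(correct):
            if ok and all(correct[i:]):
                return gate_ms[i]
        return None   # stimulus never reliably identified

    gates = [100, 150, 200, 250, 300, 350]
    responses = [False, False, True, False, True, True]
    print(isolation_point(gates, responses))   # -> 300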
9

Neural indices and looking behaviors of audiovisual speech processing in infancy and early childhood

Finch, Kayla 12 November 2019
Language is a multimodal process, with visual and auditory cues playing important roles in understanding speech. A well-controlled paradigm with audiovisually matched and mismatched syllables is often used to capture audiovisual (AV) speech processing. The ability to detect and integrate mismatching cues shows large individual variability across development and is linked to later language in typical development (TD) and to social abilities in autism spectrum disorder (ASD). However, no study has used a multimethod approach to better understand AV speech processing in early development. The studies’ aims were to examine behavioral performance, gaze patterns, and neural indices of AV speech in (1) TD preschoolers (N=60; females=35) and (2) infants at risk for developing ASD (high-risk, HR; N=37; females=10) and TD controls (low-risk, LR; N=42; females=21). In Study 1, I investigated preschoolers’ gaze patterns and behavioral performance when presented with matched and mismatched AV speech and visual-only (lipreading) speech. As hypothesized, lipreading abilities were associated with children’s ability to integrate mismatching AV cues, and children looked towards the mouth when visual cues were helpful, specifically in lipreading conditions. Unexpectedly, looking time towards the mouth was not associated with the children’s ability to integrate mismatching AV cues. Study 2 examined how visual cues of AV speech modulated auditory event-related potentials (ERPs), and associations between ERPs and preschoolers’ behavioral performance during an AV speech task. As hypothesized, the auditory ERPs were attenuated during AV speech compared to auditory-only speech. Additionally, individual differences in children’s neural processing of auditory and visual cues predicted which cue a child attended to in mismatched AV speech. In Study 3, I investigated ERPs of AV speech in LR and HR 12-month-olds and their association with language abilities at 18 months. Unexpectedly, I found no group differences: all infants were able to detect mismatched AV speech, as measured through a more negative ERP response. As hypothesized, more mature neural processing of AV speech integration, measured as a more positive ERP response to fusible AV cues, predicted later language across all infants. These results highlight the importance of using multimethod approaches to understand variability in AV speech processing at two developmental stages.
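As a generic illustration of how an ERP amplitude measure might be related to a later language outcome, in the spirit of Study 3, the sketch below uses simulated epochs and an assumed 200-400 ms window; none of the numbers, window choices, or scores come from the dissertation.

    # Illustrative sketch with simulated data: mean ERP amplitude in a
    # post-stimulus window per infant, correlated with a later language score.
    import numpy as np

    rng = np.random.default_rng(1)
    n_infants, n_samples = 20, 500
    times = np.linspace(-0.1, 0.9, n_samples)        # seconds relative to AV onset
    erp = rng.normal(size=(n_infants, n_samples))    # averaged epoch per infant

    window = (times >= 0.2) & (times <= 0.4)         # hypothetical window of interest
    mean_amp = erp[:, window].mean(axis=1)           # one amplitude per infant

    language_18mo = rng.normal(size=n_infants)       # stand-in for 18-month scores
    r = np.corrcoef(mean_amp, language_18mo)[0, 1]
    print(f"amplitude-language correlation: r = {r:.2f}")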
