151 |
Adaptation of Cantonese Hearing in Noise Test (CHINT) scoring methods for testing in cochlear implant patients
Keung, Kon-him., 姜幹謙. January 2010
published_or_final_version / Speech and Hearing Sciences / Master / Master of Science in Audiology
|
152 |
Intelligibility of clear speech at normal rates for older adults with hearing loss
Shaw, Billie Jo 01 June 2006
Clear speech refers to a speaking style that is more intelligible than typical, conversational speaking styles. It is usually produced at a slower rate compared to conversational speech. Clear speech has been shown to be more intelligible than conversational speech for a large variety of populations, including both hearing impaired (Schum, 1996; Picheny, Durlach, & Braida, 1985; and Payton, Uchanski, & Braida, 1994) and normal hearing individuals (e.g. Uchanski, Choi, Braida, Reed, & Durlach, 1996) under a variety of conditions, including those in which presentation level, speaker, and environment are varied. Although clear speech is typically slower than normally produced conversational speech, recent studies have shown that it can be produced at normal rates with training (Krause & Braida, 2002).
If clear speech at normal rates is shown to be as effective for individuals with hearing loss as clear speech at slow rates, it would have both clinical and research implications. The purpose of this study was to determine the effectiveness of clear speech at normal rates for older individuals with hearing loss. It examined the way in which intelligibility, measured as percent correct keyword scores on nonsense sentences, varied as a result of speaking mode (clear versus conversational speech) and speaking rate (slow versus normal) in six adults aged 55-75 years old with moderate, sloping, hearing loss. Each listener was presented with nonsense sentences in four speech conditions: clear speech at slow rates (clear/slow), clear speech at normal rates (clear/normal), conversational speech at slow rates (conv/slow), and conversational speech at normal rates (conv/normal) read by four different talkers. Sentences were presented monaurally in quiet to the listeners via headphones.
Results indicated that clear/slow speech was the most intelligible condition overall. Neither conv/slow nor clear/normal provided an intelligibility benefit relative to conv/normal speech on average, suggesting that for older adults with moderate, sloping hearing loss, the combination of using clear speech and a slower speaking rate is more beneficial to intelligibility than the additive effects of altering either speaking rate or speaking mode alone. It has been suggested previously (Krause, 2001) that audiological characteristics may contribute to the lack of clear/normal benefit for certain listeners with hearing loss. Although clear/normal speech was not beneficial on average to listeners in this study, there were cases in which the clear/normal speech of a particular talker provided a benefit to a particular listener.
Thus, severity and configuration of hearing loss alone cannot fully explain the degree to which listeners with hearing loss do (or do not) benefit from clear/normal speech. More studies are needed to investigate the benefits of clear/normal speech for different audiological configurations, including individuals with flat losses. In addition, the listening tasks should include more difficult conditions in order to compensate for potential ceiling effects.
|
153 |
Segmental errors, speech intelligibility and their relationship in Cantonese speaking hearing-impaired children
Khouw, Edward., 許源豐. January 1994
published_or_final_version / Speech and Hearing Sciences / Master / Master of Philosophy
|
154 |
Disfluency in Swedish human–human and human–machine travel booking dialogues
Eklund, Robert January 2004
This thesis studies disfluency in spontaneous Swedish speech, i.e., the occurrence of hesitation phenomena like eh, öh, truncated words, repetitions and repairs, mispronunciations and so on. The thesis is divided into three parts: PART I provides the background, concerning scientific, personal and industrial–academic aspects, in the Tuning in quotes, and the Preamble and Introduction (chapter 1). PART II consists of one chapter only, chapter 2, which dives into the etiology of disfluency. Consequently it describes previous research on disfluencies, also including areas that are not the main focus of the present tome, like stuttering, psychotherapy, philosophy, neurology, discourse perspectives, speech production, application-driven perspectives, cognitive aspects, and so on. A discussion on terminology and definitions is also provided. The goal of this chapter is to provide as broad a picture as possible of the phenomenon of disfluency, and how all those different and varying perspectives are related to each other. PART III describes the linguistic data studied and analyzed in this thesis, with the following structure: Chapter 3 describes how the speech data were collected, and for what reason. Sum totals of the data and the post-processing method are also described. Chapter 4 describes how the data were transcribed, annotated and analyzed. The labeling method is described in detail, as is the method employed to do frequency counts. Chapter 5 presents the analysis and results for all different categories of disfluencies. Besides general frequency and distribution of the different types of disfluencies, both inter- and intra-corpus results are presented, as are co-occurrences of different types of disfluencies. Also, inter- and intra-speaker differences are discussed. Chapter 6 discusses the results, mainly in light of previous research.
Reasons for the observed frequencies and distribution are proposed, as is their relation to language typology, as well as syntactic, morphological and phonetic reasons for the observed phenomena. Future work is also envisaged: work that is possible on the present data set, work that is possible on the present data set given extended labeling, and work that I think should be carried out, but where the present data set fails, in one way or another, to meet the requirements of such studies. Appendices 1–4 list the sum total of all data analyzed in this thesis (apart from Tok Pisin data). Appendix 5 provides an example of a full human–computer dialogue. / The electronic version of the printed dissertation is a corrected version where typos as well as phrases have been corrected. A list with the corrections is presented in the errata list above.
|
155 |
Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods
Evers, Christine January 2010
Speech signals radiated in confined spaces are subject to reverberation due to reflections from surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain the approaches to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. Therefore, the aim of this dissertation is the development of a flexible and extendible framework for blind speech dereverberation accommodating different speech sound types, single- or multiple-sensor processing, as well as stationary and moving speakers. Bayesian methods benefit from – rather than being dictated by – appropriate model choices. Therefore, the problem of blind speech dereverberation is considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is consequently derived. In this approach both the anechoic source signal and reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state-space of unknown variables. The remaining model parameters are estimated using sequential importance resampling.
The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Due to the structure of the measurement model, single- as well as multi-microphone processing is facilitated, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern. This dissertation is concluded with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, as well as an extension to subband processing for improved computational efficiency.
|
156 |
The Impact of Breathiness on Speech Intelligibility in Pathological Voice
Thompson, Louise Shirley January 2011
Aim
The aim of this study was to determine how deterioration of voice quality, such as breathiness, may impact on the intelligibility of speech.
Method
Acoustic analysis was conducted on sustained vowel phonation (/i/ and /a/) and sentences produced by voice disordered speakers. Measures included: frequency and amplitude of the first two formants (F1, F2), singing power ratio (SPR), the amplitude difference between the first two harmonics (H1-H2), voice onset time (VOT), and energy ratio between consonant and vowel (CV energy ratio). A series of two-way (glottal closure by vowel) mixed-design (between- and within-subjects) analyses of variance conducted on these acoustic measures showed a significant glottal closure (complete and incomplete) or glottal closure by vowel interaction effect on the F2 frequency, H1-H2 amplitude difference, and singing power ratio. Based on findings in the literature that reported a dominant first harmonic as a useful predictor of breathiness, the measure of H1-H2 amplitude difference was selected as a factor for investigation of the impact of voice quality on the perception of vowel intelligibility and clarity. Fixed-length vowel segments at five levels of H1-H2 amplitude difference were presented to 10 male and 10 female inexperienced listeners between the ages of 19 and 34 years.
Results
It was expected that the tokens with a dominant first harmonic, indicative of a more breathy voice, would be associated with a lower rate of correct vowel identification and of being perceived as “clearer”. Although no linear relationship between breathiness and intelligibility was revealed, results indicated the presence of thresholds of intelligibility for particular vowels whereby once a level of breathiness was reached intelligibility would decline.
Conclusion
The finding of a change of the perceptual ratings as a function of the H1-H2 amplitude difference, identified in previous studies as a measure of breathiness, revealed thresholds of intelligibility for particular vowels below which breathiness would be tolerated with little impact on intelligibility but beyond which intelligibility ratings suffered markedly.
|
157 |
Perceptual learning of dysarthric speech
Borrie, Stephanie Anna January 2011
Perceptual learning, when applied to speech, describes experience-evoked adjustments to the cognitive-perceptual processes required for recognising spoken language. It provides the theoretical basis for improved understanding of a speech signal that is initially difficult to perceive. Reduced intelligibility is a frequent and debilitating symptom of dysarthria, a speech disorder associated with neurological disease or injury. The current thesis investigated perceptual learning of dysarthric speech, by jointly considering intelligibility improvements and associated learning mechanisms for listeners familiarised with the neurologically degraded signal. Moderate hypokinetic dysarthria was employed as the test case in the three phases of this programme of research.
The initial research phase established strong empirical evidence of improved recognition of dysarthric speech following a familiarisation experience. Sixty normal hearing listeners were randomly assigned to one of three groups and familiarised with passage readings under the following conditions: (1) neurologically intact speech (control) (n = 20), (2) dysarthric speech (passive familiarisation) (n = 20), and (3) dysarthric speech coupled with written information (explicit familiarisation) (n = 20). Subsequent phrase transcription analysis revealed that the intelligibility scores of both groups familiarised with dysarthric speech were significantly higher than those of the control group. Furthermore, performance gains were superior, in both size and longevity, when the familiarisation conditions were explicit. A condition discrepancy in segmentation strategies, in which attention towards syllabic stress contrast cues increased following explicit familiarisation but decreased following passive familiarisation, indicated that performance differences were more than simply magnitude of benefit. Thus, it was speculated that the learning that occurred with passive familiarisation may be qualitatively different to that which occurred with explicit familiarisation.
The second phase of the research programme followed up on the initial findings and examined whether the key variable behind the use of particular segmentation strategies was simply the presence or absence of written information during familiarisation. Forty normal hearing listeners were randomly assigned to one of two groups and were familiarised with experimental phrases under either passive (n = 20) or explicit (n = 20) learning conditions. Subsequent phrase transcription analysis revealed that regardless of condition, all listeners utilised syllabic stress contrast cues to segment speech following familiarisation with phrases that emphasised this prosodic perception cue. Furthermore, the study revealed that, in addition to familiarisation condition, intelligibility gains were dependent on the type of the familiarisation stimuli employed. Taken together, the first two research phases demonstrated that perceptual learning of dysarthric speech is influenced by the information afforded within the familiarisation procedure.
The final research phase examined the role of indexical information in perceptual learning of dysarthric speech. Forty normal hearing listeners were randomly assigned to one of two groups and were familiarised with dysarthric speech via a training task that emphasised either the linguistic (word identification) (n = 20) or indexical (speaker identification) (n = 20) properties of the signal. Intelligibility gains for listeners trained to identify indexical information paralleled those achieved by listeners trained to identify linguistic information. Similarly, underlying error patterns were also comparable between the two training groups. Thus, phase three revealed that both indexical and linguistic features of the dysarthric signal are learnable, and can be used to promote subsequent processing of dysarthric speech.
In summary, this thesis has demonstrated that listeners can learn to better understand neurologically degraded speech. Furthermore, it has offered insight into how the information afforded by the specific familiarisation procedure is differentially leveraged to improve perceptual performance during subsequent encounters with the dysarthric signal. Thus, this programme of research affords preliminary evidence towards the development of a theoretical framework that exploits perceptual learning for the treatment of dysarthria.
|
158 |
Avgörande faktorer för talnaturlighet hos personer med Parkinsons sjukdom : Korrelationsstudie mellan naiva lyssnares bedömning och akustisk analys / Critical Factors for Speech Naturalness in People with Parkinson's Disease: A Correlational Study between Listener Judgement and Acoustic Analysis
Larsson, Elias, Isaksson, Fredrik January 2015
Tal- och röstförändringar är vanligt förekommande hos personer med Parkinsons sjukdom. Dessa påverkar ofta talarens förståelighet men kan också ha en negativ inverkan på talets naturlighet. Forskning angående vilka faktorer som påverkar talets naturlighet är i dagsläget begränsad, varför föreliggande studie har genomförts. Syftet med studien var att undersöka huruvida den uppfattade talnaturligheten kunde härledas till några specifika tal- och röstparametrar. I föreliggande studie konstruerades ett testbatteri för att elicitera talmaterial från åtta personer med Parkinsons sjukdom. Forskningspersonernas röster spelades in och inspelningarna graderades sedan av 27 naiva lyssnare gällande förståelighet och talnaturlighet. Korrelationstester genomfördes slutligen för att hitta eventuella samband mellan lyssnarnas bedömning och olika akustiska parametrar. Resultatet visade att tal- och artikulationshastighet var den faktor med störst inverkan på lyssnargruppens bedömning av talnaturlighet, där de med långsammast hastighet bedömdes ha mest onaturligt tal. Vidare fanns starka indikationer på att grad av förståelighet korrelerade med bedömningen av talnaturlighet. I föreliggande studie tycktes inga övriga akustiska parametrar ha en statistiskt signifikant korrelation med lyssnargruppens bedömning av talnaturlighet. / Speech and voice changes are common in Parkinson’s disease. These changes can affect the speaker’s intelligibility but can also have a negative impact on the perceived naturalness of speech. The research available regarding the different factors that affect speech naturalness is scarce, which was the motivation behind this study. The aim of the present study was to investigate whether the level of perceived speech naturalness could derive from any specific aspects of speech. This was accomplished by recording speech samples from eight people with Parkinson’s disease using a test battery with various speech tasks. 
These samples were presented to a group of 27 naive listeners whose task was to judge the level of intelligibility as well as the level of speech naturalness. Correlations were then made between their assessments and various acoustic measurements. The main finding of the present study was that speech and articulation rate seemed to have the greatest impact on the perceived level of naturalness, where the people who had the slowest rate were judged to be the least natural sounding. Furthermore there were strong indications that the level of intelligibility correlated with the level of speech naturalness. In this study there were no other acoustic correlates found with statistical significance.
|
159 |
Improving Understanding and Trust with Intelligibility in Context-Aware Applications
Lim, Brian Y. 01 May 2012
To facilitate everyday activities, context-aware applications use sensors to detect what is happening and use increasingly complex mechanisms (e.g., by using big rule-sets or machine learning) to infer the user's context and intent. For example, a mobile application can recognize that the user is in a conversation and suppress any incoming calls. When the application works well, this implicit sensing and complex inference remain invisible. However, when it behaves inappropriately or unexpectedly, users may not understand its behavior. This can lead users to mistrust, misuse, or even abandon it. To counter this lack of understanding and loss of trust, context-aware applications should be intelligible, capable of explaining their behavior.
We investigate providing intelligibility in context-aware applications and evaluate its usefulness to improve user understanding and trust in context-aware applications. Specifically, this thesis supports intelligibility in context-aware applications through the provision of explanations that answer different question types, such as: Why did it do X? Why did it not do Y? What if I did W, what will it do? How can I get the application to do Y?
This thesis takes a three-pronged approach to investigating intelligibility by (i) eliciting the user requirements for intelligibility, to identify what explanation types end-users are interested in asking context-aware applications, (ii) supporting the development of intelligible context-aware applications with a software toolkit and the design of these applications with design and usability recommendations, and (iii) evaluating the impact of intelligibility on user understanding and trust under various situations and application reliability, and measuring how users use an interactive intelligible prototype. We show that users are willing to use well-designed intelligibility features, and this can improve user understanding and trust in the adaptive behavior of context-aware applications.
|
160 |
Disfluency in Swedish human-human and human-machine travel booking dialogues
Eklund, Robert, January 2004
Diss. Linköping : Univ., 2004.
|