191

An Acoustic Analysis of Elements of Contrastive Stress Produced by 8- to 10-Year-Old Children

Clover, Nicole Michelle 03 August 2012 (has links) (PDF)
Contrastive stress is an aspect of communication that can be used to highlight information, de-accent redundant information, and distinguish new from previously provided information. Previous research has documented that adult speakers use relative changes in vocal intensity, fundamental frequency (F0), and duration to mark contrastive stress in a sentence. However, less is understood about how and when children mark contrastive stress in their communication; the current study therefore examines several acoustic elements of contrastive stress in 8- to 10-year-old children. Speech samples were elicited from 20 children and analyzed to determine whether the acoustic parameters of F0, intensity, and duration varied as a function of speaking condition, speaker gender, or grammatical unit. When the baseline speaking condition was compared to the condition eliciting contrastive stress, a significant difference was found only for the acoustic measure of mean intensity. Gender-related differences in contrastive stress were found only for the dependent measure of F0 slope, with female speakers exhibiting a greater F0 slope. All grammatical units differed significantly from one another across a number of variables, with significant interactions between condition (baseline vs. target) and grammatical unit. Consistent with previous research, these findings suggest that children under 10 years of age may not yet mark contrastive stress in an adult-like manner. The results may also reflect individual speaker differences, the complex nature of prosody, or measurement methodology.
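The two acoustic measures that produced significant effects above (mean intensity and F0 slope) can be computed from framewise analysis tracks. A minimal sketch, assuming hypothetical per-frame intensity (dB) and F0 (Hz) values rather than the study's actual extraction pipeline:

```python
def mean_intensity(frames_db):
    """Mean intensity (dB) across the voiced frames of a token."""
    return sum(frames_db) / len(frames_db)

def f0_slope(times_s, f0_hz):
    """Least-squares slope of F0 over a word (Hz per second), a
    simple proxy for the F0-slope measure described above."""
    n = len(times_s)
    mt = sum(times_s) / n
    mf = sum(f0_hz) / n
    num = sum((t - mt) * (f - mf) for t, f in zip(times_s, f0_hz))
    den = sum((t - mt) ** 2 for t in times_s)
    return num / den

# Hypothetical frame values for one baseline vs. one stressed token:
baseline_db = [62.0, 63.5, 61.0]
stressed_db = [68.0, 70.5, 69.0]
intensity_gain = mean_intensity(stressed_db) - mean_intensity(baseline_db)  # positive: stressed token is louder
```

A rising F0 track such as `f0_slope([0.0, 0.05, 0.10], [180.0, 195.0, 210.0])` yields a positive slope; comparing slopes by speaker gender mirrors the group comparison reported above.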
192

The Effect of Pause Duration on Intelligibility of Non-Native Spontaneous Oral Discourse

Lege, Ryan Frederick 01 December 2012 (has links) (PDF)
Pausing is a natural part of human speech: it segments speech, helps negotiate meaning, and allows for breathing. Along with other suprasegmental features, pausing plays a critical role in creating meaning, and comprehensible speech is a goal for language learners around the world. To be comprehensible, language learners need to learn to pause appropriately in their speaking. Although this notion is widely accepted by applied linguists and many language teachers, the effect of pausing on the intelligibility of spontaneous oral discourse has not been established empirically. This study isolates pause duration in spontaneous oral discourse in order to establish its connection to the intelligibility of non-native speech. North American undergraduate native English speakers' (NESs') reactions to non-native pause duration in spontaneous oral discourse were examined. The task measured the NESs' processing, comprehension, and evaluation of three versions of an international teaching assistant's presentation: one with unmodified pause duration, one with pause durations shortened by 50%, and one with pause durations lengthened by 50%. Results showed a positive correlation between pause duration and the number of listeners able to identify main ideas. Finally, listener reactions were measurably more positive to the unmodified passage than to the passages with lengthened or shortened pauses.
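The ±50% pause-duration manipulation described above can be sketched as a transformation over labeled intervals. The interval representation and the `scale_pauses` helper are illustrative assumptions, not the study's actual procedure:

```python
def scale_pauses(intervals, factor):
    """Return a copy of (label, duration_s) intervals with every
    'pause' interval scaled by `factor`; speech intervals are left
    untouched, so only pause duration is manipulated."""
    return [(lab, dur * factor if lab == "pause" else dur)
            for lab, dur in intervals]

# Hypothetical annotated stretch of a presentation:
utterance = [("speech", 1.8), ("pause", 0.6), ("speech", 2.1), ("pause", 0.4)]
shortened = scale_pauses(utterance, 0.5)   # pauses shortened by 50%
lengthened = scale_pauses(utterance, 1.5)  # pauses lengthened by 50%
```

Applying the scaled durations back to audio would require resynthesis (e.g., cutting or extending silent intervals), which is outside this sketch.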
193

Emotional Prosody in Adverse Acoustic Conditions: Investigating effects of emotional prosody and noise-vocoding on speech perception and emotion recognition

Ivarsson, Cecilia January 2022 (has links)
Speech perception is a fundamental function of successful vocal communication, and through prosody we can communicate different emotions. The ability to recognize emotions is important in social interaction, and emotional prosody facilitates emotion recognition in vocal communication. Acoustic conditions are not always optimal, due to either environmental disturbances or hearing loss, and when perceiving speech and recognizing emotions we make use of multimodal sources of information. Studying the effect of noise-vocoding on speech perception and emotion recognition can increase our knowledge of these abilities, yet the effect of emotional prosody on speech perception and emotion recognition under adverse acoustic conditions is not widely explored. To explore the role of emotional prosody under adverse acoustic conditions, an online test was created. Eighteen participants (8 women) listened to semantically neutral sentences with different emotions expressed in the prosody, presented at five levels of noise-vocoding (NV1, NV3, NV6, NV12, and Clear). The participants' task was to reproduce the spoken words and identify the expressed emotion (happy, surprised, angry, sad, or neutral). A reading span test was included to investigate any potential correlation between working memory capacity and the ability to recognize emotions in prosody. Statistical analysis suggests that speech perception can be facilitated by emotional prosody when sentences are noise-vocoded. Emotion recognition accuracy differed between emotions across the noise levels: the ability to recognize anger was least affected by noise-vocoding, and sadness was most affected. Correlation analysis showed no significant relationship between working memory capacity and emotion recognition accuracy.
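Noise-vocoding, as used above, replaces the fine spectral detail of speech with band-limited noise shaped by each band's amplitude envelope; the NV1–NV12 labels presumably denote the number of channels (an assumption). A minimal FFT-based sketch; the band edges, envelope smoothing, and channel spacing here are illustrative choices, not those of the study:

```python
import numpy as np

def noise_vocode(signal, sr, n_channels):
    """Toy noise-vocoder: split the spectrum into logarithmically
    spaced bands, extract each band's amplitude envelope, and use
    it to modulate band-limited noise. Illustrative only."""
    rng = np.random.default_rng(0)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    edges = np.geomspace(100, sr / 2, n_channels + 1)  # assumed band edges
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spectrum * mask, n=len(signal))
        env = np.abs(band)
        # crude envelope smoothing: ~10 ms moving average
        k = max(1, int(sr * 0.01))
        env = np.convolve(env, np.ones(k) / k, mode="same")
        noise = rng.standard_normal(len(signal))
        noise_band = np.fft.irfft(np.fft.rfft(noise) * mask, n=len(signal))
        out += env * noise_band
    return out
```

With few channels (e.g., NV1) almost all spectral detail is lost, which is why emotion cues carried by pitch degrade while intensity- and rate-based cues survive better.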
194

Förmågan att genomskåda en röstklon : Faktorer som påverkar genomskådning av AI-genererade röstkloner / The ability to see through a voice clone: Factors affecting the detection of AI-generated voice clones

Dalman, Gabriella, Hedin, Jonathan January 2020 (has links)
As machine learning has advanced in recent years, the creation of deep fakes (fake media, most often video or images, created with this technology) has become easier. Voice cloning is a topic in speech technology that can be regarded as the equivalent of deep fakes for voices. Earlier studies have proposed new techniques that use neural networks to create believable clones of human voices, but few studies have examined the perceptual factors in the human ability to discern the authenticity of voice clones. We therefore conducted a study with one male and one female voice clone, in which participants already familiar with the speakers' voices determined the authenticity of a series of clips among which voice clones were included. The clips were limited to different frequency ranges in order to analyse whether there was a correlation between frequency range and the participants' detection ability. The results show that frequency range did not have a statistically significant effect; the determining factors were instead prosody and the presence of artefacts in the sound clips. However, there was a considerable difference in success between detecting the male and the female voice clone, with participants more frequently detecting the male one.
195

Codificación de la evidencialidad en el español chileno: Estrategias prosódicas con parecer / Coding evidentiality in Chilean Spanish: Prosodic strategies with parecer ‘seem’

Ugarte Bern, Sophie Charlotte January 2023 (has links)
In the present study, the fundamental frequency (F0), or pitch, of parecer 'seem' was analysed with Praat in utterances that differ in the mode of access to the information (n=80), produced by Chilean informants between 24 and 50 years old. In the first phase of the study, utterances were elicited experimentally through the creation of different situations, all corresponding to indirect evidentiality, whether reported (n=40) or inferred (n=40). Additionally, a ToBI analysis was applied to examine the final cadence of the utterances. The second phase of the study consisted of an ABX discrimination experiment and an identification experiment, in which participants were asked to listen to the utterances elicited in the first phase and to distinguish/identify stimuli belonging to different modes of access to the information. The results indicate an average difference of 1.967 semitones (st) in the pitch of parecer between reported and inferred situations, with the reported situations being more prominent, that is, maintaining a higher pitch. Regarding the final cadence of the utterances, a considerable majority was assertive under inference when compared with reported situations. Consequently, it is possible to affirm that in these cases evidentiality is distinguished through prosody. In the second phase of the study, an error rate of 23.2% was found in the ABX discrimination experiment and 29.1% in the identification experiment, indicating that participants are able to distinguish and categorize the stimuli, as well as identify them without prior labelling, with an error rate below 30% in both cases.
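The semitone figure reported above follows from the standard conversion between two F0 values, st = 12·log2(f1/f2). A minimal sketch (the example frequencies are illustrative, not taken from the study's data):

```python
import math

def semitone_diff(f1_hz, f2_hz):
    """Distance between two F0 values in semitones: 12 * log2(f1/f2).
    Positive when f1 is higher than f2."""
    return 12 * math.log2(f1_hz / f2_hz)

# A pitch about 12% higher sits roughly 2 st above the reference,
# comparable in magnitude to the 1.967 st difference reported above:
example = semitone_diff(224.0, 200.0)
```

Working in semitones rather than raw Hz makes pitch differences comparable across speakers with different baseline F0.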
196

Computational Affect Detection for Education and Health

Cooper, David G. 01 September 2011 (has links)
Emotional intelligence has a prominent role in education, health care, and day-to-day interaction. With the increasing use of computer technology, computers are interacting with more and more individuals. This interaction provides an opportunity to increase knowledge about human emotion for human consumption, well-being, and improved computer adaptation. This thesis explores the efficacy of using up to four different sensors in three domains for computational affect detection. We first consider computer-based education, where a collection of four sensors is used to detect student emotions relevant to learning, such as frustration, confidence, excitement, and interest, while students use a computer geometry tutor. The best classifier of each emotion achieves an accuracy ranging from 78% to 87.5%. We then use voice data collected in a clinical setting to differentiate both the gender and the culture of the speaker. We produce classifiers with accuracies between 84% and 94% for gender, and between 58% and 70% for American vs. Asian culture, and we find that classifiers for distinguishing between four cultures do not perform better than chance. Finally, we use video and audio in a health care education scenario to detect students' emotions during a clinical simulation evaluation. The video data provides classifiers with accuracies between 63% and 88% for the emotions of confidence, anxiety, frustration, excitement, and interest. We find the audio data to be too complex to single out the voice source of the student by automatic means. In total, this work is a step forward in the automatic computational detection of affect in realistic settings.
197

Prosodic Speech Production and Perception Differences Comparing Populations with Varying Levels of Autistic Traits

Krizic, Monika January 2023 (has links)
Autism spectrum disorder (ASD) represents a group of developmental disabilities associated with impairments in social, communicative, and imaginative abilities. Speech impairments associated with ASD can be explained by differences in cognitive processing styles relative to neurotypicals. Previous studies found that individual differences in cognitive processing influence one's production and perception of prosody. For example, Stewart et al. (2018) found that higher levels of autistic traits, as indicated by one's Autism Spectrum Quotient (AQ) score (Baron-Cohen et al., 2001), correlated significantly with the ability to discriminate pitch and time, but not with auditory discrimination thresholds for intensity. Additionally, Turnbull (2015; 2019) observed shorter overall word and vowel durations during a task that required participants with varying AQ scores to speak for the benefit of a listener with a hearing impairment. The present study examined whether prosodic cue-trading in production and perception differs between populations with varying levels of autistic traits, as indicated by their AQ scores. Furthermore, the study investigated whether these differences lie on a continuum or are categorical with respect to participants' level of autistic traits. To this end, we analyzed individual variability patterns in 18 participants' speech production and perception. Results from the perception task showed significantly enhanced perception of pitch and intensity, but not duration, in a task in which participants listened to sentences with the prosodic parameters f0, intensity, and duration manipulated.
Results from the production task, in which participants read sentences designed to elicit background, broad, and narrow focus, found no significant effect of AQ across any of the acoustic parameters measured, although the results for f0 approached the 5% significance level, suggesting that participants with higher AQ scores may produce narrower f0 ranges, and thus less prosodic variability, than low-AQ participants. / Thesis / Master of Science (MSc) / Autism spectrum disorder (ASD) represents a group of developmental disabilities associated with impairments in communicative abilities, among others. Theories suggest that individuals with higher levels of autistic traits notice small details in the physical properties of sounds but have trouble distinguishing the more abstract, intended meaning of the same sound patterns. Previous studies found that individual differences in the degree of autistic traits influence one's production and perception of prosody (i.e., patterns of pitch, loudness, and timing in speech); individuals with higher levels of autistic traits are better able to detect fine-grained differences in pitch and time, but not loudness. The present study examined the extent to which speakers with varying levels of autistic traits use prosody during speech production and perception. This study observed that (1) individuals with higher levels of autistic traits displayed an enhanced perception of pitch and loudness, but not time, and (2) these same participants may exhibit less variability in their production of pitch.
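The AQ correlations discussed above (e.g., Stewart et al., 2018) rest on simple bivariate correlation between AQ scores and a perceptual measure. A minimal sketch of Pearson's r; the AQ scores and discrimination thresholds below are entirely hypothetical:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length
    numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

# Hypothetical AQ scores and pitch-discrimination thresholds (Hz);
# a negative r would mean higher AQ goes with finer discrimination:
aq = [8, 12, 15, 19, 22, 27, 31, 35]
thresholds = [4.1, 3.8, 3.5, 2.9, 2.6, 2.2, 1.9, 1.5]
r = pearson_r(aq, thresholds)
```

Whether such an effect is continuous or categorical, the question the study raises, would require comparing this linear fit against a grouped (e.g., median-split) model rather than correlation alone.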
198

Does Reading Naturally Equal Reading Fluently? What Effect Does Read Naturally Have on the Reading Rate and Prosody of First Grade Readers?

Foust, Curt Darwin 12 November 2010 (has links)
No description available.
199

Pitch Range and Pitch Declination in Asperger Syndrome: Reading a Dramatic Passage

Unger, Brandon Lloyd 02 October 2006 (has links)
No description available.
200

Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system

Yoon, Kyuchul 14 July 2005 (has links)
No description available.
