1 |
Creaky voice: an interactional resource for indexing authority
Hildebrand-Edgar, Nicole, 15 August 2016
This project explores the social meaning potential of creaky voice using a third wave
variationist approach in order to uncover what motivates speakers to deploy this vocal
quality. Intraspeaker variation in the use of creak is quantitatively and qualitatively
examined in case studies of one male and one female individual who come from a similar
social group. In recordings from a range of casual settings, both the male and female
speaker are found to use creak at similar rates, for similar purposes. However, creak is
found to vary across social settings: the greater the speakers’ self-reported intimacy with
their interlocutors, the lower the frequency of creak. This suggests that creaky voice is
used for interactional functions, and is conditioned by conversational context. Qualitative
discourse analysis of instances of creak further reveals that it has a high frequency of
co-occurrence with linguistic features used for epistemic stancetaking. I suggest that creak is
an interactional resource available for taking an authoritative position in interaction,
especially in situations where speakers feel less intimately connected to their
interlocutors. / Graduate / 2017-08-02 / 0290 / 0291 / nchildebrand@gmail.com
|
2 |
Descrição da qualidade de voz por meio de proposta de avaliação com motivação fonética / Voice quality description from a phonetic perspective
Fernandes, Ana Carolina Nascimento, 14 July 2011
Conselho Nacional de Desenvolvimento Científico e Tecnológico / To describe the voice quality of a group from a phonetic
perspective, in a set of samples previously investigated by other modes of
analysis. Method: A retrospective descriptive study with auditory-perceptual
analysis of 60 speech samples using the VPAS-PB script. All speech samples
were edited with the program PRAAT. The signal-to-noise ratio was calculated
for every recording, and only recordings with a signal-to-noise ratio greater
than 2 were selected, to avoid interference from external noise during the
auditory-perceptual assessment. Next, 40 seconds of semi-spontaneous speech
were cut from each recording. The excerpts were then grouped into a voice
quality perception script, and one third of the 60 samples (20 in total) were
randomly drawn and repeated to check the reliability of the judge who
subsequently evaluated them. The script grouped the speech samples and
redistributed them at random, so that the judge could perceive neither the
order of presentation nor which samples had been repeated, and had no access
to her own earlier answers at the time of the repetitions. The
auditory-perceptual analysis was carried out by a speech-language pathologist
specializing in voice, with more than three years of experience in applying
the script used in this study, who showed high reliability and internal
consistency in the data analysis (alpha coefficient of 0.777 for the degrees
of the settings and 0.814 for the presence or absence of settings). The
judge's evaluations were then submitted to descriptive statistical analysis
and multivariate analysis. Results: The most frequently perceived voice
quality setting was harsh voice, which was strongly associated with the
whisper (air escape) setting. Laryngeal hyperfunction was also a marked
characteristic of the samples and co-occurred with harsh voice. Modal voice
was perceived in few speakers. The settings pharyngeal expansion and lowered
larynx were also associated with each other. Interdependence was observed
between voice quality settings and aspects of vocal dynamics. Conclusion: It
was possible to describe the speech of the group and to observe settings that
are more compatible with certain laryngeal adaptations, and to detect
combinations of vocal tract, laryngeal and tension settings that signal voice
disorders in the group studied.
|
3 |
Análise comparativa da voz em jovens mulheres antes e depois da prova de fala contínua / Comparative analysis of voice in young women before and after the continuous speech test
Pereira, Patricia Moraes, 17 December 2015
PURPOSE: To compare women's voices before and after 60 and 90 minutes of a continuous speech test, and after a 15-minute rest. METHODS: Thirty-one women aged 18 to 25 years performed a phonatory endurance task, reading a standard text aloud for 90 minutes and repeating it until the time ran out. Before the continuous speech task, after 60 minutes, after 90 minutes, and after 15 minutes of absolute voice rest, a questionnaire on vocal well-being was applied and a sustained vowel "a" was recorded for later extraction of acoustic measures and auditory-perceptual analysis with the GIRBAS scale. Measures of the phonatory air system were then taken using the following protocols: lung vital capacity (LVC), maximum phonation time (MPT) and vocal efficiency (VE). Vocal intensity was recorded with a sound level meter, and self-assessment of the auditory, tactile and kinesthetic perception of the voice used a 100 mm visual analog scale.
RESULTS: After 60 minutes of speech, fundamental frequency (f0) increased from 215.4 to 220.2 Hz (p<0.01), as did ATRI (p=0.04) and NHR (p=0.03). After 90 minutes, f0 had risen from 215.4 to 223.6 Hz (p<0.01), Fhi (p=0.04) and Flo (p=0.02) had also increased, and APQ (p=0.01) and VTi (p=0.04) had decreased. Between the pre-test and post-rest measurements, f0 (p<0.01), Fhi (p=0.02) and Flo (p=0.03) increased. Between 60 minutes and post-rest, PPQ (p=0.04), ATRI (p=0.06) and NHR (p=0.02) increased. Between 90 minutes and post-rest, PPQ (p=0.03) and Fatr (p=0.04) increased. Twenty-seven participants had an overall dysphonia grade of 1 at both 60 and 90 minutes, and four moved to grade 2 at 90 minutes (p=0.04). The instability parameter changed from grade 1 at 60 minutes to grade 2 at 90 minutes of continuous speech (p<0.01). Habitual intensity increased from 61.4 to 63.4 dB after 90 minutes (p<0.01). After the rest, intensity decreased relative to the pre-test (p=0.01). In the phonatory air system measures, expiratory airflow decreased after 90 minutes of speech (p=0.04) and increased after the rest (p=0.04). After 90 minutes of speech, f0 increased from 211.85 to 221.54 Hz (p<0.01). Aerodynamic resistance, acoustic impedance and aerodynamic efficiency increased after 60 and after 90 minutes of speech. In the auditory-perceptual and tactile-kinesthetic self-assessment of the voice, all symptoms worsened after 90 minutes of continuous speech except hoarseness and low-pitched voice.
CONCLUSION: Acoustic measures changed after the continuous speech task. The overall degree of dysphonia and vocal instability increased after 90 minutes of continuous speech. The aerodynamic measures behaved differently across the protocols used and the assessment times. Habitual vocal intensity increased after 90 minutes of continuous speech, and auditory and tactile-kinesthetic perceptual symptoms increased after the continuous speech task.
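The size of the reported f0 rise is easier to judge on a relative scale. The sketch below is an illustration added here, not part of the thesis: it converts the mean f0 values quoted in the abstract into semitones and percent change, using the standard 12·log2 frequency-ratio formula. The function and variable names are hypothetical.

```python
import math


def semitone_shift(f_before_hz, f_after_hz):
    """Return the interval between two frequencies in semitones."""
    return 12.0 * math.log2(f_after_hz / f_before_hz)


# Mean f0 values quoted in the abstract (Hz).
f0_pre, f0_60, f0_90 = 215.4, 220.2, 223.6

shift_60 = semitone_shift(f0_pre, f0_60)          # pre-test vs. 60 minutes
shift_90 = semitone_shift(f0_pre, f0_90)          # pre-test vs. 90 minutes
percent_90 = 100.0 * (f0_90 - f0_pre) / f0_pre    # relative change at 90 minutes

print(f"60 min: {shift_60:.2f} st, 90 min: {shift_90:.2f} st ({percent_90:.1f}%)")
```

Both shifts come out below one semitone, so the statistically significant rise is small in musical terms.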
|
4 |
Acoustic and perceptual aspects of vocal function in children with adenotonsillar hypertrophy - effects of surgery
Lundeborg Hammarström, Inger; Hultcrantz, Elisabeth; Ericsson, Elisabeth; McAllister, Anita, January 2012
Objective: To evaluate the outcome of two types of tonsil surgery (tonsillectomy + adenoidectomy or tonsillotomy + adenoidectomy) on vocal function, perceptually and acoustically. Study Design: Sixty-seven children, aged 50-65 months, on the waiting list for tonsil surgery were randomized to tonsillectomy (n=33) or tonsillotomy (n=34). Fifty-seven age- and gender-matched healthy pre-school children were controls. Twenty-eight of them, aged 48-59 months, served as the control group before surgery, and 29, aged 60-71 months, after surgery. Methods: Before surgery and six months postoperatively, the children were recorded producing three sustained vowels (/A, u, i/) and 14 words. The control groups were recorded only once. Three trained speech and language pathologists performed the perceptual analysis using Visual Analogue Scales (VAS) for eight voice quality parameters. Acoustic analysis of the sustained vowels included average fundamental frequency, jitter percent, shimmer percent, noise-to-harmonic ratio and the centre frequencies of formants 1-3. Results: Before surgery the children were rated as having more hyponasality and a more compressed/throaty voice (p<0.05) and lower mean pitch (p<0.01) than the control group. They also had higher perturbation measures and lower frequencies of the second and third formants. After surgery there were no perceptual differences. Perturbation measures decreased but were still higher than the control group's (p<0.05). Differences in formant frequencies for /i/ and /u/ remained. No differences were found between the two surgical methods. Conclusion: Voice quality is affected perceptually and acoustically by adenotonsillar hypertrophy. After surgery the voice is perceptually normalized but acoustic differences remain. Outcome was equal for both surgical methods.
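The perturbation measures named in this abstract (jitter percent and shimmer percent) can be illustrated with a short sketch. It assumes the common "local" definition, the mean absolute difference between consecutive cycles divided by the mean value, which may differ from the exact algorithm the authors' software used. The cycle values are invented for illustration and are not data from the study.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical cycle-by-cycle measurements from a sustained vowel.
periods_ms = [4.00, 4.04, 3.98, 4.02, 3.96]   # glottal period lengths
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]   # peak amplitude of each cycle

jitter_percent = local_perturbation_percent(periods_ms)    # period perturbation
shimmer_percent = local_perturbation_percent(amplitudes)   # amplitude perturbation

print(f"jitter: {jitter_percent:.2f}%, shimmer: {shimmer_percent:.2f}%")
```

Applying the same function to periods and to amplitudes reflects that, under this definition, jitter and shimmer differ only in which per-cycle quantity is measured.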
|
5 |
Voice Quality And Gender Identification: Acoustic And Perceptual Analysis
Cain Porter, Courtney, 19 June 2012
The voice is a fundamental method of communication and as such, helps in our efforts to define our identity. Projection of the appropriate voice is crucially important to transgender individuals in transition for acceptance as their identified gender. This study attempts to identify and examine the relationship between acoustic measurements of voice quality and the perception of speaker gender from audio recordings, including the male-to-female transgender voice, based on several acoustic properties that have been identified by previous studies. Recordings of female, male and transgender voices were acoustically analyzed for properties relating to differences in voice quality between men and women. Listeners then identified the gender of the recorded voices, with the intention of evaluating which voices are perceived as either male or female along with a corresponding rating of masculinity or femininity. What acoustic measurements of voice quality cue listeners to gender and do they correlate with gender perception?
|
6 |
The Epilarynx in Speech
Moisik, Scott, 16 July 2013
This dissertation examines the phonetic and phonological functioning of the supraglottal part of the larynx, the epilarynx, from an articulatory-physiological perspective. The central thesis is that, through constriction, the epilarynx physically couples the vocal folds to the supralaryngeal vocal tract. This basic principle is important in explaining a wide range of speech phenomena, such as the mechanism of glottal stop, creaky and harsh (“constricted”) phonation, interaction between vocal fold state and lingual state, and the coordination of phonatory and vowel quality as voice quality, which underlies many register-like patterns. Furthermore, oscillation of the epilarynx and (typically) the vocal folds below is the basis for “growl”, which is demonstrated to have numerous expressions in speech, both phonetically and phonologically.
The thesis is explored by detailed examination of three functions of the epilarynx: (1) epilaryngeal vibration, (2) epilaryngeal interaction with the vocal folds, and (3) epilaryngeal interaction with the supralaryngeal vocal tract. Phonetic evaluations of these functions include physiological, theoretical, and taxonomic considerations, imaging data (obtained with laryngeal and lingual ultrasound, simultaneous laryngoscopy and laryngeal ultrasound, and videofluoroscopy), and computational modeling.
These phonetic evaluations are then taken as the basis for a model of lower vocal tract phonology. Traditional models of such sounds do not accommodate the epilarynx. Rather than positing new distinctive features, an alternative approach is taken. A theoretical model is proposed which is framed in terms of “phonological potentials”, which are the biases associated with physical principles that underlie the formation of phonological systems and patterns. In the context of epilaryngeal function, the phonological potentials are expressed in terms of synergistic relations amongst gross physiological states that either support or hinder epilaryngeal constriction. These biases are argued to exert an articulation-based typological skewing on phonemic systems and patterning, and numerous cases are examined in support of this claim. / Graduate / 0290
|
8 |
Computerised GRBAS assessment of voice quality
Jalalinajafabadi, Farideh, January 2016
Vocal cord vibration is the source of voiced phonemes in speech. Voice quality depends on the nature of this vibration. Vocal cords can be damaged by infection, neck or chest injury, tumours and more serious diseases such as laryngeal cancer. This kind of physical damage can cause loss of voice quality. To support the diagnosis of such conditions and also to monitor the effect of any treatment, voice quality assessment is required. Traditionally, this is done ‘subjectively’ by Speech and Language Therapists (SLTs) who, in Europe, use a well-known assessment approach called ‘GRBAS’. GRBAS is an acronym for a five dimensional scale of measurements of voice properties. The scale was originally devised and recommended by the Japanese Society of Logopedics and Phoniatrics and several European research publications. The properties are ‘Grade’, ‘Roughness’, ‘Breathiness’, ‘Asthenia’ and ‘Strain’. An SLT listens to and assesses a person’s voice while the person performs specific vocal maneuvers. The SLT is then required to record a discrete score for the voice quality in the range 0 to 3 for each GRBAS component. In requiring the services of trained SLTs, this subjective assessment makes the traditional GRBAS procedure expensive and time-consuming to administer. This thesis considers the possibility of using computer programs to perform objective assessments of voice quality conforming to the GRBAS scale. To do this, Digital Signal Processing (DSP) algorithms are required for measuring voice features that may indicate voice abnormality. The computer must be trained to convert DSP measurements to GRBAS scores and a ‘machine learning’ approach has been adopted to achieve this. This research was made possible by the development, by Manchester Royal Infirmary (MRI) Hospital Trust, of a ‘speech database’ with the participation of clinicians, SLTs, patients and controls.
The participation of five SLT scorers allowed norms to be established for GRBAS scoring which provided ‘reference’ data for the machine learning approach.
To support the scoring procedure carried out at MRI, a software package, referred to as GRBAS Presentation and Scoring Package (GPSP), was developed for presenting voice recordings to each of the SLTs and recording their GRBAS scores. A means of assessing intra-scorer consistency was devised and built into this system. Also, the assessment of inter-scorer consistency was advanced by the invention of a new form of the ‘Fleiss Kappa’ which is applicable to ordinal as well as categorical scoring. The means of taking these assessments of scorer consistency into account when producing ‘reference’ GRBAS scores are presented in this thesis. Such reference scores are required for training the machine learning algorithms. The DSP algorithms required for feature measurements are generally well known and available as published or commercial software packages. However, an appraisal of these algorithms and the development of some DSP ‘thesis software’ was found to be necessary. Two ‘machine learning’ regression models have been developed for mapping the measured voice features to GRBAS scores. These are K Nearest Neighbor Regression (KNNR) and Multiple Linear Regression (MLR). Our research is based on sets of features, sets of data and prediction models that are different from the approaches in the current literature. The performance of the computerised system is evaluated against reference scores using a Normalised Root Mean Squared Error (NRMSE) measure. The performances of MLR and KNNR for objective prediction of GRBAS scores are compared and analysed ‘with feature selection’ and ‘without feature selection’. It was found that MLR with feature selection was better than MLR without feature selection and KNNR with and without feature selection, for all five GRBAS components. It was also found that MLR with feature selection gives scores for ‘Asthenia’ and ‘Strain’ which are closer to the reference scores than the scores given by all five individual SLT scorers.
The best objective score for ‘Roughness’ was closer than the scores given by two SLTs, roughly equal to the score of one SLT and worse than the other two SLT scores. The best objective scores for ‘Breathiness’ and ‘Grade’ were further from the reference scores than the scores produced by all five SLT scorers. However, the worst ‘MLR with feature selection’ result has normalised RMS error which is only about 3% worse than the worst SLT scoring. The results obtained indicate that objective GRBAS measurements have the potential for further development towards a commercial product that may at least be useful in augmenting the subjective assessments of SLT scorers.
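The evaluation measure named in this abstract, a normalised root mean squared error between predicted and reference scores, might be computed along the following lines. The abstract does not say how the error is normalised, so this sketch assumes division by the width of the 0 to 3 GRBAS score range. The function name and the example scores are hypothetical.

```python
import math


def nrmse(predicted, reference, score_min=0.0, score_max=3.0):
    """RMSE between two score lists, divided by the score range (an assumed normalisation)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("score lists must be non-empty and of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    return math.sqrt(mse) / (score_max - score_min)


# Invented scores for one GRBAS component across five recordings.
reference_scores = [0.0, 1.0, 2.0, 3.0, 1.0]   # e.g. consensus of human scorers
predicted_scores = [0.5, 1.0, 1.5, 2.0, 1.0]   # e.g. output of a regression model

error = nrmse(predicted_scores, reference_scores)
print(f"NRMSE: {error:.3f}")
```

The same function can score an individual human rater against the reference, which is the kind of comparison the abstract reports between the regression models and the five SLT scorers.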
|
9 |
The Impact of the Narrator’s Gender on Multimedia Learning
January 2019
The use of multimedia videos has become increasingly popular, especially in the field of education. In order to facilitate learning it is important to create a natural interaction between the learner and the on-screen material. This study aimed to improve how information is conveyed in a multimedia learning video by focusing on the gender and quality of computer-synthesized voices. Using a randomized pretest-posttest design, the study looked at how the gender of the narrator affected a person's ability to learn and implement a new task. Narration was performed by male and female voices, in both classic and modern synthesized versions, to determine whether there were gender effects across both generations of voices. The participants’ learned knowledge was assessed through a multiple-choice assessment and a word-to-image matching transfer assessment. No significant effects were found. Future studies should consider a more reliable knowledge assessment and use a larger sample size. / Dissertation/Thesis / Masters Thesis Human Systems Engineering 2019
|
10 |
A sociophonetic investigation of ethnolinguistic differences in voice quality among young, South African English speakers
Wileman, Bruce Rory, 03 September 2018
Prior research has suggested that there may be differences in voice quality between black and white speakers of South African English who had attended well-resourced middle-class schools. The principal objective of the study is to address the question of whether there is any acoustic evidence of such differences. The study then proceeds to describe such acoustic evidence for differences in voice quality. The author interviewed 36 female South African English speakers (18 white and 18 black) between the ages of 18 and 22. The research subjects had all attended well-resourced middle-class schools. In order to control for the possibility of substrate influences on voice quality, all black participants were of an isiXhosa language background. High quality sound recordings were conducted, consisting of both a set of read sentences as well as semi-structured interviews, the latter of which formed the core dataset for the subsequent acoustic analysis. The acoustic data were analyzed using VoiceSauce, a program specifically designed for the acoustic analysis of voice quality. Measurements were based on automatically segmented speech samples using FAVE and PRAAT. The VoiceSauce measurement data were statistically analyzed by means of a linear mixed effects regression analysis and Wilcoxon rank sum tests using the statistical package R to evaluate the significance of ethnicity as a variable. The effect of ethnicity was found to be significant for several measures of spectral tilt (including for example, 2K*-5K, H4*-2K*, H1*-H2* and H1*-A1*) and cepstral peak prominence with a nearly significant effect for the subharmonics-to-harmonics ratio. Black speakers exhibited consistently higher values for most harmonic differential measures (for example, H1*-A1*) overall, while white speakers exhibited higher values for fundamental frequency, harmonics-to-noise ratio and cepstral peak prominence.
The author concludes that the acoustic evidence is most consistent with the hypothesis that the white speakers overall typically use a voice quality characterized by greater vocal fold constriction, thickness and stiffness in comparison to the black speakers, hypothesized to use a voice quality characterized by more breathiness. By providing a description of voice quality variation, the research contributes towards a more complete account of sociolinguistic variation in South African English.
|