1 |
Computerised GRBAS assessment of voice quality
Jalalinajafabadi, Farideh (January 2016)
Vocal cord vibration is the source of voiced phonemes in speech, and voice quality depends on the nature of this vibration. Vocal cords can be damaged by infection, neck or chest injury, tumours and more serious diseases such as laryngeal cancer. This kind of physical damage can cause loss of voice quality. To support the diagnosis of such conditions and to monitor the effect of any treatment, voice quality assessment is required. Traditionally, this is done ‘subjectively’ by Speech and Language Therapists (SLTs) who, in Europe, use a well-known assessment approach called ‘GRBAS’. GRBAS is an acronym for a five-dimensional scale of measurements of voice properties. The scale was originally devised and recommended by the Japanese Society of Logopedics and Phoniatrics, and has also been recommended in several European research publications. The properties are ‘Grade’, ‘Roughness’, ‘Breathiness’, ‘Asthenia’ and ‘Strain’. An SLT listens to and assesses a person’s voice while the person performs specific vocal manoeuvres. The SLT is then required to record a discrete score for the voice quality in the range 0 to 3 for each GRBAS component. Because it requires the services of trained SLTs, the traditional GRBAS procedure is expensive and time-consuming to administer.
This thesis considers the possibility of using computer programs to perform objective assessments of voice quality conforming to the GRBAS scale. To do this, Digital Signal Processing (DSP) algorithms are required for measuring voice features that may indicate voice abnormality. The computer must be trained to convert DSP measurements to GRBAS scores, and a ‘machine learning’ approach has been adopted to achieve this. This research was made possible by the development, by Manchester Royal Infirmary (MRI) Hospital Trust, of a ‘speech database’ with the participation of clinicians, SLTs, patients and controls. The participation of five SLT scorers allowed norms to be established for GRBAS scoring, which provided ‘reference’ data for the machine learning approach.
To support the scoring procedure carried out at MRI, a software package, referred to as the GRBAS Presentation and Scoring Package (GPSP), was developed for presenting voice recordings to each of the SLTs and recording their GRBAS scores. A means of assessing intra-scorer consistency was devised and built into this system. The assessment of inter-scorer consistency was also advanced by the invention of a new form of the ‘Fleiss Kappa’ which is applicable to ordinal as well as categorical scoring. The means of taking these assessments of scorer consistency into account when producing ‘reference’ GRBAS scores are presented in this thesis. Such reference scores are required for training the machine learning algorithms.
The DSP algorithms required for feature measurements are generally well known and available as published or commercial software packages. However, an appraisal of these algorithms and the development of some DSP ‘thesis software’ was found to be necessary. Two ‘machine learning’ regression models have been developed for mapping the measured voice features to GRBAS scores: K Nearest Neighbor Regression (KNNR) and Multiple Linear Regression (MLR). Our research is based on sets of features, sets of data and prediction models that are different from the approaches in the current literature. The performance of the computerised system is evaluated against reference scores using a Normalised Root Mean Squared Error (NRMSE) measure.
The performances of MLR and KNNR for objective prediction of GRBAS scores are compared and analysed ‘with feature selection’ and ‘without feature selection’. It was found that MLR with feature selection was better than MLR without feature selection and KNNR with and without feature selection, for all five GRBAS components. It was also found that MLR with feature selection gives scores for ‘Asthenia’ and ‘Strain’ which are closer to the reference scores than the scores given by all five individual SLT scorers. The best objective score for ‘Roughness’ was closer to the reference scores than the scores given by two SLTs, roughly equal to the score of one SLT and worse than the other two SLT scores. The best objective scores for ‘Breathiness’ and ‘Grade’ were further from the reference scores than the scores produced by all five SLT scorers. However, the worst ‘MLR with feature selection’ result has a normalised RMS error which is only about 3% worse than the worst SLT scoring. The results obtained indicate that objective GRBAS measurements have the potential for further development towards a commercial product that may at least be useful in augmenting the subjective assessments of SLT scorers.
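As a rough illustration of the evaluation described in this abstract, the following Python sketch fits a multiple linear regression and a K nearest neighbour regressor to voice features and compares their predictions with reference scores using a normalised RMS error. It is not the thesis software. The function names, the toy data and the choice to normalise the RMS error by the 0 to 3 score range are all assumptions made for illustration; the abstract does not state how the NRMSE is normalised, which features are used, or how feature selection is performed.

```python
import math


def nrmse(predicted, reference, score_range=3.0):
    """RMS error divided by score_range (assumed normalisation: GRBAS spans 0 to 3)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    return math.sqrt(mse) / score_range


def knn_predict(train_x, train_y, query, k=3):
    """K nearest neighbour regression: mean target of the k closest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mlr_fit(train_x, train_y):
    """Multiple linear regression via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(x) for x in train_x]  # leading 1.0 models the intercept
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, train_y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]  # [intercept, w1, w2, ...]


def mlr_predict(weights, x):
    """Apply fitted weights (intercept first) to one feature vector."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


if __name__ == "__main__":
    # Made-up feature vectors and reference scores, for illustration only.
    features = [[0.1, 0.9], [0.4, 0.7], [0.6, 0.2], [0.9, 0.3], [0.5, 0.5]]
    reference = [0.2, 1.0, 2.0, 2.8, 1.4]
    weights = mlr_fit(features, reference)
    # Scored on the training data itself, so these errors are optimistic.
    mlr_scores = [mlr_predict(weights, x) for x in features]
    knn_scores = [knn_predict(features, reference, x, k=2) for x in features]
    print("MLR NRMSE:", nrmse(mlr_scores, reference))
    print("KNNR NRMSE:", nrmse(knn_scores, reference))
```

In a real comparison the models would be scored on recordings held out from training, and the same `nrmse` function could be applied to each individual SLT's scores to reproduce the kind of human-versus-machine comparison the abstract reports.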
|
2 |
The influence of motor production experience on voice perception
Pinkerton, A. Louise (01 August 2016)
Perceptual speech and voice analysis is an essential skill for all speech-language pathologists, but it is a difficult skill to teach. Even among experienced experts, reliability is variable. Some training literature and practices in speech-language pathology suggest that imitating pathological voices may be useful for developing perceptual judgment. Evidence from other fields suggests that motor experience influences perception. Until now the link between production and perception of voice quality has not been addressed. The purpose of this pilot study was to test the hypothesis that imitating pathological voice samples would improve the perceptual discrimination abilities of naïve, inexperienced listeners.
Three expert listeners rated 25 voice samples using a perceptual voice evaluation scale, the Grade, Instability, Roughness, Breathiness, Asthenia, Strain Scale (GIRBAS) (Dejonckere et al., 1996), and identified anchor samples for the training protocol. These expert ratings were used to develop summary expert ratings that served as a comparison for the naïve listener ratings. Two groups of naïve undergraduate listeners received training in evaluating voice quality and in administering the GIRBAS. They completed a pretest, a training session, a homework session, and a post-test. During each activity, they rated 6 voices and provided a confidence rating for their scores. The experimental group imitated the voice samples during the study, and the control group completed the training without supplemental motor experience.
It was hypothesized that both listener groups would have improved accuracy and confidence levels between the pretest and post-test, with a larger improvement for the experimental group. Data suggested that training improved naïve listener accuracy and confidence levels and that this improvement was maintained for at least seven days after the initial training. Post-test accuracy for both groups was approximately the same. Imitation did not improve the accuracy of ratings, although subjects in the imitation group had higher confidence levels. The data supported previous research that found that training improved the accuracy of perceptual voice evaluations. However, the hypothesis that imitation could improve perceptual ratings was not supported by this study and warrants further investigation because of the small sample size.
|