Children with dysarthria due to cerebral palsy (CP) often face barriers to participating in speech research and accessing clinical speech services. At-home online videoconferencing may be a practical solution to these accessibility barriers if the speech signal yielded from online recordings is valid. This study aimed to determine the validity of acoustic and perceptual measures obtained from speech data collected remotely online from children with dysarthria due to CP. The speech of 17 children with dysarthria was recorded using two data collection methods performed simultaneously: 1) via Zoom video communications and 2) via a professional audio recording device sent to the children's parents.
A calibration procedure permitted the children’s original vocal sound pressure level (SPL) to be represented in the speech signal. Acoustic and perceptual measures extracted from the two recordings were compared in order to determine the validity of speech data collected online from the children. The acoustic measures, obtained from 1,690 tokens of words and 605 tokens of sentences, were the second formant (F2) range of diphthongs, F2 slope of diphthongs, fricative-affricate duration difference, word duration/articulation rate, mean fundamental frequency (F0), F0 variation, SPL, shimmer, signal-to-noise ratio (SNR), and cepstral peak prominence (CPP).
Perceptual measures were 187 adult listeners’ orthographic transcription accuracy and visual analog scale (VAS) ratings of the children’s speech, collected via an online crowdsourced platform. Acoustic measures of F2 range of diphthongs, fricative-affricate duration difference, word duration, and mean F0 reached the validity criterion of an rrm value of .75 and demonstrated good agreement within the predetermined clinical criterion at both word and sentence levels. Moreover, SPL met the validity criterion and exhibited good agreement at the word level; however, it failed to meet the validity criterion and demonstrated agreement outside the clinical criterion at the sentence level.
The F2 slope of diphthongs showed a strong correlation between online and audio-device recordings and reached the validity criterion; however, it did not show agreement within the clinical criterion at either word or sentence level. Perturbation-based, noise-based, and cepstral measures (i.e., F0 variation, shimmer, SNR, CPP) showed a wide range of correlation and agreement outside of clinical criteria between online and audio-device recordings. Both perceptual measures showed strong correlations between the two recording methods, reaching the validity criterion. Findings suggest that measures that reflect physiological aspects of speech production may be valid and appropriate to extract from online recordings.
However, measures capturing noise and variability within the signal may not be valid when obtained from online recordings. Additionally, the results suggest that perceptual measures of listeners’ transcription and ratings from online recordings may be valid to use for research and clinical purposes. Therefore, careful consideration of the appropriate measures and their limitations is essential to obtaining accurate results when extracting measures from online recordings. These findings provide a valuable foundation of evidence supporting the use of online videoconferencing platforms for several acoustic and perceptual measures commonly implemented in speech research, clinical assessment, and treatment.
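The two-part check described above (a correlation-based validity criterion plus agreement within a clinical criterion) can be illustrated with a minimal sketch. This is not the analysis used in the thesis: it substitutes a plain Pearson correlation for the rrm statistic, treats the .75 criterion as a lower bound, and uses the largest absolute paired difference as a stand-in for the agreement analysis. The function names, the `clinical_limit` parameter, and the sample values are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Plain Pearson correlation (a stand-in; the thesis reports an rrm value)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_methods(online, device, r_criterion=0.75, clinical_limit=1.0):
    """Compare paired measurements from two simultaneous recordings.

    r_criterion: assumed lower bound on the correlation (.75 in the abstract).
    clinical_limit: hypothetical maximum acceptable absolute difference,
    in the unit of the measure being compared.
    """
    if len(online) != len(device) or len(online) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    r = pearson_r(online, device)
    diffs = [a - b for a, b in zip(online, device)]
    return {
        "r": r,
        "mean_diff": mean(diffs),
        "meets_validity": r >= r_criterion,
        "within_clinical": max(abs(d) for d in diffs) <= clinical_limit,
    }


# Made-up values for one measure, e.g. mean F0 in Hz; not data from the study.
online_f0 = [100.0, 110.0, 120.0, 130.0]
device_f0 = [101.0, 109.0, 121.0, 131.0]
result = compare_methods(online_f0, device_f0, clinical_limit=1.0)
```

Keeping the two checks separate mirrors the pattern reported for F2 slope, where a measure can correlate strongly across recording methods and still fall outside the clinical agreement criterion.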
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/t882-8s54 |
Date | January 2023 |
Creators | Hwang, Kyung Hae |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |