  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Perceptual, Acoustic, and Kinematic Measures of Speech Precision and Steadiness

Martin, Jessica Jamiel 04 June 2024
Clinicians rely on perceptual analysis in the assessment and diagnosis of motor speech disorders. However, connecting perceptual measures to quantitative data has proved challenging. This study uses correlational analyses to explore the relationship between perceptual, acoustic, and kinematic measures. Twenty typical speakers provided speech samples of rapid syllable repetition and speech tasks, which were then rated by 12 listeners for precision and steadiness on a visual analog scale. Data was analyzed to identify significant correlations between the measures. We found evidence of a modest perceptual-acoustic relationship, with results suggesting that acoustic rate may be correlated with perceptual features. Our findings also suggest a significant perceptual-kinematic relationship, as several kinematic measures of displacement demonstrated significant correlations with precision and steadiness ratings. We found that speakers with more consistent speech movements received higher steadiness ratings, and speakers with faster articulatory movements were rated as more precise. This study supports the use of perceptual analysis in clinical practice and points towards establishing connections between perceptual, acoustic, and kinematic measures used in speech analysis.
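As a rough illustration of the kind of correlational analysis this abstract describes, the sketch below computes a correlation coefficient between listener ratings and a kinematic measure. The abstract does not name the statistic used, so Pearson's r is an assumption here, and the function name and all data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: mean precision rating per speaker (visual analog
# scale) and a kinematic speed measure for the same speakers.
precision_ratings = [62.0, 71.5, 55.0, 80.0, 68.5]
movement_speed = [110.0, 131.0, 98.0, 150.0, 120.0]
print(f"r = {pearson_r(precision_ratings, movement_speed):.3f}")
```

A rank-based coefficient may suit visual analog scale ratings better; the same structure applies after replacing each value with its rank.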
2

Visualising articulation : real-time ultrasound visual biofeedback and visual articulatory models and their use in treating speech sound disorders associated with submucous cleft palate

Roxburgh, Zoe January 2018
Background: Ultrasound Tongue Imaging (UTI) is growing increasingly popular for assessing and treating Speech Sound Disorders (SSDs) and has more recently been used to qualitatively investigate compensatory articulations in speakers with cleft palate (CP). However, its therapeutic application for speakers with CP remains to be tested. A different set of developments, Visual Articulatory Models (VAMs), provide an offline dynamic model with context for lingual patterns. However, unlike UTI, they do not provide real-time biofeedback. Commercially available VAMs, such as Speech Trainer 3D, are available on iDevices, yet their clinical application remains to be tested. Aims: This thesis aims to test the diagnostic use of ultrasound, and investigate the effectiveness of both UTI and VAMs for the treatment of SSDs associated with submucous cleft palate (SMCP). Method: Using a single-subject multiple baseline design, two males with repaired SMCP, Andrew (aged 9;2) and Craig (aged 6;2), received six assessment sessions and two blocks of therapy, following a motor-based therapy approach, using VAMs and UTI. Three methods were used to measure therapy outcomes. Firstly, percent target consonant correct scores, derived from phonetic transcriptions, provide outcomes comparable to those used in typical practice. Secondly, a perceptual evaluation by multiple phonetically trained listeners, using a two-alternative multiple forced choice design to measure listener agreement, provides a more objective measure. Thirdly, articulatory analysis, using qualitative and quantitative measures, provides an additional perspective able to reveal covert errors. Results and Conclusions: There was overall improvement in the speech for both speakers, with a greater rate of change in therapy block one (VAMs) and listener agreement in the perceptual evaluation.
Articulatory analysis supplemented phonetic transcriptions and detected covert articulations and covert contrast as well as supporting the improvements in auditory outcome scores. Both VAMs and UTI show promise as a clinical tool for the treatment of SSDs associated with CP.
3

Evaluation of nasal speech : a study of assessments by speech-language pathologists, untrained listeners and nasometry

Brunnegård, Karin January 2008
Excessive nasal resonance in speech (hypernasality) is a disorder which may have negative communicative and social consequences for the speaker. Excessive nasal resonance is often associated with cleft lip and palate, velopharyngeal impairment, dysarthria or hearing impairment. Evaluation of hypernasality has proved to be a challenge in the clinic and in research. There are questions regarding the accuracy and reliability of auditory perceptual evaluations of nasal speech, and whether instrumental measures can be used to improve the reliability of clinical evaluation. There is also the question of whether clinical evaluation reflects the impact of hypernasality in a speaker’s everyday life. The purpose of this thesis was to evaluate the extent of reliability problems connected with auditory perceptual assessment of nasality in speech, to explore whether they might interfere with treatment decisions or have an impact in the everyday life of patients, and whether they can be effectively diminished by the use of nasometry. Speakers with cleft lip and palate or velopharyngeal impairment formed the basis of the clinical population used in this study. Speech samples from 52 of these speakers, along with samples from a reference population of 21 speakers who did not have cleft palate, velopharyngeal impairment or speech disorders were used in perceptual evaluation tasks. Fourteen speakers from the clinical population and 11 from the reference population also underwent nasometric evaluation. A further reference population of 220 children from three Swedish cities, whose ages were consistent with those used for clinical checks of children born with cleft palate were assessed with nasometry to establish normative data for the Nasometer™. Perceptual speech assessments were conducted on hyper- and hyponasality, as well as audible nasal air emission and/or nasal turbulence, using 5-point ordinal scales. 
Listeners were SLPs experienced in the evaluation of cleft palate speech, non-expert SLPs and untrained listeners. Listening assessments were performed from audio-recorded speech samples assembled in random order. Nasometry measures were made on three speech passages, each with specific phonetic content, using the Nasometer™, model II. Perceptual evaluation: Results showed that 15% of hypernasality assessments had disagreements between expert SLPs that were potentially important for clinical decisions, as did 6% of assessments for audible nasal air emission and/or nasal turbulence. For nasality problems, a comparison of expert and untrained listeners showed that they generally agreed on which speakers were hypernasal and on the ranking of nasal speakers. All speakers that had been rated with moderate to severe hypernasality by expert listeners were considered by the untrained listeners as having a serious enough speech disorder to call for intervention. However, in the case of audible nasal air emission and/or nasal turbulence, the expert listeners were more prone to notice this feature than the untrained listeners. Instrumental evaluation: The development of normative values for the Nasometer™ for the three Swedish passages (oral sentences, mixed sentences and nasal sentences), comparable to normative values in other languages, has provided a basis for use of instrumental measures in Swedish clinics. The measures showed no significant differences due to city, gender or age within an age range of 4-10 years. When nasometry measures were compared with perceptual evaluation of speech samples from the same speakers, all correlations were moderate to good for expert SLPs and non-expert SLPs. Correlations were significantly higher for expert SLPs than for untrained listeners.
Reliability figures for perceptual assessments for expert SLP listeners indicated that there were some cases where lack of reliability could affect clinical decision making. However, in the main, judgements of nasality problems made by clinicians had everyday validity. They reflected the impressions of the everyday listener, especially in regard to the need for intervention. The study also indicates that now that Swedish norms are available, the Nasometer™ might be useful as a complement to auditory perceptual clinical speech assessments in Swedish cleft palate clinics in order to improve reliability of clinical assessment.
4

Deep upscaling for video streaming : a case evaluation at SVT.

Lundkvist, Fredrik January 2021
While digital displays have continuously increased in resolution, video content produced before these improvements remains stuck at its original resolution, and some form of scaling is needed for a satisfactory viewing experience on high-resolution displays. In recent years, the field of video scaling has taken a leap forward in output quality, due to the adoption of deep learning methods in research. In this paper, we describe a study wherein we train a convolutional neural network for super-resolution, and conduct a large-scale A/B video quality test in order to investigate whether SVT video-on-demand viewers prefer video upscaled using a convolutional neural network to video upscaled using the standard bicubic method. Our results show that viewers generally prefer CNN-scaled video, but not necessarily for the types of content this technology would primarily be used to scale. We conclude that the technology of deep upscaling shows promise, but also believe that more optimization and flexibility are needed for deep scaling to be viable for mainstream use.
5

Training Auditory-Perceptual Voice Ratings Over Time: Effects on Rater Confidence

Collins, Nicole Lynn 23 April 2021
No description available.
6

Speech Adaptation to Kinematic Recording Sensors

Hunter, Elise Hansen 01 March 2016
This thesis examined the time course of speech adaptation prior to data collection when using an electromagnetic articulograph to measure speech articulator movements. The stimulus sentence and electromagnetic sensor placement were designed to be sensitive to changes in the fricatives /s/ and /ʃ/. Twenty native English speakers read aloud stimulus sentences before the attachment of six electromagnetic sensors, immediately after attachment, and again at 5, 10, 15 and 20 minutes after attachment. Participants read aloud continuously between recordings to encourage adaptation to the presence of the sensors. Audio recordings were rated by 20 native English listeners who were not part of the production study. After listening to five practice samples, these participants rated 150 stimuli (31 repeat samples) using a visual analog scale (VAS) with the endpoints labeled as precise and imprecise. An acoustic analysis of the recordings was done by segmenting the fricatives /s/ and /ʃ/ from the longer recording and computing spectral center of gravity and spectral standard deviation in Hertz. Durations of /s/, /ʃ/ and the sentence were also measured. Results of both perceptual and acoustic analysis revealed a change in speech precision over time, with all post-attachment recordings receiving lower perceptual scores. Precision ratings beyond the ten-minute recording remained steady. It can be concluded from the results that participants reached their peak adaptation after 10 minutes of talking with kinematic recording sensors attached, and that after the attachment of sensors, speech production precision did not at any point return to pre-attachment levels.
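The two spectral measures named in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's own procedure: it assumes each frequency bin is weighted by its power (squared magnitude), uses a naive discrete Fourier transform that is only practical for short segments, and the function name and test tone are hypothetical.

```python
import cmath
import math


def spectral_moments(samples, sample_rate):
    """Return (centre of gravity, standard deviation) of the spectrum in Hz.

    Assumption: each frequency bin is weighted by its power.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    freqs = []
    powers = []
    # Naive DFT over the non-negative frequencies only.
    for k in range(n // 2 + 1):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    total = sum(powers)
    if total == 0:
        raise ValueError("segment is silent")
    # First spectral moment: power-weighted mean frequency.
    cog = sum(f * p for f, p in zip(freqs, powers)) / total
    # Second central moment: power-weighted spread around that mean.
    variance = sum((f - cog) ** 2 * p for f, p in zip(freqs, powers)) / total
    return cog, math.sqrt(variance)


# Hypothetical input: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]
cog, sd = spectral_moments(tone, sample_rate)
print(f"centre of gravity = {cog:.1f} Hz, SD = {sd:.1f} Hz")
```

For a real fricative segment, an FFT routine and a window function would normally replace the naive transform, and the choice of weighting affects the values obtained.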
7

Synthesis of facial ageing transforms using three-dimensional morphable models

Hunter, David W. January 2009
The ability to synthesise the effects of ageing in human faces has numerous uses, from aiding the search for missing people to improving recognition algorithms and aiding surgical planning. The principal contribution of this thesis is a novel method for synthesising the visual effects of facial ageing using a training set of three-dimensional scans to train a statistical ageing model. This database is constructed by fitting a statistical face model known as a Morphable Model to a set of two-dimensional photographs of a set of subjects at different age points in their lives. We verify the effectiveness of this algorithm with both quantitative and psychological evaluation. Most ageing research has concentrated on building models using two-dimensional images. This has two major shortcomings: firstly, some of the information related to shape change may be lost by the projection to two dimensions; secondly, the algorithms are very sensitive to even slight variations in pose and lighting. By using standard face-fitting methods to fit a statistical face model to the image we overcome these problems by reconstructing the lost shape information, and can use a model of physical rotations and light transfer to overcome the issues of pose and rotation. We show that the three-dimensional models captured by face-fitting offer an effective method of synthesising facial ageing. The second contribution is a new algorithm for ageing a face model based on Projection to Latent Structures, also known as Partial Least Squares (PLS). This method attempts to separate the training set into a set of basis vectors that best explains the shape and colour changes related to ageing from those factors within the training set that are unrelated to ageing. We show that this method is more accurate than other linear techniques at producing a face model that resembles the individual at the target age and at producing a face image of the correct perceived age.
The third contribution is a careful evaluation of three well-known ageing methods. We use both quantitative evaluation to determine the accuracy of the ageing method, and perceptual evaluation to determine how well the model performs in terms of perceived age increase and also identity retention. We show that linear methods more accurately capture ageing and identity information if they are trained using an individualised model, and that ageing is more accurately captured if PLS is used to train the model.
8

Logopeders bedömarreliabilitet vid perceptuell röstanalys av utvalda röstexempel : en början till ett referensröstmaterial / The reliability of speech and language pathologists' perceptual evaluations of selected voice samples

Asaid, Dina, Erenmalm, Sofia January 2012 (has links)
Interrater and intrarater reliability are of great importance in the selection of reference voice examples. The purpose of this study is to investigate the reliability of experienced speech and language pathologists’ evaluations of selected voice samples. The aim is to begin a collection of male and female reference voice examples which represent different voice quality parameters according to the Stockholm Voice Evaluation Approach (SVEA). The specific questions are: How well do speech and language pathologists agree when rating voices along different voice quality parameters? Are any of the voice quality parameters in the speech samples prominent enough to be qualified as reference voice examples? The authors selected 15 voice samples out of a database consisting of 65 voice samples. The voices were evaluated by seven experienced speech and language pathologists using the SVEA protocol. The results were statistically analyzed to study interrater reliability. In order to investigate intrarater reliability, a second evaluation session was carried out in which the speech and language pathologists evaluated three voice samples randomly selected from the 15 samples used in the first evaluation session. The results showed a wide range in the raters’ evaluations, which had an impact on the correlations. However, a closer look at separate parameters indicated considerably higher similarity in the ratings. Based on these results, three reference voice examples were selected. Even though high correlation values were found in several of the other twelve voice samples, the ratings in these were not high enough to qualify them as reference voice examples in this study.
Nevertheless, these voices can still be used to exemplify lower degrees of deviation. The conclusions are that there is great variation in reliability between and within raters and also in how the different speech and language pathologists rate the voices. The authors also conclude that the search for clear reference voice examples is well motivated and ought to be continued, preferably with the method used in this study.
9

DIGITAL INPAINTING ALGORITHMS AND EVALUATION

Mahalingam, Vijay Venkatesh 01 January 2010
Digital inpainting is the technique of filling in the missing regions of an image or a video using information from the surrounding area. This technique has found widespread use in applications such as restoration, error recovery, multimedia editing, and video privacy protection. This dissertation addresses three significant challenges associated with existing and emerging inpainting algorithms and applications. The three key areas of impact are 1) structure completion for image inpainting algorithms, 2) a fast and efficient object-based video inpainting framework and 3) perceptual evaluation of large-area image inpainting algorithms. One of the main approaches of existing image inpainting algorithms to completing the missing information is to follow a two-stage process: a structure completion step, to complete the boundaries of regions in the hole area, followed by a texture completion process using advanced texture synthesis methods. While the texture synthesis stage is important, it can be argued that structure completion is a vital component in improving the perceptual quality of image inpainting. To this end, we introduce a global structure completion algorithm for completion of missing boundaries using symmetry as the key feature. While existing methods for symmetry completion require a priori information, our method takes a non-parametric approach by utilizing the invariant nature of curvature to complete missing boundaries. Turning our attention from image to video inpainting, we readily observe that existing video inpainting techniques have evolved as an extension of image inpainting techniques. As a result, they suffer from various shortcomings including, among others, an inability to handle large missing spatio-temporal regions, execution times slow enough to make them impractical for interactive use, and the presence of temporal and spatial artifacts.
To address these major challenges, we propose a fundamentally different method based on an object-based framework for improving the performance of video inpainting algorithms. We introduce a modular inpainting scheme in which we first segment the video into constituent objects by using acquired background models, followed by inpainting of static background regions and dynamic foreground regions. For static background region inpainting, we use simple background replacement and occasional image inpainting. To inpaint dynamic moving foreground regions, we introduce a novel sliding-window-based dissimilarity measure in a dynamic programming framework. This technique can effectively inpaint large regions of occlusion and objects that are completely missing for several frames or that change in size and pose, and it has minimal blurring and motion artifacts. Finally, we direct our focus to experimental studies related to perceptual quality evaluation of large-area image inpainting algorithms. Judging the perceptual quality of large-area inpainting is inherently a subjective process, and yet no previous research has taken into account the subjective nature of the Human Visual System (HVS). We perform subjective experiments using an eye-tracking device involving 24 subjects to analyze the effect of inpainting on human gaze. We experimentally show that the presence of inpainting artifacts directly impacts the gaze of an unbiased observer, and this in effect has a direct bearing on the subjective rating of the observer. Specifically, we show that the gaze energy in the hole regions of an inpainted image shows marked deviations from normal behavior when the inpainting artifacts are readily apparent.
10

Acoustique des salles dans les lieux d'écoute de la musique : analyse perceptive et acoustique dans les contextes réels et virtuels. / Acoustics of auditoria designed for listening to music : perceptual and acoustical analysis in real and virtual contexts

Espitia Hurtado, Juan Pablo 02 February 2016
The general purpose of this thesis is to explore the sound quality of concert halls by approaching it through the listeners’ sensory experience. We first show the limitations of the traditional approach to perceptual evaluation, which is principally centred on attributes related to the hall or the music and defined a priori from the knowledge of the experimenters, usually acousticians, either in the terms of their scientific field or in words whose meanings they believe they share with the subjects. Then, taking up the three methods for exploring subjective experience (surveys outside the listening situation, based on memory; questionnaires within auditoria during concerts; and laboratory listening tests), we have implemented them within a theoretical and methodological framework explicitly rooted in psychology and linguistics, which treats sensory experience as an autonomous psychological object of study. Furthermore, for our experimental laboratory approach, we have implemented a parametric decoding system based on SIRR (spatial impulse response rendering), permitting the reproduction of sound fields from room impulse responses measured in first-order Ambisonics format. The contribution of this work thus consists in identifying the psychological factors related to concert listening in a hall by objectivising the subjective experience of listeners (music-lovers) and their evaluation of the acoustic qualities of an auditorium, and in establishing relationships between the psychological factors and the acoustic measurements in the studied auditoria.
