Global ETD Search

11	Análise de sinais de voz por padrões visuais de dinâmica vocal / Voice signal analysis using vocal dynamic visual patterns
Maria Eugenia Dajer, 30 July 2010
The aim of this research was to evaluate healthy and pathological voices using Vocal Dynamic Visual Patterns (VDVP) analysis in combination with acoustic analysis and auditory-perceptual analysis. Ninety-one voice signals of the sustained vowel /a/ in Brazilian Portuguese, from subjects of both genders aged 21 to 88 years, were analyzed. All voice samples were recorded at a sampling rate of 22,050 Hz with 16-bit amplitude quantization, in mono-channel WAV format. Values of jitter, shimmer and fundamental frequency were obtained. For the perceptual analysis, roughness, breathiness, strain and instability were rated. To describe the dynamics of the voice signals, phase space reconstruction was used to produce the VDVP, and three parameters were analyzed qualitatively: loops, regularity and convergence of the trajectories. Parametric and non-parametric statistical tests were applied. The results show that jitter is negatively correlated with all three dynamic parameters (loops, regularity and convergence), and that shimmer is negatively correlated with convergence and loops. Roughness and breathiness are negatively correlated with the three dynamic parameters. Qualitative VDVP analysis is a promising technique for voice evaluation because it takes into account both the chaotic and the deterministic components of the voice. It is suggested that visual pattern analysis does not replace the existing voice analysis techniques, but that it can complement and improve the evaluation methods used by speech therapists and otolaryngologists.
Keywords: Nonlinear analysis; Phase space reconstruction; Vocal dynamic visual pattern; Voice; Voice analysis
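The phase space reconstruction mentioned in this abstract is commonly done by time-delay embedding. The following is a minimal sketch of that general technique, not the thesis's own implementation: the function name `delay_embed`, the synthetic 100 Hz sine standing in for a sustained vowel, and the quarter-period choice of delay are all assumptions made for illustration. Only the 22,050 Hz sampling rate comes from the abstract.

```python
import math

def delay_embed(signal, dim=2, tau=1):
    """Return delay vectors (x[n], x[n+tau], ..., x[n+(dim-1)*tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(signal) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this dim and tau")
    return [tuple(signal[n + k * tau] for k in range(dim))
            for n in range(n_vectors)]

FS = 22050   # sampling rate reported in the abstract
F0 = 100.0   # assumed fundamental frequency of the synthetic test tone

# Synthetic stand-in for a voice signal (assumption): 0.1 s of a pure sine.
signal = [math.sin(2 * math.pi * F0 * n / FS) for n in range(2205)]

# Assumed delay: about a quarter of the fundamental period.
tau = round(FS / (4 * F0))
points = delay_embed(signal, dim=2, tau=tau)
```

For this perfectly periodic test tone the reconstructed points lie close to a single closed loop (a circle). Plotting `points` for real recordings is what would reveal the loops, regularity and convergence of trajectories that the abstract describes qualitatively.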
12	Computer methods for voice analysis
Granqvist, Svante, January 2003
This thesis consists of five articles and a summary. It deals with methods for measuring properties of the voice. The methods are all computer-based, but utilise different approaches for measuring different aspects of the voice.
Paper I introduces the Visual Sort and Rate (VSR) method for perceptual rating of voice quality. The method is based on the Visual Analogue Scale (VAS), but simultaneously shows all stimuli as icons along the VAS on the computer screen. As the listener places similar-sounding stimuli close to each other during the rating process, comparing stimuli becomes easier.
Paper II introduces the correlogram. The fundamental frequency F0 sometimes cannot be strictly defined, particularly for perturbed voice signals. The method displays multiple consecutive correlation functions in a grey-scale image. Thus, the correlogram avoids selecting a single F0 value. Rather, it presents an unbiased image of periodicity, allowing the investigator to select among several candidates, if appropriate.
Paper III introduces a method for detection of phonation to be utilised in voice accumulators. The method uses two microphones attached near the subject's ears. Phase and amplitude relations of the microphone signals are used to form a phonation detector. The output of the method can be used to measure phonation time, speaking time and fundamental frequency of the subject, as well as the sound pressure level of both the subject's voicing and the ambient sounds.
Paper IV introduces a method for Fourier analysis of high-speed laryngoscopic imaging. The data from the consecutive images are re-arranged to form time series that reflect the time variation of light intensity in each pixel. Each of these time series is then analysed by means of Fourier transformation, such that a spectrum for each pixel is obtained. Several ways of displaying these spectra are demonstrated.
Paper V examines a test set-up for simultaneous recording of airflow, intra-oral pressure, electroglottography, audio and high-speed imaging. Data are analysed with particular focus on synchronisation between glottal area and inverse-filtered airflow. Several methodological aspects are also examined, such as the difficulties in synchronising high-speed imaging data with the other signals.
Keywords: voice analysis; perceptual analysis; fundamental frequency; correlogram; aperiodicity; Fourier analysis; high-speed imaging; laryngoscopy; vocal fold vibration; voice accumulation; Technology
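The correlogram of Paper II stacks correlation functions from consecutive frames so that periodicity can be inspected without committing to one F0 value. The sketch below illustrates that idea under stated assumptions: it uses a plain normalized autocorrelation per frame, and the frame length, hop size, lag range and the synthetic 200 Hz test tone are all hypothetical choices, not values from the thesis. It only computes the matrix; rendering it as a grey-scale image is left out.

```python
import math

def autocorrelation(frame, max_lag):
    """Normalized autocorrelation of one frame for lags 0..max_lag."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    return [
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        for lag in range(max_lag + 1)
    ]

def correlogram(signal, frame_len, hop, max_lag):
    """One autocorrelation row per frame; rows stacked over time."""
    return [
        autocorrelation(signal[start:start + frame_len], max_lag)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

FS = 8000    # assumed sampling rate for the test tone
F0 = 200.0   # assumed fundamental, giving a period of 40 samples
signal = [math.sin(2 * math.pi * F0 * n / FS) for n in range(1600)]

rows = correlogram(signal, frame_len=400, hop=200, max_lag=100)

# For each frame, the lag (above a small minimum) with the strongest correlation.
peak_lags = [max(range(20, 101), key=row.__getitem__) for row in rows]
```

Each row corresponds to one time frame and each column to one lag, so mapping the values to grey levels gives the kind of image the abstract describes. For a perturbed voice several lags may show comparable correlation, which is exactly the case where keeping the whole row is more informative than a single peak.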
13	Vuxna med enkelsidig genomgående läpp-, käk- och gomspalt : Perceptuell röstbedömning med Stockholm Voice Evaluation Approach (SVEA) [Adults with unilateral complete cleft lip and palate: perceptual voice assessment with the Stockholm Voice Evaluation Approach (SVEA)]
Isaksson, Kristoffer, January 2012
Cleft lip and palate (CLP) may cause impairments in speech, articulation and voice. Treatment of patients with CLP includes different types of palatoplasty. Previous studies have found a comparable prevalence of dysphonia in adult patients treated for CLP and in controls without cleft. The purpose of this study was to investigate the degree of voice deviation in adult patients treated for unilateral CLP, to determine any relationships between voice quality outcome and surgical method of palatal closure, gender, and surgery with or without a pharyngeal flap, and to compare the voice results with data from individuals without CLP. Seventy patients treated for unilateral complete CLP participated in the study, of whom 45 had undergone one-stage palatoplasty and 25 two-stage palatoplasty. Eleven of the patients had also undergone pharyngeal flap surgery. The study also included an age-matched control group of 63 individuals without CLP. Sound recordings of the participants' voices were perceptually assessed with the Stockholm Voice Evaluation Approach (SVEA) by two experienced speech-language pathologists. The raters assessed randomized and blinded recordings individually. Inter- and intra-rater reliability were calculated. The degree of voice quality deviation was approximately 5/100 in both the patient group and the control group, which is lower than in earlier studies. "Vocal fry" was rated significantly lower in the patient group than in the control group. Ratings of the other voice parameters did not differ significantly between the patient group and the control group. Patients who had undergone palatoplasty in one stage and those who had undergone it in two stages showed no significant differences in voice parameters. A few small, statistically significant differences were found between genders. No differences were found between patients who had undergone pharyngeal flap surgery and those who had not.
Keywords: Unilateral cleft palate; Perceptual voice analysis; Voice quality; One-stage/two-stage palatoplasty
14	Talhastighet indikerar korrekthet i vittnesmål / Speech rate indicates accuracy of testimonies
Elvesund, Alexandra; Harrysson, Elin, January 2021
Eyewitnesses are important for the outcome of many criminal cases. Sometimes other evidence is lacking and the testimony then becomes decisive, but it is difficult to know how reliable eyewitnesses' memories are. This study aimed to investigate speech rate as an indicator of memory accuracy, using digital voice analysis. The hypothesis was that correct memories would be presented with a faster speech rate than incorrect memories. The participants (n = 10) were students who had taken part in a previous experimental study by Lindholm et al. (2018). The students watched a film sequence and were interviewed afterwards about what they had seen. Their answers were recorded and their voices were analyzed digitally. A t-test for dependent samples showed that correct statements had a significantly faster speech rate than incorrect ones. The results support the hypothesis and are in line with previous research showing that confident statements are delivered faster. Should future research show similar results, digital voice analysis could be used as a complement in the judiciary to support the evaluation of testimony. Understanding which acoustic indicators accompany correct or incorrect memories could help judges, jurors, legal representatives and the police in assessing testimonies.
Keywords: memory; witness assessments; eyewitness accuracy; speech rate; voice analysis
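The abstract reports a t-test for dependent samples, in which each participant contributes one speech rate for correct and one for incorrect statements. The sketch below shows how the t statistic for such paired data is computed. The function name `paired_t` and the four pairs of speech rates are made up purely for illustration; they are not data from the study, which had ten participants.

```python
import math
import statistics

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)          # sample standard deviation of differences
    if sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical speech rates (e.g. syllables per second), one pair per participant.
rate_correct = [4.0, 5.0, 6.0, 5.0]
rate_incorrect = [3.0, 4.5, 5.0, 4.5]

t_stat, dof = paired_t(rate_correct, rate_incorrect)
```

A positive `t_stat` means the first condition has the higher mean. Turning it into a p-value requires the t distribution with `dof` degrees of freedom, which the Python standard library does not provide, so that step is left out here.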
15	Zur Wertigkeit videostroboskopischer und lupenlaryngoskopischer Tonaufnahmen für die objektive Stimmanalyse / The significance of videostroboscopic and magnifying laryngoscopic voice recordings for the objective voice analysis
Lemm, Leonie, 02 July 2013
Objective voice analysis is of fundamental importance in everyday phoniatric practice for the diagnosis and treatment of voice disorders. The gold standard is the Göttingen Hoarseness Diagram (GHD), which requires trained staff to record 28 vowels and takes about 15 minutes per patient. The present study investigated whether the GHD still yields valid results for voice quality when sustained phonations recorded during indirect laryngoscopy or videostroboscopy are analyzed instead of the standard protocol (the so-called "reduced protocol"). If so, voice analysis and examination of the larynx could be carried out in a single step, considerably reducing the time and staff required. Voice recordings from stroboscopy and laryngoscopy of 213 patients (97 male, 116 female) were analyzed with the GHD. On the same day of examination, a standard microphone recording of sustained phonations was also made for analysis according to the full GHD protocol. The values obtained from the reduced and the full protocol for the irregularity and the noise component of the voice signal, as objective markers of voice quality, were correlated with each other. Significant correlations (p < 0.001) between the two procedures were found for both the irregularity component (r = 0.65) and the noise component (r = 0.55). In addition, a single phonation from laryngoscopy and stroboscopy was shown to be sufficient for a reliable result. A minimum phonation duration of 1 second was determined. The simplification of the vowel during laryngoscopy does not affect the result, and both methods are suitable for clinical follow-up.
Keywords: 610; laryngoscopy; magnifying laryngoscopy; stroboscopy; hoarseness; irregularity; noise; GHD; objective voice analysis; Göttingen hoarseness diagram; Medicine (PPN619874732); Otorhinolaryngology (PPN619876220); GOK-MEDIZIN
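The agreement between the reduced and the full protocol is reported as a correlation coefficient r. Assuming this is the usual Pearson coefficient (the abstract does not name the method), it can be computed as in the sketch below. The function name `pearson_r` and the five value pairs are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical irregularity values for the same five patients (illustration only).
full_protocol = [1.0, 2.0, 3.0, 4.0, 5.0]
reduced_protocol = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(full_protocol, reduced_protocol)
```

A value near 1 would mean the reduced protocol ranks and scales patients almost exactly like the full one. The reported values of 0.65 and 0.55 indicate a significant but far from perfect agreement.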
16	Prediktion av användaromdömen om språkcafé-samtal baserat på automatisk röstanalys / Prediction of user ratings of language cafe conversations based on automatic voice analysis
Hansson Svan, Angus; Mannerstråle, Carl, January 2019
Spoken communication between humans generates information in two channels: the primary channel is linked to the syntactic-semantic part of speech (what a person is literally saying), while the secondary channel conveys paralinguistic information (tone, emotional state and gestures). This study examines the paralinguistic part of speech, more specifically tone and emotional state. It investigates whether there is a correlation between a participant's speech and that participant's opinion of a language café conversation. The language café conversations were moderated by the social robot Furhat, created by Furhat Robotics. The report is written from two perspectives. The first is a data science perspective, in which emotions identified in audio files are analysed with machine learning algorithms and mathematical models. Vokaturi, an emotion recognition software, analyses the recorded conversations and quantifies attributes for different emotions. The classification model is built from these attributes together with the answers from the language café survey (part one) and audio files annotated by the authors themselves (part two). The second is a business perspective, in which speech emotion recognition is evaluated as a method for gathering customer opinions in a customer feedback loop. The results show an accuracy of about 62% for part one and 61% for part two, which indicates that some form of prediction is possible. However, no clear correlation was found between the recorded voice and the participant's opinion of the conversation. The discussion analyses the difficulties of building a high-accuracy model with the available data and includes a hypothetical analysis of the model as part of a customer feedback loop.
Keywords: Voice analysis; machine learning; speech emotion recognition; gathering of customer reviews; customer feedback loop; Computer and Information Sciences
