21 |
A computational model of the relationship between speech intelligibility and speech acoustics. January 2019 (has links)
abstract: Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The developed measures were validated on a dysarthric speech dataset covering a range of severity levels. Multiple regression analysis shows that the developed measures can predict perceptual ratings reliably. The relationship between the acoustic measures and the listening errors is investigated to show the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test this hypothesis, within-speaker variations were simulated in different speaking modes. Significant changes were detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in articulation-related acoustic features are important in predicting changes in phoneme errors, while changes in both articulation- and prosody-related features are important in predicting changes in lexical boundary errors.
Moreover, significant correlation was achieved in the cross-validation experiment, which indicates that it is possible to predict intelligibility variations from the acoustic signal. / Dissertation/Thesis / Doctoral Dissertation Speech and Hearing Science 2019
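The multiple-regression step described in this abstract can be sketched in miniature. The feature matrix and ratings below are synthetic stand-ins, not data from the dissertation; the sketch only shows the mechanics of fitting perceptual ratings from a handful of acoustic measures.

```python
import numpy as np

# Synthetic design: 8 speakers, 3 acoustic measures
# (articulation, prosody, vocal quality) per speaker.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0  # noiseless "perceptual ratings" for the sketch

A = np.hstack([X, np.ones((8, 1))])           # add intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit
pred = A @ coef

print(np.allclose(pred, y))  # the fit recovers the synthetic ratings
```

With real ratings the fit would not be exact, and one would report variance explained and coefficient significance rather than exact recovery.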
|
22 |
An Acoustical Analysis of the American English /l, r/ Contrast as Produced by Adult Japanese Learners of English Incorporating Word Position and Task Type. Chase, Braden Paul, 01 June 2017 (has links)
Adult Japanese learners of English (JLEs) are often stereotyped as being unable to produce or perceive the English phonemes /l/ and /r/. This study analyzed acoustic samples of /l/ and /r/ obtained from intermediate-level Japanese speakers in two variable contexts: word position (initial/final) and task type (controlled/free). These tokens were subjected to acoustic analysis, one way of comparing the oral productions of native and non-native English speakers. Previous research has identified a lowered third formant (F3) as the hallmark of an American English /r/ as produced by a native speaker, independent of word position or task type. The results indicate that participants can produce appropriate and statistically significant differences (p < .001) between these two phonemes across word position and task type. Other findings indicate that neither task type nor word position had a significant effect on F3 values. These results suggest that Japanese speakers of English may be able to distinguish /l/ from /r/ without specialized pronunciation training, but that these differences, as indexed by F3 frequency values, are less dramatic than those produced by native English speakers (NES) for the same contrasting phonemes. In most tokens, however, large effect sizes remained between JLE productions and NES standards.
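The effect-size comparison this abstract mentions can be illustrated with a small sketch. The F3 arrays below are hypothetical values in Hz, not data from the study; they only demonstrate the pooled-standard-deviation form of Cohen's d applied to an /l/ vs. /r/ F3 contrast.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d effect size for two independent samples,
    using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) +
                  (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical mid-segment F3 values (Hz): /r/ tokens show the
# characteristically lowered F3; /l/ tokens sit much higher.
f3_r = [1690, 1720, 1650, 1700, 1740, 1680]
f3_l = [2580, 2620, 2550, 2600, 2640, 2590]

d = cohens_d(f3_l, f3_r)
print(round(d, 2))  # |d| well above 0.8 counts as a large effect
```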
|
23 |
Effects of Clear Speech and Linguistic Experience on Acoustic Characteristics of Vowel Production. Bianchi, Michelle, 17 July 2007 (has links)
The present study investigated the hypothesis that late and/or early learners of English as a second language may exhibit an exaggerated or restricted degree of change in their production performance between clear and conversational speech styles for certain acoustic cues. Monolingual English talkers (MO), early Spanish-English bilinguals (EB), and late Spanish-English bilinguals (LB) were recorded using both clear and conversational speaking styles. The stimuli consisted of six target vowels /i, I, e, E, ae/ and /a/, embedded in a /bVd/ context. All recorded target-word stimuli were excised as isolated words. Vowel duration was computed, and fundamental frequency (F0) and formant frequency values (F1-F4) were measured at 20%, 50%, and 80% of the vowel duration.
Data from the MO and EB talkers indicate that these two groups are very similar: they emphasize duration differences in clear speech, have similar spacing of vowels (static and dynamic properties), and show similar frequency changes in clear speech. Data from the LB talkers indicate that this group failed to emphasize differences in clear speech, particularly duration differences. In addition, the high and mid front vowels (/i, I, e/ and /E/) were found to be very poorly separated in the F1-F2 space for the LB talkers. In support of the hypothesis, the data showed that LB talkers exhibited a restricted degree of change between clear and conversational speech styles for duration, as compared to monolingual talkers. Data for the EB talkers do not reveal systematic reductions in the degree of change between clear and conversational speech styles, as compared to monolingual talkers.
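Sampling a formant or F0 track at fixed fractions of the vowel duration, as described above, reduces to proportional indexing into a per-frame measurement array. The F1 track below is a hypothetical example, not data from the study.

```python
import numpy as np

def sample_at_fractions(track, fractions=(0.2, 0.5, 0.8)):
    """Sample a per-frame measurement track (e.g. F0 or a formant
    frequency over one vowel) at given fractions of its duration."""
    track = np.asarray(track, dtype=float)
    idx = [int(round(f * (len(track) - 1))) for f in fractions]
    return track[idx]

# Hypothetical F1 track (Hz) over an 11-frame vowel: rises then falls.
f1 = np.array([400, 430, 460, 500, 540, 560, 540, 500, 460, 430, 410])
print(sample_at_fractions(f1))  # → [460. 560. 460.]
```

In practice the track itself would come from a pitch or formant tracker run over the excised vowel; the three-point sampling then captures static targets plus dynamic movement.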
|
24 |
Does real-time visual feedback improve pitch accuracy in singing? Wilson, Pat H, January 2007 (has links)
Master of Applied Science / The aim of this investigation was to examine the effects of computer-based visual feedback in the teaching of singing. Pitch accuracy, a readily measured parameter of the singing voice, was used in this study to gauge changes in singing for groups with and without visual feedback. The study investigated whether the style of feedback affects the amount of learning achieved, and whether the provision of concurrent visual feedback hampers the simultaneous performance of the singing task. The investigation used a baseline–intervention–post-test between-groups design. Participants of all skill levels were randomly assigned to a control group or one of two experimental groups, with all participants given one hour of singing training. At intervention, the two experimental groups were offered one of two different displays of real-time visual feedback on their vocal pitch accuracy, while control participants had a non-interactive display. All sessions were recorded, and the vocal exercise patterns performed at the baseline, intervention and post-test phases were acoustically analysed for pitch accuracy. Questionnaires assessed both general health and the amount of singing and music training of all participants; people in the two experimental groups were also given a further questionnaire about the visual feedback. The results indicate that visual feedback improves pitch accuracy in singing. Cognitive load related to the decoding of visual information was a factor at intervention. At post-test, the two groups who had used real-time visual feedback demonstrated marked improvement on their initial pitch accuracy. There was no significant difference between the results of participants from the two experimental groups, although the participants with some background in singing training showed greater improvement using a simpler visual feedback design.
The findings suggest that a hybrid approach integrating standard singing teaching practices with real-time visual feedback of aspects of the singing voice may improve learning.
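Pitch accuracy of the kind analysed here is commonly expressed as a signed deviation in cents from the target note. The target and sung frequencies below are hypothetical, chosen only to show the standard formula.

```python
import math

def cents_off(f_sung, f_target):
    """Signed deviation of a sung frequency from its target, in cents
    (1200 cents per octave; positive = sharp, negative = flat)."""
    return 1200.0 * math.log2(f_sung / f_target)

# Hypothetical: target A4 = 440 Hz, sung slightly sharp at 445 Hz.
dev = cents_off(445.0, 440.0)
print(round(dev, 1))  # just under 20 cents sharp
```

A per-note accuracy score for a whole exercise is then typically the mean absolute deviation across the sung notes.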
|
25 |
Combining acoustic analysis and phonotactic analysis to improve automatic speech recognition. Nulsen, Susan, January 1998 (has links)
This thesis addresses the problem of automatic speech recognition, specifically, how
to transform an acoustic waveform into a string of words or phonemes. A preliminary
chapter gives linguistic information potentially useful in automatic speech
recognition. This is followed by a description of the Wave Analysis Laboratory
(WAL), a rule-based system which detects features in speech and was designed as
the acoustic front end of a speech recognition system. Temporal reasoning as used
in WAL rules is examined. The use of WAL in recognizing one particular class of
speech sounds, the nasal consonants, is described in detail.
The remainder of the thesis looks at the statistical analysis of samples of spontaneous
speech. An orthographic transcription of a large sample of spontaneous
speech is automatically translated into phonemes. Tables of the frequencies of
word initial and word final phoneme clusters are constructed to illustrate some
of the phonotactic constraints of the language. Statistical data is used to assign
phonemes to phonotactic classes. These classes are unlike the acoustic classes,
although there is a general distinction between the vowels, the consonants and the
word boundary.
A way of measuring the phonetic balance of a sample of speech is described. This
can be used as a means of ranking potential test samples in terms of how well they
represent the language.
A phoneme n-gram model is used to measure the entropy of the language. The
broad acoustic encoding output from WAL is used with this language model to
reconstruct a small test sample.
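A phoneme n-gram model of the kind described can be sketched as follows. The toy "phoneme" string is illustrative only, with `#` marking a word boundary; per-symbol entropy comes from the standard bigram formula, and perplexity is two to the power of the entropy.

```python
import math
from collections import Counter

def bigram_entropy(symbols):
    """Per-symbol entropy (bits) of a bigram model estimated from
    the sequence itself: H = -sum over (a,b) of p(a,b) * log2 p(b|a)."""
    bigrams = list(zip(symbols, symbols[1:]))
    big_counts = Counter(bigrams)
    ctx_counts = Counter(a for a, _ in bigrams)
    n = len(bigrams)
    h = 0.0
    for (a, b), c in big_counts.items():
        p_ab = c / n                    # joint bigram probability
        p_b_given_a = c / ctx_counts[a]  # conditional probability
        h -= p_ab * math.log2(p_b_given_a)
    return h

# Toy phoneme sequence; '#' marks a word boundary.
seq = list("#kat##sat##kat##mat#")
h = bigram_entropy(seq)
print(round(h, 3), round(2 ** h, 3))  # entropy (bits) and perplexity
```

On real transcribed speech the model would be estimated on a training sample and evaluated on held-out data, but the entropy and perplexity computations are the same.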
"Branching", a simpler alternative to perplexity, is introduced and found to give
similar results to perplexity. Finally, the drop in branching is calculated as
knowledge of various sets of acoustic classes is considered.
In the work described in this thesis the main contributions made to automatic
speech recognition and the study of speech are in the development of the Wave
Analysis Laboratory and in the analysis of speech from a phonotactic point of view.
The phoneme cluster frequencies provide new information on spoken language,
as do the phonotactic classes. The measures of phonetic balance and branching
provide additional tools for use in the development of speech recognition systems.
|
26 |
Vad kännetecknar en bra röst? : En studie om röster i kommersiella sammanhang (What characterizes a good voice? A study of voices in commercial contexts). Franzén, Jerker; Wijkmark, Hannes, January 2012 (has links)
What characterizes a good voice? This study aims to determine whether there is something in the acoustic speech signal that can reveal what impression the listener forms of the speaker. In commercial contexts, the manner of delivery is important for conveying the right message to the listener, and the voice is a key tool in this. The basis of the present study was to have a group of listeners rate several different voices and to analyze the voices acoustically, after which the results of the listener ratings and the acoustic analysis were compared. One woman and one man from each of two groups, professional and non-professional voice users, read a standard text. The recordings were analyzed acoustically and evaluated perceptually by a listener group of 10 individuals without professional voice expertise. An interview was also conducted with the two professional voice users. The results show that different acoustic parameters in the studied women's and men's voices were related to the listeners' impressions of the voices. The men's voices with a low F0 were rated as interesting and trustworthy; the same qualities in the women's voices co-occurred with high variability in F0. Because of the limited sample, no general conclusions could be drawn about what characterizes a good voice. The interview indicated that, in radio advertising, there is no infallible voice that can be hired for every assignment; rather, different voice types suit different contexts. / What makes a good voice? The present study seeks to determine whether there is something in the acoustic speech signal that may tell us something about how the listener perceives the speaker. In a commercial context, the way in which the message is delivered is important so that the listener receives it as the producer of the commercial intended. In this process, the voice is an important tool.
In the present study, audio recordings were made of two professional and two non-professional voice users, one man and one woman in each group, reading a standard text. The recordings were analyzed acoustically and evaluated by a group of listeners, and the results were summarized and compared. In addition, an interview was conducted with the two professional speakers. The results of the analysis show that women's and men's voices differ in which acoustic parameters affect the listener's perception of the voice. Among men, a low mean F0 co-occurred with listeners rating a voice as interesting and trustworthy. Among women, a high standard deviation of F0 was the parameter that co-occurred with such ratings. Due to the limited sample size, no general conclusions could be drawn. The answers from the interview indicated that there is no such thing as an infallible voice that can be hired for any assignment; rather, different types of voices are suitable for different contexts.
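The two acoustic parameters highlighted in this abstract, mean F0 and F0 variability, are simple summary statistics over the voiced frames of a recording. A minimal sketch with hypothetical values:

```python
import statistics

# Hypothetical f0 values (Hz) from voiced frames of a read passage.
f0_track = [118, 121, 115, 124, 119, 117, 122, 120]

mean_f0 = statistics.fmean(f0_track)
sd_f0 = statistics.stdev(f0_track)  # sample standard deviation

print(round(mean_f0, 1), round(sd_f0, 1))  # → 119.5 2.9
```

In voice research the variability measure is often computed on a semitone scale rather than in Hz, so that it is comparable across speakers with different mean pitch.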
|
27 |
An acoustic analysis of Burmese tone. Kelly, Niamh Eileen, 16 April 2013 (has links)
This paper examines the acoustic characteristics that differentiate the four tones of Burmese: high, low, creaky and stopped.
The majority of previous work on Burmese tone is impressionistic, though recent work has been experimental. There are conflicting analyses of how the tones are distinguished. In particular, there is disagreement about the f0 contour of the high and low tones, the consistency of creakiness in the creaky and stopped tones, the role of f0 in distinguishing the creaky and stopped tones, and the vowel quality of the stopped tone.
Recordings were made of four native speakers of Burmese, aged 24-30, who read sentences containing a carrier word with one of the four tones and one of two vowels, /a/ and /i/. Seven variables were measured: f0 contour (onset, offset, peak f0, peak delay), duration, voice quality, and vowel quality.
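The f0-contour variables listed above (onset, offset, peak f0, peak delay) can be extracted from a per-frame f0 track with a few array operations. The contour below is a hypothetical example, not data from the study; peak delay is expressed as a fraction of the syllable duration.

```python
import numpy as np

def f0_contour_measures(f0, times):
    """Summarize an f0 track over one syllable: onset/offset f0,
    peak f0, and peak delay as a fraction of syllable duration."""
    f0, times = np.asarray(f0, float), np.asarray(times, float)
    peak_i = int(np.argmax(f0))
    duration = times[-1] - times[0]
    return {
        "onset_f0": f0[0],
        "offset_f0": f0[-1],
        "peak_f0": f0[peak_i],
        "peak_delay": (times[peak_i] - times[0]) / duration,
        "duration": duration,
    }

# Hypothetical falling contour: early peak, then a fall to the offset.
times = np.linspace(0.0, 0.30, 7)         # 300 ms syllable
f0 = [210, 235, 228, 215, 200, 185, 170]  # Hz
m = f0_contour_measures(f0, times)
print(m["peak_f0"], round(m["peak_delay"], 2))  # → 235.0 0.17
```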
It was found that the high and low tones are differentiated from the creaky and stopped tones by onset f0, peak f0, relative peak delay, duration, and voice quality. The high and low tones are distinguished from one another by offset f0, peak f0, relative peak delay, and voice quality. The creaky and stopped tones appear to be differentiated from one another mainly by vowel quality.
This paper adds necessary acoustic analysis to the literature on Burmese tone, with the finding that a variety of characteristics is used to distinguish each tone. The findings of this experiment also add to the current understanding of the interactions between tone and phonation, as well as between phonation and vowel quality.
|
28 |
Κατασκευή συστήματος αναγνώρισης ηχητικών σημάτων ροχαλητού με συστοιχία πιεζοηλεκτρικών αισθητήρων (Construction of a system for recognizing snoring sound signals with an array of piezoelectric sensors). Λιβάνιος, Απόστολος, 07 May 2015 (has links)
Snoring is a phenomenon that rarely worries those affected by it. Moreover, diagnosing snoring still requires an examination in a polysomnography laboratory, a procedure that is expensive and burdensome for the patient. Together, these facts leave an enormous proportion of patients undiagnosed and at risk of developing, or already exhibiting, psychological strain, some form of heart disease, reduced performance in everyday activities, and other consequences of snoring.
Many methods have been developed for diagnosing snoring through acoustic analysis of sounds recorded during sleep, with the aim of making the diagnosis easy and convenient for the patient. Although these methods appear to perform well in laboratory experiments, they are often not robust or stable enough, and they require specific equipment to work optimally. As a result, they do not perform satisfactorily when implemented on an everyday device the patient already owns and does not need to buy, such as a mobile phone.
This thesis attempts to develop and implement a snoring recognition method robust both to noise and to the distance between the recording device and the patient, so that it can be used even on mobile phones. In addition, experiments are conducted on two methods for snoring recognition with automatic feature extraction, using Sparse Coding and Convolutional Predictive Sparse Decomposition Auto-encoders. / Snoring rarely causes alarm in those affected by it. In addition, in order to be diagnosed, a patient has to spend the night in a polysomnography lab, a process that is both expensive and inconvenient. These two facts lead to a massive percentage of undiagnosed patients who may be affected by mood disturbances, some form of heart disease, and other side effects of snoring.
Many methods of acoustic analysis of sleep sounds have been developed in order to make snoring diagnosis an easy and inexpensive process. Even though these methods seem good at detecting snore sounds in a lab environment, they sometimes fail in a home environment, since they are not robust against noise and are highly dependent on the recording equipment. Thus, they are not effective enough in most scenarios to be implemented on devices a patient might already own and so replace polysomnography.
In this thesis project, an attempt is made to develop and implement a snoring detection method robust enough to be used in practice. Moreover, methods of automatic feature extraction using Sparse Coding and Convolutional Predictive Sparse Decomposition Auto-encoders are experimented with.
|
30 |
Μέτρηση και ανάλυση της ακουστικής και ηλεκτροακουστικής εγκατάστασης του Συνεδριακού Κέντρου του Πανεπιστημίου Πατρών / Measurement and analysis of the acoustic and the electroacoustic installation of the Conference Center of the University of Patras. Σιάτρα, Μυρτώ, 07 June 2010 (has links)
This diploma thesis concerns the measurement and analysis of the acoustics, and of the electroacoustic installation, of the Conference Center of the University of Patras. Specifically, the values of the principal acoustic parameters were studied, as derived from acoustic and electroacoustic measurements carried out in the main hall I1, in order to draw useful conclusions about the hall's suitability for music and speech. The hall was then simulated and its acoustics analyzed, with respect to the absorptivity of the surfaces and the position of the source on the stage, using the Catt Acoustic software. Finally, as part of this work, an electronic guide to the electroacoustic system of amphitheatre I1 and the multiple capabilities it provides was also created. / The subject of this diploma thesis is the study of the Conference Center's main hall (I1). The main purpose was to draw useful conclusions about the main hall's suitability for concerts and speeches and to accurately predict its acoustic behaviour.
In practice, acoustic and electroacoustic measurements were carried out in order to determine the hall's acoustic parameters. With the simulation software Catt Acoustic, a detailed model of the hall was produced and then varied, with respect to the absorption of the surfaces and the effect of the source's position on the stage.
Furthermore, an application was created to showcase the main hall's multiple capabilities and to serve as a digital guide to its electroacoustic system.
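Simulations like the one described revolve around standard room-acoustic parameters driven by surface absorption. As a minimal illustration of that link, here is Sabine's classic reverberation-time estimate; the hall volume and absorption coefficients below are invented, not those of hall I1.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine estimate of reverberation time: RT60 = 0.161 * V / A,
    where A is the total absorption in sabins, i.e. the sum of
    (surface area * absorption coefficient) over all surfaces."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

# Invented hall: 4000 m^3, (area in m^2, absorption coefficient) pairs.
surfaces = [(600, 0.10),   # walls, painted concrete
            (400, 0.60),   # audience seating area
            (300, 0.05)]   # ceiling, plaster
rt60 = sabine_rt60(4000, surfaces)
print(round(rt60, 2))  # → 2.04 seconds
```

Raising the seating absorption coefficient in this sketch shortens the estimated RT60, which is the kind of what-if a tool such as Catt Acoustic explores in far greater geometric detail.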
|