• About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

That voice sounds familiar : factors in speaker recognition

Eriksson, Erik J. January 2007
Humans have the ability to recognize other humans by voice alone. This is important both socially and for the robustness of speech perception. This thesis contains a set of eight studies that investigate how different factors affect speaker recognition and how these factors can help explain how listeners perceive and evaluate speaker identity. The first study is a review paper giving an overview of emotion decoding and encoding research. The second study compares the relative importance of the emotional tone of the voice and the emotional content of the message; a mismatch between the two was shown to affect decoding speed. The third study investigates the role of dialect in speaker recognition and, using a bidialectal speaker as the target voice to control all other variables, shows that the dominance of dialect cannot be overcome. The fourth paper investigates whether imitated stage dialects are as perceptually dominant as natural dialects. It was found that a professional actor could successfully disguise his voice by imitating a dialect, but that a listener's proficiency in a language or accent can reduce susceptibility to a dialect imitation. Papers five to seven focus on automatic techniques for speaker separation. Paper five shows that a method developed for Australian English diphthongs produced comparable results with a Swedish glide + vowel transition. The sixth and seventh papers investigate a speaker separation technique developed for American English; the technique could be used to separate Swedish speakers and proved robust against professional imitations. Paper eight investigates how age and hearing affect earwitness reliability. It shows that a senior citizen with corrected hearing can be as reliable an earwitness as a younger adult with no hearing problem, but suggests that a witness's general cognitive deterioration needs to be considered when assessing a senior citizen's earwitness evidence.
On the basis of these studies, a model of speaker recognition is presented, based on the face recognition model of Bruce and Young (1986; British Journal of Psychology, 77, pp. 305-327) and the voice recognition model of Belin, Fecteau and Bédard (2004; Trends in Cognitive Sciences, 8, pp. 129-134). The merged and modified model handles both familiar and unfamiliar voices. The findings presented in this thesis, in particular those of the individual papers in Part II, have implications for criminal cases in which speaker recognition plays a part, and they feed directly into the growing body of forensic phonetic and forensic linguistic research.
23

Vowel identification by monolingual and bilingual listeners: Use of spectral change and duration cues

Glasbrenner, Merete Møller 01 June 2005
Recent studies have shown that even highly proficient Spanish-English bilinguals, who acquired their second language (L2) in childhood and have little or no foreign accent in English, may require more acoustic information than monolinguals to identify English vowels, and may have more difficulty than monolinguals in understanding speech in noise or reverberation (Mayo, Florentine, and Buus, 1997; Febo, 2003). One possible explanation for this difference is that bilingual listeners use acoustic cues for vowel identification differently from monolinguals (Flege, 1995). In this study, we investigated this hypothesis by comparing bilingual listeners' use of acoustic cues for vowel identification with that of monolinguals, for six American English vowels presented under listening conditions designed to manipulate two acoustic cues: vowel formant dynamics and duration. Three listener groups were tested: monolinguals, highly proficient bilinguals, and less proficient bilinguals. To create the stimuli, six target vowels (/i, ɪ, eɪ, ɛ, æ, ɑ/) were recorded in /bVd/ context, spoken in a carrier phrase by four monolingual American English speakers (two female, two male). Six listening conditions were created: 1) whole word, 2) isolated vowel, 3) resynthesized with no change, 4) resynthesized with neutralized duration, 5) resynthesized with flattened formants, and 6) resynthesized with flattened formants and neutralized duration. The resynthesized stimuli were created using high-fidelity synthesis procedures (STRAIGHT; Kawahara, Masuda-Katsuse, and de Cheveigné, 1998) and digital manipulation. A six-alternative forced-choice listening task was used. The main experiment comprised 240 isolated-vowel trials and 48 whole-word trials.
24

The perceptibility of duration in the phonetics and phonology of contrastive consonant length

Hansen, Benjamin Bozzell 12 July 2012
This dissertation investigates the hypothesis that the more vowel-like a consonant is, the more difficult it is for listeners to classify it as geminate or singleton. A perceptual account of this observation holds that more vowel-like consonants lack clear markers signaling the beginning and end of the consonant, so listeners do not perceive its precise duration, and consequently the phonological contrast may be neutralized in some languages. Three experiments using data from Persian speakers were performed to address these questions. In Experiment I, four speakers produced singleton and geminate tokens of the voiced oral consonants [d, z, n, l, j] and the glottals [h] and [ʔ] at three speaking rates. Persian speakers were found to distinguish geminate from singleton durations for all manners of articulation, even at very fast speaking rates, and vowels preceding geminates were slightly longer than those preceding singletons. Speaking rate affected geminates more than singletons for all segments studied: geminate durations decreased more in fast speech than singleton durations did. In Experiment II, listeners heard manipulated continua of consonants ranging from singletons to geminates. Their identification curves were modeled with a cumulative Gaussian function. The modeled standard deviation was interpreted as the breadth of the perceptual threshold, with a broader threshold taken to indicate a less distinct perceptual boundary between the two categories. The obstruents [d, z] had smaller breadth values than the sonorants [n, l, j], and the glottals had the largest breadth values of all. This indicates that sonorants were more difficult for listeners to categorize than obstruents, and that the glottals were the most difficult of the segments tested.
Experiment III tested whether modifying a specific parameter, the formant transition duration, would affect the perceptibility of the geminate/singleton contrast. A single token containing the glide [j] was manipulated to produce three continua, each with a distinctly different transition: short, normal, or long. The longer the transition, the broader the perceptual threshold, making the consonant harder to categorize.
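The cumulative Gaussian model from Experiment II can be illustrated with a minimal sketch. It assumes identification data given as the proportion of "geminate" responses at each step of a duration continuum, and fits the mean (the category boundary) and standard deviation (the "breadth" of the threshold) by a coarse grid search that minimizes squared error. The continuum values and the noiseless responses are invented for illustration; the abstract does not say how the curves were fitted, and maximum-likelihood fitting would be a common alternative.

```python
import math

def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian: probability of a 'geminate' response at duration x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def fit_cum_gauss(xs, ps, mu_grid, sigma_grid):
    """Grid search for (mu, sigma) minimizing squared error to observed proportions."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            sse = sum((cum_gauss(x, mu, sigma) - p) ** 2 for x, p in zip(xs, ps))
            if best is None or sse < best[0]:
                best = (sse, mu, sigma)
    return best[1], best[2]

# Hypothetical continuum: consonant durations in ms.
durations = [60, 80, 100, 120, 140, 160, 180]
# Noiseless 'responses' generated from a known boundary (120 ms) and breadth (15 ms).
proportions = [cum_gauss(d, 120.0, 15.0) for d in durations]

mu_grid = [m * 1.0 for m in range(60, 181)]     # 60 to 180 ms in 1 ms steps
sigma_grid = [s * 0.5 for s in range(2, 121)]   # 1 to 60 ms in 0.5 ms steps
mu_hat, sigma_hat = fit_cum_gauss(durations, proportions, mu_grid, sigma_grid)
print(mu_hat, sigma_hat)  # 120.0 15.0
```

A larger fitted sigma corresponds to a shallower identification curve, which is what the dissertation interprets as a broader, less distinct perceptual boundary.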
25

The Impact of Breathiness on Speech Intelligibility in Pathological Voice

Thompson, Louise Shirley January 2011
Aim: The aim of this study was to determine how deterioration of voice quality, such as breathiness, may affect speech intelligibility.
Method: Acoustic analysis was conducted on sustained vowel phonation (/i/ and /a/) and on sentences produced by speakers with voice disorders. Measures included the frequency and amplitude of the first two formants (F1, F2), the singing power ratio (SPR), the amplitude difference between the first two harmonics (H1-H2), voice onset time (VOT), and the energy ratio between consonant and vowel (CV energy ratio). A series of two-way (glottal closure by vowel) mixed between- and within-subjects analyses of variance on these acoustic measures showed a significant effect of glottal closure (complete vs. incomplete), or a significant glottal closure by vowel interaction, on F2 frequency, the H1-H2 amplitude difference, and the singing power ratio. Because the literature reports a dominant first harmonic to be a useful predictor of breathiness, the H1-H2 amplitude difference was selected as the factor for investigating how voice quality affects perceived vowel intelligibility and clarity. Fixed-length vowel segments at five levels of H1-H2 amplitude difference were presented to 10 male and 10 female inexperienced listeners aged 19 to 34 years.
Results: Tokens with a dominant first harmonic, indicative of a breathier voice, were expected to be associated with lower rates of correct vowel identification and of being perceived as “clearer”. Although no linear relationship between breathiness and intelligibility was found, the results indicated intelligibility thresholds for particular vowels: once a certain level of breathiness was reached, intelligibility declined.
Conclusion: Perceptual ratings changed as a function of the H1-H2 amplitude difference, which previous studies have identified as a measure of breathiness. This revealed vowel-specific intelligibility thresholds: below them, breathiness was tolerated with little impact on intelligibility, but beyond them, intelligibility ratings suffered markedly.
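H1-H2 is the level difference, in dB, between the first and second harmonics, so a positive value means a dominant first harmonic. The sketch below is a minimal illustration, not the study's measurement procedure: it assumes F0 is already known and estimates each harmonic's amplitude with the Goertzel algorithm (a single-bin DFT) on a synthetic two-harmonic signal. Real recordings would need an F0 tracker and a windowed spectrum.

```python
import math

def goertzel_mag(samples, fs, freq):
    """Magnitude of the DFT component of `samples` nearest to `freq` Hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq / fs)              # nearest DFT bin
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return math.sqrt(max(power, 0.0))

def h1_minus_h2(samples, fs, f0):
    """H1-H2 in dB: level of the first harmonic minus level of the second."""
    h1 = goertzel_mag(samples, fs, f0)
    h2 = goertzel_mag(samples, fs, 2 * f0)
    return 20.0 * math.log10(h1 / h2)

# Synthetic 'voice': F0 = 200 Hz, H1 amplitude 1.0, H2 amplitude 0.25 (about 12 dB lower).
fs, f0, n = 8000, 200.0, 800  # 800 samples = exactly 20 periods, so both harmonics fall on DFT bins
signal = [math.sin(2 * math.pi * f0 * t / fs) + 0.25 * math.sin(2 * math.pi * 2 * f0 * t / fs)
          for t in range(n)]
print(round(h1_minus_h2(signal, fs, f0), 2))  # 12.04
```

Raising the first harmonic relative to the second, as in a breathier voice, increases this value.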
26

Effects of Masking, and Sex on Lombard Vowel Production

Askin, Victoria January 2014
The change a speaker makes in response to background noise is known as the Lombard effect (LE). This study investigated the acoustic changes speakers make in the presence of broadband noise and two-talker babble, with particular interest in vocal fundamental frequency (F0) and formant-based vowel space measures across sexes. Forty participants (20 male, 20 female) were asked to read phrases in quiet and in the presence of two-talker babble and broadband noise, with the maskers presented at 50 and 70 dB HL. The phrases were recorded and acoustically analysed. The results showed a significant sex difference for both F0 and vowel space. No significant effect of masking condition was found for either F0 or vowel space. However, F0 varied significantly with masker intensity level, suggesting a Lombard effect. While the sex difference in F0 can be explained by differences in vocal anatomy, the sex difference in vowel space was indicative of a sociophonetic influence on speech production.
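Vowel space is often quantified as the area of the polygon spanned by corner vowels in the F1-F2 plane. The abstract does not state which vowel space measure was used, so this is a sketch under that assumption: it computes a quadrilateral vowel space area with the shoelace formula from hypothetical mean formant values for four corner vowels.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon from (F1, F2) vertices listed in order."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Hypothetical mean (F1, F2) values in Hz for /i/, /æ/, /ɑ/, /u/, listed around the polygon.
quiet = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
print(polygon_area(quiet))  # 412500.0 (Hz squared)
```

Computing this area per speaker and per condition would allow comparisons such as the sex difference in vowel space reported above.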
27

Kalbos trakto perdavimo funkcijos modeliavimas / Modeling the transfer function of the vocal tract

Bauža, Donatas 17 July 2014
The work comprises 57 pages and contains 8 figures, 8 graphs and 16 tables. Its aim was to determine how the energy and frequency of the formants of synthesized vowels change when the air density and the coefficient of air friction against the vocal tract walls are varied. The vowels short a, long a, short e and long e were studied, with MatLab used to synthesize the vowels and Praat used to extract and analyse their formants. The change in air density was chosen to match natural variation between -20 °C (cold winter) and 30 °C (hot summer). The friction coefficient, which changes with vocal tract humidity, was varied by 7%. The results show only small changes in vowel formant energy and frequency (up to 1%). In some cases, a change in the parameters of a single formant stands out, reaching up to 4%. Thus, varying the air density and friction coefficient can slightly adjust the formant parameters of synthesized vowels.
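For orientation only, the textbook lossless uniform-tube approximation (not the MatLab model used in this work) places the resonances of a tube closed at the glottis and open at the lips at F_n = (2n - 1)c / (4L). The sketch assumes a tube length of 17.5 cm and a speed of sound of 350 m/s, both illustrative values. Because this idealized model has no wall friction or other losses, it cannot reproduce the small formant changes the thesis reports; it only shows where the formants of a neutral vowel fall before such refinements.

```python
def tube_formants(length_m, c=350.0, count=3):
    """Resonances (Hz) of a lossless uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]

# Assumed 17.5 cm vocal tract and c = 350 m/s (illustrative values).
print([round(f) for f in tube_formants(0.175)])  # [500, 1500, 2500]
```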
28

Defining Britain's Most Appealing Voice : An Accent Profile of Sir Sean Connery

Hill, Christopher January 2007
The aim of this paper is to explore the features that combine to make up the distinctive accent of the actor Sir Sean Connery. This study outlines the subject’s basic vowel system and compares it to data on the vowel systems of Received Pronunciation (RP) and Scottish Standard English (SSE) from previous research (Stuart-Smith 1999, Hawkins & Midgley 2005, Fisk 2006). Furthermore, this essay examines the degree to which other features associated with SSE are present in the subject’s accent: the Scottish Vowel Lengthening Rule (SVLR), dark /l/, rhoticity and T-glottalling. It is hypothesised that the subject speaks a modified variety of SSE yet retains these qualities typically associated with SSE. The speech analysis programs WaveSurfer (version 1.4.7) and Praat (version 4.4.33) were used to analyse sections of a speech given by the subject at an awards ceremony. Instrumental analysis of this kind was deemed appropriate to establish a high degree of objectivity. Of the wide range of recorded material available, the subject’s acceptance speech was judged most suitable for analysis. It is a passage of spontaneous speech, as opposed to a movie script, in which the subject talks about his background and career. Having analysed the subject’s accent in this way, certain sociolinguistic implications can be drawn. The results suggest that Sir Sean Connery does indeed speak a variety of SSE; however, rather surprisingly, his accent appears quite typical of his Edinburgh origins. The vowel system not only identifies the subject as an SSE speaker but also shows traces of his working-class background: for example, the fronted quality of Connery’s /u/ and his low /ɪ/ are typical of a working-class SSE speaker.
Moreover, the generally low vowel quality found in Connery’s basic vowel system can be interpreted as revealing something of his working-class origins. Evidence of the other features associated with SSE was also found in the subject’s accent. Durational evidence indicates (albeit tentatively at this stage) that the SVLR operates in his accent, and dark /l/ and T-glottalling were also observed. While Connery clearly speaks a rhotic variety of English, it is the nature and variety of his /r/ production that is most interesting: the subject appears to produce a retroflex realisation of /r/ that affects other consonants in its environment. This /r/ may indicate an earlier Irish influence on Connery’s accent. Given the nature and limited size of this study, all findings are preliminary, and more research is needed before any firm conclusions can be drawn.
29

Perfil espectrográfico da hipernasalidade de fala de mulheres portadoras de fissura palatina / Spectrographic profile of speech hypernasality in women with cleft palate

Jussara Melo Vieira 29 January 2004
Speech hypernasality is a disorder of nasal resonance frequently found in individuals with cleft palate and velopharyngeal dysfunction. It consists of inappropriate, excessive nasality on exclusively oral sounds and can be analyzed by spectrography, which decomposes the speech signal into three dimensions (frequency, time and intensity) to produce a graph, the spectrogram. The aim of this study was therefore to investigate the spectrographic characteristics of speech hypernasality in women with cleft palate and to compare the findings with the results of nasometry and of auditory-perceptual evaluation of their speech. Participants were 30 women without speech or vocal tract impairments, 5 women with unoperated cleft palate and 21 women with operated cleft palate, aged 18 to 40 years. They produced the vowel [a] and a nasal vowel, sustained in isolation and within a carrier phrase, in front of a nasometer and a digital recorder. These speech samples were evaluated auditory-perceptually, and their nasalance scores and formant characteristics were determined. The following spectrographic characteristics were found in the nasalized/hypernasalized productions: nasal formants and antiformants inserted among the oral formants. These findings did not correspond directly to the nasometry results or to the auditory-perceptual evaluation, nor did those two measures correspond to each other.
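Spectrography, as described above, decomposes the speech signal into frequency, time and intensity. The minimal short-time Fourier transform sketch below shows how a spectrogram matrix of dB values is built. The Hann window, the frame length and hop size, and the naive DFT (used only to keep the example self-contained) are all assumptions; a real analysis would use an FFT library or a tool such as Praat.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return frames (time axis); each frame lists dB magnitudes for bins 0..frame_len/2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]  # periodic Hann
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # frequency axis: bin k is k * fs / frame_len Hz
            x = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            row.append(20 * math.log10(abs(x) + 1e-12))  # intensity axis, in dB
        frames.append(row)
    return frames

fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]  # 1 kHz test tone
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin * fs / 64)  # 15 1000.0
```

Nasal formants and antiformants of the kind reported above would appear in such a matrix as extra energy peaks and troughs between the oral formants.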
30

Využití vokalických formantů pro rozpoznání mluvčího v přirozených forenzních nahrávkách / Using vowel formants for speaker identification in natural forensic recordings

Nechanský, Tomáš January 2017
Voice comparison is one of the most frequently addressed topics in forensic phonetics; however, experts have so far been unable to find a speech parameter that reliably discriminates between two speakers. Formant dynamics have yielded promising results in this respect, so in this study we used linear discriminant analysis (LDA) to test the speaker-discriminatory potential of formant trajectories in real forensic recordings. The aims were, first, to compare the LDA results when formant frequencies or the coefficients of quadratic and cubic fits are used as predictors and, second, to compare the results when the analyzed classes are balanced or unbalanced in their number of objects. All predictor types yielded comparable classification rates; however, because LDA limits the number of predictors relative to class size, the quadratic fit appears to be the most efficient. Although LDA discriminated between different voices above chance, it cannot be recommended for forensic use: it delivered highly inconsistent results when the number of objects in the classes was changed and, more importantly, it significantly discriminated between objects from the same speaker. Key words: formant trajectories, voice comparison, LDA, Czech, forensic phonetics
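The quadratic-fit predictors can be sketched as follows: a formant trajectory sampled over normalized time is summarized by the three least-squares coefficients of a second-order polynomial, which could then serve as LDA predictors in place of raw formant frequencies. The number of sample points, the time normalization and the trajectory itself are assumptions for illustration; the abstract does not give the study's extraction settings.

```python
def polyfit_quadratic(ts, ys):
    """Least-squares coefficients [a0, a1, a2] of y = a0 + a1*t + a2*t**2."""
    # Normal equations (X^T X) a = X^T y, built from power sums of t.
    s = [sum(t ** p for t in ts) for p in range(5)]
    b = [sum(y * t ** p for t, y in zip(ts, ys)) for p in range(3)]
    m = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]  # augmented 3x4 matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * c for a, c in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

# Hypothetical F2 trajectory (Hz) sampled at 7 equally spaced points of normalized time.
ts = [i / 6 for i in range(7)]
f2 = [1500 + 600 * t - 400 * t ** 2 for t in ts]  # noiseless quadratic, so the fit is exact
coeffs = polyfit_quadratic(ts, f2)
print([round(c, 3) for c in coeffs])  # [1500.0, 600.0, -400.0]
```

Three coefficients per formant is a more compact predictor set than a dense sampling of frequencies, which matters when LDA caps the number of predictors relative to class size.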
