Global ETD Search

1	The impact of head and body postures on the acoustic speech signal Flory, Yvonne January 2015 This dissertation is aimed at investigating the impact of postural changes within speakers on the acoustic speech signal to complement research on articulatory changes under the same conditions. The research is therefore relevant for forensic phonetics, where quantifying within-speaker variation is vital for the accuracy of speaker comparison. To this end, two acoustic studies were carried out to quantify the influence of five head positions and three body orientations on the acoustic speech signal. Results show that there is a consistent change in the third formant, a change which was most evident in the body orientation measurements, and to a lesser extent in the head position data. Analysis of the results with respect to compensation strategies indicates that speakers employ different strategies to compensate for these perturbations to their vocal tract. Some speakers did not exhibit large differences in their speech signal, while others appeared to compensate much less. Across all speakers, the effect was much stronger in what were deemed ‘less natural’ postures. That is, speakers were apparently less able to predict and compensate for the impact of prone body orientation on their speech than for that of the more natural supine orientation. In addition to the acoustic studies, a perception experiment assessed whether listeners could make use of acoustic cues to determine the posture of the speaker. Stimuli were chosen with, by design, stronger or weaker acoustic cues to posture, in order to elicit a possible difference in identification performance. Listeners were nevertheless not able to identify above chance whether a speaker was sitting or lying in prone body orientation even when hearing the set with stronger cues. 
Further combined articulatory and acoustic research will have to be carried out to disentangle which articulatory behaviours correlate with the acoustic changes presented in order to draw a more comprehensive picture of the effects of postural variation on speech.
2	Užívání glotalizace jako faktor umožňující identifikaci mluvčího / Use of glottalization as a factor enabling speaker identification Skákal, Ladislav January 2015 When handling the task of speaker identification, forensic phoneticians use a combination of parameters drawn from different levels of the speech signal. The main aim of the present thesis is to explore whether glottalization in Czech may be considered a potentially useful parameter in this sense. In our research, we focus on the rate of prevocalic glottalization at word boundaries and we distinguish between different realisations of glottalization: the canonical glottal stop and its hypoarticulated form, creaky voice. The studied material consists of repeated recordings of three male and four female speakers and contains both read text and spontaneous speech. The results do not indicate that the same speaker uses glottalization differently in the first and second recording, but a difference in glottalization is found between speakers. From the point of view of forensic phonetics, this finding seems to be useful. Some other factors not directly connected with the speaker (height of the following vowel, lexical factors and speech rate) were also examined marginally, but no influence on glottalization was found. Keywords: glottal stop, glottalization, forensic phonetics, speaker identification
3	Vliv vzdělání na schopnost maskovat svůj hlas / The effect of education on the ability to disguise one's voice Vyhnálková, Lenka January 2013 Voice disguise can potentially occur in any utterance associated with a criminal case. In order to identify the perpetrator, it is necessary to analyze the speech and understand how the different types of voice disguise can affect the speaker's voice qualities. This thesis focuses on the ability to disguise one's voice, comparing three groups of speakers defined by their educational background. The aim of this work is to determine the strategies speakers adopt to conceal their identity, and to ask whether the three groups differ in their choice of strategy and in its success. The research is based on 86 recordings made in Pilsen and Prague with 43 young people aged 20 to 31. Two read utterances, one undisguised and the other freely disguised, were obtained from each participant and compared with each other. The results show that the preferred forms of voice disguise involved changes in phonation, especially a decrease or increase in the fundamental frequency of the speaker's voice. Only minor differences could be found among the three groups of speakers in their choice of strategy and its success, yet for a final confirmation of this...
4	Forensic speaker analysis and identification by computer : a Bayesian approach anchored in the cepstral domain Khodai-Joopari, Mehrdad, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW January 2007 This thesis advances understanding of the forensic value of automatic speech parameters by addressing the following question: what is the potential of the speech cepstrum as a forensic-acoustic parameter? Despite many advances in automatic speech and speaker recognition, robust and unconstrained progress in technical forensic speaker identification has been partly impeded by our incomplete understanding of the interaction and relation between forensic phonetics and the techniques employed in state-of-the-art automatic speech and speaker recognition. The posed question underlies the recurrent and longstanding issue of acoustic parameterisation in the area of forensic phonetics, where 1) speaker identification often must be carried out under less than optimal conditions, and 2) views differ on the usefulness and trustworthiness of formant frequency measurements. To this end, a new formulation for the forensic evaluation of speech data was derived which is effectively a spectral likelihood ratio with enhanced sensitivity to the local peaks of the formant structure of the speech spectrum of vowel sounds, while retaining the characteristics of the Bayesian framework. This new hybrid formula was used together with a novel approach, founded on a statistically based matched-pairs technique, to account for various levels of variation inherent in speech recordings, thereby providing a spectrally meaningful measure of variation between two speech spectra and hence of the true worth of speech samples as forensic evidence. The experimental results were obtained from a forensically realistic database of a relatively large population of 297 native speakers of Japanese. 
In sum, the research conducted in this thesis is a major step forward for the forensic-phonetic field, broadening the objective basis of forensic speaker identification. Beyond advancing knowledge in the field, the semi data-independent nature of the new formula ultimately has great implications for technical forensic speaker identification. It also provides a valuable biometric tool with both academic and commercial potential in crime investigation, a field which is already suffering from a lack of adequate data. Keywords: speech, speaker identification, forensic phonetics, automatic speech recognition, cepstral parameterisation, Bayesian statistical decision theory
5	Naivní a instruovaný popis hlasu / Naive and instructed description of voices Průchová, Tereza January 2018 This thesis deals with the description of the human voice from the forensic phonetic perspective. At the beginning of the theoretical part, some important cases from the history of this field are mentioned, as well as the attitude of judges to the voice as evidence. The main tasks of forensic phonetics are then briefly presented. The remainder of the theoretical part summarizes existing knowledge from the field of audiovisual perception and compares voice description with face description, including several concrete examples of systematic approaches to obtaining these descriptions, both for professional purposes and for the needs of investigators in practice. The aim of the practical part is to compare an initial uninstructed, naive description of the voices of selected speakers, obtained from respondents in a simulated police questioning, with a subsequent instructed, systematic description following an interrogation protocol. The protocol uses lay formulations of the individual questions instead of the original phonetic terminology, together with illustrative sound samples, to make it easier for respondents to understand the protocol and to give a more detailed account of the voice they heard. The first part of the results analysis is devoted primarily to the naive testimony of the...
6	Sofistikované strategie maskování hlasu a jejich fonetická podoba / Sophisticated strategies of voice disguise and their phonetic character Růžičková, Alžběta January 2018 Speech contains certain attributes characteristic of a speaker, so-called idiosyncratic features. This thesis focuses on the form of these features in intentional voice disguise: whether speakers are able to change them in a substantial way, or whether they tend to remain stable in spite of intentional speech modifications. It was also investigated whether speakers share any general tendencies towards similar changes of such features under voice disguise. The observed features were statistical f0 indicators, f0 contours, vowel formants, long-term formant distributions (LTFDs), spectral characteristics of sibilants, intensity, intensity contours, speech and articulation rate, %V and local articulation rate contours. In f0 median and standard deviation, vowel formants, LTFDs, intensity, articulation rate, and %V, prominent shifts under voice disguise were observed in general; in the majority of these parameters, the shifts differed among speakers. However, it was found that the value of %V generally tends to rise under voice disguise. Intensity also showed an increase in the majority of cases. In f0 contours, similar patterns were observed among speakers in normal speech; in disguised speech, however, greater differences appeared among speakers; speakers tend to employ nonstandard dynamic f0 patterns more...
7	Caracterização prosódica de sujeitos de diferentes variedades de fala do português brasileiro em diferentes relações sinal-ruído / Prosodic characterization of subjects from different Brazilian Portuguese varieties in different signal-to-noise ratio Constantini, Ana Carolina, 1985- 05 August 2014 Advisor: Plínio Almeida Barbosa / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem / Previous issue date: 2014 / Summary: Prosody is phonic information that lies beyond the segmental level, and it is usually studied through three classic phonetic-acoustic parameters: fundamental frequency, intensity and duration. Although it is studied for many purposes, prosody is generally not the first choice of investigation when the aim is to learn more about differences between varieties of the same language, for example. The present work therefore aims to fill this gap in prosodic studies by characterizing and differentiating varieties spoken in Brazil. The goal of this doctoral thesis was to study prosodic parameters that could characterize, and subsequently differentiate, subjects from different spoken varieties of Brazilian Portuguese. In a second stage, additive noise was added to the same speech samples used to characterize the prosody of the different varieties, in order to better understand how prosodic parameters behave when noise is present in speech samples, a very common situation in forensic phonetics. The secondary goal of the research was to administer perceptual tests to listeners of Brazilian Portuguese to find out whether they could recognize and categorize the speakers' origin from their speech. 
We analyzed spontaneous speech samples from 35 male subjects from seven Brazilian regions: São Paulo, Minas Gerais, Rio de Janeiro, Paraná, Distrito Federal, the Northeast region and the North region. All speech samples were segmented into vowel-to-vowel units (VV units), syllable-sized units that span from the onset of one vowel to the onset of the immediately following vowel, including the consonants between them. The BeatExtractor script was used for this purpose. Subsequently, another script (ProsodicDescriptorExtractor) was run to extract eight prosodic-acoustic measures from the speech samples: speech rate (VV units/s), mean of the smoothed z-score of VV-unit duration, standard deviation of the smoothed z-score of VV-unit duration, skewness of the smoothed z-score of VV-unit duration, prominence rate (z-score peaks/s), fundamental frequency median, spectral emphasis, and rate of non-prominent VV units per second. After statistical analysis, the results showed that five of the eight parameters were able to identify at least one of the varieties studied and thus differentiate it from the others. F0 median and spectral emphasis produced two large groups that separated DF and the North region from all other varieties (except that DF and Paraná were not differentiated), showing that DF and North have higher spectral emphasis values, as well as higher F0 values, than speakers of other varieties. Skewness of the smoothed z-score and the rate of non-prominent VV units/s were the parameters that placed DF and North in different groups. The standard deviation of the z-score pointed to a difference between dialects spoken in the North region and the South region of Brazil: the North region differed from SP, DF and the Northeast, and SP, in turn, differed from PR. We therefore conclude that prosodic parameters can reveal characteristics specific to varieties spoken in Brazil. 
The analysis of the speech samples at different signal-to-noise ratios showed that F0 median and spectral emphasis are the parameters most disturbed when the signal-to-noise ratio is low, with spectral emphasis values changing by as much as 154% relative to their original values. The results showed that the analysis of rhythmic structure is the most robust when noise is present in the speech samples. Finally, the perceptual tests were administered to 20 speakers of Brazilian Portuguese; the most recognized variety was the one spoken in Rio de Janeiro, which reached 90% correct identification, followed by the variety spoken in the Northeast of Brazil. We found that proximity between the listeners' region of origin and the region of the variety presented in the test facilitates correct identification of the variety / Abstract: Prosody is usually studied by means of three classic parameters: fundamental frequency, intensity and duration. As far as dialectology is concerned, prosody has not been the main focus of research on different dialects. Our goal is to characterize and differentiate Brazilian Portuguese varieties using prosodic parameters. To do so, we analyzed recordings of spontaneous speech from 35 male subjects from seven different Brazilian regions: São Paulo (SP), Minas Gerais (MG), Rio de Janeiro (RJ), Paraná (PR), Distrito Federal (DF), Northeast (NE) and North (N). The speech samples were segmented into Vowel-to-Vowel units (VV units) using the BeatExtractor script. Later, the ProsodicDescriptorExtractor script was used to extract eight prosodic measures: speech rate (VV units/s), mean, standard deviation and skewness of the normalized z-score, prominence rate (peaks of z-score/s), median of fundamental frequency, spectral emphasis and rate of non-prominent VV units/s. The statistical analysis revealed that five prosodic parameters were able to identify at least one variety and thus differentiate it from the others. 
Fundamental frequency median and spectral emphasis created two groups which separated N and DF (DF is located in the Central-West region, near the North region) from all the other varieties, since N and DF were characterized by high values of these two parameters. On the other hand, skewness of z-score and rate of non-prominent VV units/s set DF and N in different groups. Standard deviation of z-score pointed to differences between North varieties and South varieties. We concluded that prosodic parameters can be useful to differentiate Brazilian Portuguese varieties. Another goal of the current study was to analyze the spontaneous speech recordings at distinct signal-to-noise ratios. The analysis showed that additive Gaussian noise modifies the values of F0 median and spectral emphasis (the latter changed by as much as 154% relative to its original values). The results revealed that the rhythmic organization of the speech chain is better suited to acoustic analysis in the presence of noise. Finally, 20 listeners were recruited to take a perceptual test (a free classification test) on the different varieties spoken in Brazil (the same spontaneous speech recordings were used for the perceptual test). Rio de Janeiro was the most recognized variety, with 90% correct answers, followed by the NE variety. The closeness of the listeners' own origin to the regions of the spoken varieties contributed to correct identifications / Doctorate / Linguistics / Doctor of Linguistics Keywords: forensic phonetics, signal-to-noise ratio, dialects - phonetics
8	Způsoby využití základní frekvence pro identifikaci mluvčích / Ways of exploiting fundamental frequency for speaker identification Hývlová, Dita January 2015 The present Master's thesis deals with the forensic use of fundamental frequency characteristics, specifically with F0 mean values and indicators of variability. Phoneticians who specialise in the forensic analysis of speech generally believe that F0 does not hold much potential as a parameter useful for speaker identification, mainly because it is easily influenced by extrinsic factors (e.g. the speaker's emotional state, interfering noise, the transmission channel or even the speaker's own effort to mask his voice), which cause high intra-individual variability. Despite these facts, however, the forensic use of F0 offers a number of advantages, namely straightforward extraction from the speech signal and lower susceptibility to varying lexical content, unlike, for example, vowel formants. This thesis investigates recordings of 8 male speakers made in two different speech styles (spontaneous and read) and compares the respective indicators of F0 stability and variability, in particular those that are robust under varying external conditions: that is, the baseline for mean values and the 10th-90th percentile range for variability indicators. Apart from that, we take into account phenomena such as creaky voice, which are idiosyncratic and contribute to easier speaker discrimination. Key words:...
9	Využití vokalických formantů pro rozpoznání mluvčího v přirozených forenzních nahrávkách / Using vowel formants for speaker identification in natural forensic recordings Nechanský, Tomáš January 2017 Voice comparison is one of the most frequently addressed topics in forensic phonetics; however, experts have so far not been able to find a speech parameter which reliably discriminates between two speakers. Formant dynamics have brought promising results in this respect, so in our study we used linear discriminant analysis (LDA) to test the speaker-discriminatory potential of formant trajectories on real forensic recordings. The aim was, firstly, to compare the results of LDA when formant frequencies or coefficients of a quadratic or cubic fit are used as predictors and, secondly, to compare the results when the analyzed classes are or are not balanced in the number of objects. All predictor types demonstrated comparable classification rates; nevertheless, as LDA limits the number of predictors in relation to the class size, the quadratic fit appears to be the most efficient. Even though LDA was able to discriminate between different voices above chance, it cannot be recommended for forensic use. It delivered highly inconsistent results when the number of objects in the classes was changed and, more importantly, it significantly discriminated between objects belonging to the same speaker. Key words: formant trajectories, voice comparison, LDA, Czech, forensic phonetics
10	Uso de técnicas acústicas para verificação de locutor em simulação experimental / Using techniques of acoustic analysis in an experimental simulation of speaker verification Machado, Aline, 1989- 26 August 2018 Advisor: Plínio Almeida Barbosa / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem / Previous issue date: 2014 / Summary: This research investigates the effectiveness of a set of acoustic measures for recognizing the speech of one individual in a group of ten speakers of Brazilian Portuguese. One subject from this group was drawn at random and designated the "criminal". The measures used in the research include the frequencies of the first two vowel formants, mean fundamental frequency, the duration of syllable-sized units and of vowels, formant dynamics, and the standard deviation of consonantal interval durations (ΔC). All selected excerpts come from interviewees divided into two groups: (i) outdoor interviews and (ii) telephone recordings (mobile phone to mobile phone). The individuals are speakers of Brazilian Portuguese from the states of São Paulo, Rio Grande do Sul, Pará and Bahia. In this research we give a historical overview of forensic phonetics and of the methods of analysis used throughout its history, and we examine which acoustic parameters are most used in different recording contexts, direct and by mobile phone, and which of them were most significant in the research. In our results, the parameters least affected by the change of transmission channel were rhythm and timing parameters, such as duration, speech rate and ΔC, together with one parameter that measures formant dynamics, namely the rate of movement of the second formant. 
The temporal measures, being the most variable between subjects, had great discriminating power. The statistical tests indicated that three of the individuals studied showed similarities to the "criminal" / Abstract: The aim of this research is to use acoustic techniques to recognize a subject in a group of ten speakers of Brazilian Portuguese and to point out the most relevant acoustic parameters for speaker recognition in that group. The analysis of the first two formants of the oral vowels, fundamental frequency, speech length, formant movement rate, syllable-sized duration, intensity and ΔC (standard deviation of consonantal interval durations of the collected samples) will help identify an individual within the group. All the samples are from interviews made in a poorly treated acoustic environment and over a mobile phone. Moreover, the samples of one of the speakers (the "criminal"), which were collected in an acoustically treated room, will simulate the questioned pattern of the forensic situation / Master's / Linguistics / Master of Linguistics Keywords: forensic phonetics, acoustic phonetics, speech perception
