1 |
A Speech Enhancement System Based on Statistical and Acoustic-Phonetic Knowledge
Sudirga, Renita. 25 August 2009
Noise reduction aims to improve the quality of noisy speech by suppressing the background noise in the signal. However, there is always a tradeoff between noise reduction and signal distortion--more noise reduction is always accompanied by more signal distortion. An evaluation of the intelligibility of speech processed by several noise reduction algorithms in [23] showed that most noise reduction algorithms were not successful in improving the intelligibility of noisy speech.
In this thesis, we aim to utilize acoustic-phonetic knowledge to enhance the intelligibility of noise-reduced speech. Acoustic-phonetics studies the characteristics of speech and the acoustic cues that are important for speech intelligibility. We considered the following questions: what is the noise reduction algorithm that we should use, what are the acoustic cues that should be targeted, and how to incorporate this information into the design of the noise reduction system.
A Bayesian noise reduction method similar to the one proposed by Ephraim and Malah in [16] is employed. We first evaluate the goodness-of-fit of several parametric PDF models to the empirical speech data. For classified speech, we find that the Rayleigh PDF and the Gamma PDF with a fixed shape parameter of 5 model the speech spectral amplitude equally well. The Gamma-MAP and Gamma-MMSE estimators are derived. The subjective and objective performances of these estimators are then compared.
We also propose to apply a class-based cue enhancement, similar to those performed in [21]. The processing directly manipulates the acoustic cues known to be important for speech intelligibility. We assume that the system has the sound class information of the input speech. The scheme aims to enhance the interclass and intraclass distinction of speech sounds. The intelligibility of speech processed by the proposed system is then compared to the intelligibility of speech processed by the Rayleigh-MMSE estimator [16].
The intelligibility evaluation shows that the proposed scheme enhances the detection of plosive and fricative sounds. However, it does not help in the intraclass discrimination of plosive sounds, and more tests need to be done to evaluate whether intraclass discrimination of fricatives is improved. The proposed scheme deteriorates the detection of nasal and affricate sounds. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2009-08-24 21:32:48.966
|
2 |
A Sociophonetic Ethnography of Selwyn Girls' High
Drager, Katie. January 2009
This thesis reports on findings from a year-long sociolinguistic ethnography at an all girls’ high school in New Zealand which is referred to as Selwyn Girls’ High (SGH). The study combines the qualitative methods of ethnography with the quantitative methods of acoustic phonetic analysis and experimental design. At the school, there were a number of different groups (e.g. The PCs, The Pasifika Group, The BBs), each forming a community of practice where the different members actively constructed their unique social personae within the context of the group. There was a dichotomy between the groups based on whether they ate lunch in the common room (CR) or not (NCR) and this division reflected the individual speakers’ stance on whether they viewed themselves as “normal” or different from other girls at the school.
In-depth acoustic analysis was conducted on tokens of the word like from the girls’ speech. This is a word with a number of different pragmatic functions, such as quotative like (I was LIKE “yeah okay”), discourse particle like (It was LIKE so boring), and lexical verb like (I LIKE your socks). The results provide evidence of acoustically gradient variation in the girls’ realisations of the word like that is both grammatically and socially conditioned. For example, quotative like was more likely to have a shorter /l/ to vowel duration ratio and be less diphthongal than either discourse particle like or grammatical like, and there was a significant difference in /k/ realisation depending on a combination of the token’s pragmatic function and whether the speaker ate lunch in the CR or not.
Additionally, three speech perception experiments were conducted in order to examine the girls’ sensitivity to the relationship between phonetic variants, lemma-based information, and social factors. The results indicate that perceivers were able to distinguish between auditory tokens of the different functions of like in a manner that was consistent with trends observed in production. Perceivers were also able to extract social information about the speaker depending on phonetic cues in the stimuli.
Taken together, the results provide evidence that lemmas with a shared wordform can have different phonetic realisations, that individuals can manipulate these realisations in the construction of their social personae, and that individuals can use lemma-based phonetic trends from production to identify a word. These results have implications for how phonetic, lemma, and social information are stored in the mind and, together, they are used to inform a unified model of speech production, perception and identity construction.
|
3 |
The impact of head and body postures on the acoustic speech signal
Flory, Yvonne. January 2015
This dissertation investigates the impact of postural changes within speakers on the acoustic speech signal, to complement research on articulatory changes under the same conditions. The research is therefore relevant for forensic phonetics, where quantifying within-speaker variation is vital for the accuracy of speaker comparison. To this end, two acoustic studies were carried out to quantify the influence of five head positions and three body orientations on the acoustic speech signal. Results show that there is a consistent change in the third formant, a change which was most evident in the body orientation measurements, and to a lesser extent in the head position data. Analysis of the results with respect to compensation strategies indicates that speakers employ different strategies to compensate for these perturbations to their vocal tract. Some speakers did not exhibit large differences in their speech signal, while others appeared to compensate much less. Across all speakers, the effect was much stronger in what were deemed ‘less natural’ postures. That is, speakers were apparently less able to predict and compensate for the impact of prone body orientation on their speech than for that of the more natural supine orientation. In addition to the acoustic studies, a perception experiment assessed whether listeners could make use of acoustic cues to determine the posture of the speaker. Stimuli were chosen with, by design, stronger or weaker acoustic cues to posture, in order to elicit a possible difference in identification performance. Listeners were nevertheless not able to identify above chance whether a speaker was sitting or lying in prone body orientation, even when hearing the set with stronger cues.
Further combined articulatory and acoustic research will have to be carried out to disentangle which articulatory behaviours correlate with the acoustic changes presented in order to draw a more comprehensive picture of the effects of postural variation on speech.
|
4 |
Production des consonnes plosives du français : du contrôle des bruits de plosion / French stop consonant production: burst control
Cattelain, Thibault. 24 June 2019
Stop consonants (/p/, /b/, etc.) are of particular interest for understanding speech motor control, because their production requires fine coordination of the three levels of speech production: breathing, vocal fold vibration and articulation.
The main goal of my thesis is to study how respiratory, laryngeal and articulatory gestures coordinate to control the variation of acoustic features of stop consonants, especially the features of their bursts (intensity, duration, spectrum), which are crucial for stop consonant intelligibility. An important part of my thesis work also focuses on the muscular control of lip gestures in the production of bilabial stops.
These goals required preliminary methodological work to compare, develop and implement different techniques for measuring and estimating the articulatory effort of speech production from physiological and mechanical points of view (lip movement kinematics, force sensors, orofacial electromyography).
This exploration led to the acquisition of a large database (acoustic and physiological data) of French stop consonant productions from about twenty healthy adult speakers, including 2 phonation modes (modal and whispered), 2 speech rates (normal and fast) and several levels of articulatory effort.
The analysis of this database confirmed relationships already established in conversational speech between burst intensity and the maximum intra-oral pressure (or lip opening velocity for labial stops), and between spectral features of the burst (skewness and kurtosis) and articulatory parameters of tongue movement for alveolar and velar stops.
Other relationships, not previously described in the literature, were observed in conversational speech: 1- burst acoustic intensity increases when lip compression and lip closing velocity increase (for labial stops); 2- burst acoustic intensity increases when the tangential velocity of the tongue-raising movement increases (for palatal stops); 3- lip compression and lip opening and closing velocities increase significantly when the activity of the OOS (Superior Orbicularis Oris) and DLI (Depressor of the Inferior Lip) muscles increases (during the phases of the movement in which these muscles are agonists). These relationships depend on phonation mode (in whispered speech, the emphasis is on kinematic parameters at the expense of aerodynamic, articulatory and temporal ones) and on speech rate (most physiological and articulatory parameters lose their effectiveness for controlling acoustic features as speech rate increases).
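The abstract names skewness and kurtosis as the spectral features of the burst but does not say how they were computed. One common convention, shown here only as a sketch, treats the normalized power spectrum as a distribution over frequency and takes its standardized moments. The function name `spectral_moments`, the use of excess kurtosis, and the toy spectra are assumptions for illustration, not the thesis's procedure or data.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (centroid, spread, skewness, excess_kurtosis) of a power spectrum.

    Assumption: the spectrum is treated as a probability distribution over
    frequency, which is one common way to define spectral moments.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]

    # First moment: spectral centroid (Hz).
    centroid = sum(f * w for f, w in zip(freqs_hz, weights))

    # Second central moment: variance, reported as its square root (Hz).
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs_hz, weights))
    if variance == 0:
        raise ValueError("all energy is in one frequency bin")
    spread = math.sqrt(variance)

    # Third and fourth standardized moments.
    skewness = sum(((f - centroid) / spread) ** 3 * w
                   for f, w in zip(freqs_hz, weights))
    excess_kurtosis = sum(((f - centroid) / spread) ** 4 * w
                          for f, w in zip(freqs_hz, weights)) - 3.0
    return centroid, spread, skewness, excess_kurtosis


# Toy spectra (made-up values, not measurements from the thesis).
toy_freqs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
symmetric = spectral_moments(toy_freqs, [1.0, 2.0, 4.0, 2.0, 1.0])
low_tilted = spectral_moments(toy_freqs, [4.0, 2.0, 1.0, 1.0, 1.0])
```

Under this convention a spectrum whose energy is concentrated at low frequencies with a tail toward high frequencies has positive skewness, while a spectrum symmetric about its centroid has skewness zero.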
|
5 |
Alvos tonais: unidades fonético-fonológicas da entoação / Tonal targets: phonological-phonetic units of intonation
Martins, Marcus Vinicíus Moreira. 21 November 2017
The main purpose of this work is to develop our hypothesis about tonal targets, which would be phonological-phonetic units responsible for mediating between the representational level of intonation and the physical level of F0 implementation. The tonal targets were divided into two major categories, topological and punctual. The former occur in a limited space, which Ferreira-Netto (2008) calls the midtone. The boundaries of this space are defined by the tonal differentiation thresholds (TDT), located at +3 and -4 semitones from the midtone. Beyond these thresholds lie the Focus/Emphasis frequency bands. In these bands the tones become relevant events to listeners, who can attribute meanings to them. We call the tones that occur in this region punctual, since they are specific events. To test this hypothesis, we applied test 1, in which participants were asked to repeat a pre-recorded sentence with a marked intonational division: in the first part a male voice spoke a declarative phrase, and in the second part a female voice spoke an interrogative phrase. From 15 participants we obtained 24 samples, counting intra-subject repetitions.
The analysis was conducted in two stages. In the first, we evaluated the speaker's ability to detect the topological tonal target underlying the first part of the stimulus and to reproduce it. In the second, we evaluated the speaker's ability to detect the punctual target characterized by the interrogative and to implement it in their own speech. The first condition was analysed by means of what we call the relationship index (ir), which measures the degree of correlation between the stimulus and the speaker's repetition. The analysis revealed that the participants performed the task with great accuracy, which suggests that speakers are able to monitor the implementation of fundamental frequency by detecting the topological targets. The second analysis, on the other hand, shows that the implementation of punctual targets can be random to some extent, since it does not need to respect a specific limit, only a threshold. In the second part of the work we applied a similar method to the analysis of acted emotional speech in three conditions: anger, sadness and neutral. The material consisted of an excerpt from a science book read in these three conditions by professional actresses. The analysis, using hypothesis tests (n = 196, p < 0.005), revealed that the topological targets differed between conditions, suggesting that the intonational space and the frequency variation within it can be a significant cue for distinguishing emotional speech.
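The +3 and -4 semitone thresholds can be turned into Hz limits with the standard semitone relation, 12 × log2(f / reference). The sketch below is illustrative only: the function names, the inclusive treatment of the boundaries, and the 120 Hz midtone are assumptions, and the abstract does not describe how the midtone itself is estimated.

```python
import math

UPPER_THRESHOLD_ST = 3.0   # from the abstract: +3 semitones above the midtone
LOWER_THRESHOLD_ST = -4.0  # from the abstract: -4 semitones below the midtone


def semitones_from(f0_hz, midtone_hz):
    """Distance of an F0 value from the midtone, in semitones."""
    return 12.0 * math.log2(f0_hz / midtone_hz)


def threshold_band_hz(midtone_hz):
    """Lower and upper limits (Hz) of the space bounded by the thresholds."""
    lower = midtone_hz * 2.0 ** (LOWER_THRESHOLD_ST / 12.0)
    upper = midtone_hz * 2.0 ** (UPPER_THRESHOLD_ST / 12.0)
    return lower, upper


def classify_f0(f0_hz, midtone_hz):
    """Label an F0 value as inside the bounded space or beyond a threshold.

    Assumption: values exactly on a threshold count as inside the space.
    """
    st = semitones_from(f0_hz, midtone_hz)
    if LOWER_THRESHOLD_ST <= st <= UPPER_THRESHOLD_ST:
        return "topological"
    return "punctual"


# Hypothetical midtone of 120 Hz, chosen only for illustration.
band_low, band_high = threshold_band_hz(120.0)
labels = [classify_f0(f, 120.0) for f in (90.0, 130.0, 160.0)]
```

Because the thresholds are asymmetric in semitones, the band is also asymmetric in Hz: for the assumed 120 Hz midtone it runs from roughly 95 Hz to roughly 143 Hz.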
|
6 |
Descrição fonético-acústica das vibrantes no português e no espanhol
Carvalho, Kelly Cristiane Henschel Pobbe de. January 2004
Advisor: Rafael Eugenio Hoyos-Andrade / Committee: Adelaide Hercília P. Silva / Committee: Gisele Domingos do Mar / Committee: Mirian Therezinha da Matta Machado / Committee: Zilda Maria Zapparoli Castro Melo / Abstract: This dissertation deals with the acoustic analysis of trills and taps in Portuguese and in Spanish. These consonants were studied spectrographically in the different phonetic contexts in which they appear in both languages. The analysis was carried out with Multi-Speech, a speech analysis program for Windows produced by Kay Elemetrics, which provided the waveforms and spectrograms needed for an acoustic description of the selected sounds and, from it, a contrastive description of the "r-type" consonants. The study was limited to the Portuguese spoken in the Assis region (interior of São Paulo state, Brazil) and the Spanish spoken in Bogotá (Colombia). The data were recorded in the Language Laboratory of the Faculdade de Ciências e Letras de Assis (UNESP), using a professional cassette recorder in an acoustically isolated room. Although this study is primarily descriptive, it may eventually help those devoted to teaching and learning Portuguese and Spanish as foreign languages, since it provides relevant contrastive information about the phonetic component of the two languages with respect to the so-called trills or vibrant consonants. / Doctorate
|
7 |
A percepção da emoção na fala por nativos e não nativos / The perception of emotional speech by natives and non-natives
Peres, Daniel Oliveira. 24 October 2016
This doctoral study aims to investigate the perception of emotional speech by natives and non-natives (native speakers of Brazilian Portuguese, BP, and English speakers with no knowledge of BP). It is guided by the evolutionary view (DARWIN, 1965 [1872]; PLUTCHIK, 1980, 1984; COSMIDES; TOOBY, 2000), which holds that emotions are universal, and by the social view of emotions (AVERILL, 1980, 1993; HARRÉ, 1986; RUSSELL, 1991), which holds that emotions are a product of social interactions. Three speech perception experiments were developed involving four basic emotions: anger, fear, sadness and joy. The first experiment (a pilot) was based on the basic-emotion approach with four-alternative forced choice; the second on the dimensional approach to emotions (valence, activation and dominance); and the third on the thin slicing methodology. Altogether, 110 participants took part in the experiments: 8 in the pilot experiment, 36 in the experiment with the dimensional approach (judging normal and delexicalized speech) and 76 in the experiment with thin slices. The first two experiments used 32 excerpts of spontaneous emotional speech in BP.
For the thin slices experiment, 48 short excerpts of BP emotional speech, lasting up to 1400 ms, were selected. The participants' judgements were compared with acoustic parameters from the automatic analysis (ExProsodia) and with acoustic parameters related to voice quality. The results of the first two experiments were significant and showed that, in general, native and non-native participants were able to judge the emotions successfully. However, non-native participants showed no significant result in the experiment with the dimensional approach and delexicalized speech. In the thin slices experiment, unlike in the dimensional experiment with delexicalized speech, there was no significant difference between the performance of natives and non-natives. Although the results of this study support a universalist view of emotions, they also suggest an in-group advantage, namely that natives have a greater ability to recognize emotions than non-natives. Based on the results of the experiments with restricted stimulus information (delexicalized speech and thin slices), the hypothesis is that the perception of emotion relies on the redundancy of information contained in speech. Thus, the perception of emotion in speech is possible even when the information in the acoustic signal is scarce.
|
8 |
Estudo fonético qualitativo da fala e do canto no teatro popular em São Paulo / Qualitative phonetic study of speech and singing in the popular theater of São Paulo
Gisele Tomaz do Carmo. 30 August 2018
The objective of this work is to compare the formant pattern of acted speech with that of singing in the popular theater in São Paulo, based on the studies of Raposo de Medeiros (2002) and Sundberg (2015). Acoustic phonetics was chosen as the field of study both for the choice of method and for the analysis of the acoustic aspects investigated. As for the method, the first step was to select the song "Enchente" from the play "Hospital da Gente", which belongs to the repertoire of the Grupo Clariô de Teatro. Data were then collected from a 33-year-old professional actress. We asked the actress to sing and to speak the text of the song as if she were on stage. The speech she produced had two distinct characteristics: at some moments it was shouted and at others it was not, which we call normal speech. The shouted speech caught our attention and aroused our interest in observing this aspect of voice quality in our data. After recording, processing and segmenting the data, we measured and qualitatively compared the first three formants, F1, F2 and F3, as well as the fundamental frequency, F0, of Brazilian Portuguese (BP) vowels in stressed position, in their most stable portion.
With the emergence of the shouted speech data, we saw the need to extend the corpus with a third recording condition, neutral speech, which could not be collected from the actress because of scheduling constraints. We therefore collected data from the author of this research to serve as a reference in the voice quality analyses. A qualitative analysis showed that spoken vowels vary in their production, which results in very different values within a vowel category, for example among tokens of the vowel [e]. In singing, the vowels [a], [] and [] have more concentrated values, while the remaining sung vowels, the high vowels, tend to show more dispersed values even when sung. The actress's voice quality varies throughout the text, but the occurrences of shouted speech have a raised F1, one of the characteristics described in the literature for this type of speech. This dissertation attempts to bring academic studies in linguistics closer to the artistic movement of the periphery of São Paulo, in order to show artists how acoustic phonetics can help them in their compositions by giving some clarity about how the process of speech production works.
|
9 |
Alvos tonais: unidades fonético-fonológicas da entoação / Tonal targets: phonological-phonetic units of intonation
Marcus Vinicíus Moreira Martins, 21 November 2017
O objetivo deste trabalho é desenvolver nossa hipótese em torno dos alvos tonais, que seriam unidades fonético-fonológicas responsáveis pela mediação entre o nível representacional da entoação e o nível físico da implementação de F0. Os alvos tonais foram divididos em duas grandes categorias, topológicos e pontuais. Os primeiros ocorrem em um espaço limitado, denominado por Ferreira-Netto (2008) de tom médio. Os limites desse espaço são definidos pelos limiares de diferenciação tonal (LDT) e estão a +3 e -4 semitons do tom médio. Além e aquém destes limiares encontram-se as faixas de frequência do Foco/Ênfase. Nestas faixas os tons passam a ser eventos relevantes para os ouvintes que podem atribuir significados a eles. Aos tons que ocorrem nessa região demos o nome de pontual, uma vez que são eventos específicos. Para testar essa hipótese aplicamos o teste 1, no qual era solicitado aos participantes que repetissem uma sentença pré-gravada dotada de uma divisão entoacional marcante: na primeira parte tratava-se de uma voz masculina entoando uma frase declarativa, na segunda parte uma voz feminina entoando uma frase interrogativa. De um total de 15 participantes obtivemos 24 amostras, contando repetições intra-sujeitos. A análise foi conduzida em duas etapas, na primeira avaliamos a capacidade do falante detectar o alvo tonal topológico subjacente à primeira parte do estímulo e reproduzi-lo. Na segunda etapa, avaliamos a capacidade do falante em detectar o alvo pontual caracterizado pela interrogativa e implementá-lo em sua fala. A análise da primeira condição foi feita por meio do que denominamos índice de relação (ir), que media o grau de correlação entre o estímulo e a repetição do falante. A análise revelou que os participantes demonstraram uma grande acuidade na execução da tarefa, o que sugere que os falantes são capazes de monitorar a implementação da frequência fundamental, a partir da detecção dos alvos topológicos. 
Já a segunda análise demonstra que a implementação dos alvos pontuais pode ser aleatória em certa medida, uma vez que ela não precisa respeitar um limite específico, apenas um limiar. Na segunda parte do trabalho aplicamos um método semelhante, voltado à análise da fala emotiva atuada em três condições: raiva, tristeza e neutra. A frase consistia de um trecho de um livro de ciências lido nessas três emoções por atrizes profissionais, a análise por meio de testes de hipótese (n=196, p<0,005) revelou que os alvos topológicos entre as condições eram distintos, o que sugere que o espaço entoacional e a variação de frequência em seu interior podem ser uma pista significativa para a distinção da fala emotiva. / The main purpose of this work is to develop our hypothesis about tonal targets, which would be phonological-phonetic units responsible for mediating between the representational level of intonation and the physical level of F0 implementation. The tonal targets were divided into two major categories, topological and punctual. The former occur in a limited space, which Ferreira-Netto (2008) calls the midtone. The boundaries of this space are defined by the tonal differentiation thresholds (TDT) and lie at +3 and -4 semitones from the midtone. Beyond these thresholds are the Focus/Emphasis frequency bands. In these bands the tones become relevant events for listeners, who are able to attribute meanings to them. We call the tones that occur in this region punctual, since they are specific events. To test this hypothesis, we applied test 1, in which participants were asked to repeat a pre-recorded sentence with a striking intonational division: the first part was a male voice speaking a declarative phrase, and the second part a female voice speaking an interrogative phrase. From 15 participants we obtained 24 samples, counting intra-subject repetitions. 
The analysis was conducted in two stages. In the first, we evaluated the ability of the speaker to detect the topological tonal target underlying the first part of the stimulus and to reproduce it. In the second, we evaluated the ability of the speaker to detect the punctual target characterized by the interrogative and to implement it in their own speech. The analysis of the first condition was carried out by means of what we call the relationship index (ir), which measures the degree of correlation between the stimulus and the speaker's repetition. The analysis revealed that the participants performed the task with great accuracy, which suggests that speakers are able to monitor the implementation of the fundamental frequency by detecting the topological targets. The second analysis, on the other hand, shows that the implementation of punctual targets can be random to some extent, since it does not need to respect a specific limit, only a threshold. In the second part of the work we applied a similar method to the analysis of acted emotional speech in three conditions: anger, sadness, and neutral. The sentence was an excerpt from a science book, read with these three emotions by professional actresses. The analysis, using hypothesis tests (n = 196, p < 0.005), revealed that the topological targets differed between the conditions, suggesting that the intonational space and the frequency variation within it can be a significant cue for distinguishing emotional speech.
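The threshold rule in this abstract can be illustrated with a small sketch. It is not the author's implementation: it only assumes that the midtone is already known in Hz (the abstract does not say how it is computed), that intervals are measured with the standard semitone formula 12·log2(f/f_ref), and that the +3 and -4 semitone boundaries are inclusive. The function names `semitones_from` and `classify_target` are hypothetical.

```python
import math

# Tonal differentiation thresholds (TDT) as stated in the abstract,
# in semitones relative to the midtone.
UPPER_TDT = 3.0
LOWER_TDT = -4.0


def semitones_from(f0_hz: float, midtone_hz: float) -> float:
    """Interval between f0 and the midtone, in semitones (12 * log2 of the ratio)."""
    return 12.0 * math.log2(f0_hz / midtone_hz)


def classify_target(f0_hz: float, midtone_hz: float) -> str:
    """Label an f0 value as 'topological' (inside the thresholds) or 'punctual'.

    Treating the boundaries as inclusive is an assumption of this sketch.
    """
    interval = semitones_from(f0_hz, midtone_hz)
    if LOWER_TDT <= interval <= UPPER_TDT:
        return "topological"
    return "punctual"
```

With an assumed midtone of 200 Hz, a value of 220 Hz falls inside the band, while 400 Hz (one octave above) and 150 Hz (about five semitones below) fall in the Focus/Emphasis bands.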
|
10 |
Expressividade da fala: o desvelar da locução de um poema a partir da análise acústica e da filosofia de Spinoza
Santos, Isaías, 29 October 2010
Previous issue date: 2010-10-29 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This master's dissertation is situated in Experimental Acoustic Phonetics. It is a
multidisciplinary work that addresses issues related to the expressivity of speech through
the recitation of a poem and the emotions expressed in that recitation. To this end, it
draws on the laboratory tools of Acoustic Phonetics and on the philosophy of Spinoza.
We therefore seek to describe quantitatively the durations of V-V units, the f0 minimum,
the f0 maximum, the f0 range, and the pauses, in order to investigate whether the
recitation of a poem by a professional actor adds meaning to the poem, and to describe
qualitatively the interaction between prosody and the affections (emotions) described by
Spinoza. The aim of this study is to verify, through the expression of affections
(emotions), which acoustic parameters, such as duration, f0, and pauses, interact in the
pursuit of linguistic and paralinguistic meanings. Our hypothesis is that the recitation
of the poem adds meaning to it. A wider range of analyses and prosodic patterns is
therefore needed for the analysis of the emotions expressed in speech. The parameters
studied here favor the expression of certain emotions, since affections such as sadness
are evidenced by the acoustic and statistical analysis. The dynamics of f0 indicate
dramatic quality and gains in meaning in the recitation of the poem. Further studies are
therefore needed to confirm the evidence found in this work. / Esta dissertação de mestrado se situa na Fonética Acústica Experimental, sendo um
trabalho multidisciplinar aborda questões referentes à Expressividade da fala por meio
da locução de um poema e das emoções expressas por meio desta locução. Para tanto
recorre ao instrumental de laboratório da Fonética Acústica e da filosofia de Spinoza.
Buscamos, portanto, descrever quantitativamente as durações de unidades V-Vs, o
minimum de f0, o maximum de f0, a extensão de f0 e as pausas para investigar se a
locução de um poema por um ator profissional mostraria ganhos de sentido ao poema. E
qualitativamente, a interação prosódia e afecções (emoções) descritas por Spinoza. O
objetivo deste trabalho é verificar por meio da expressão de afecções (emoções) os
parâmetros acústicos, como duração e f0 e as pausas, que interagiriam na busca dos
sentidos linguísticos e paralinguísticos. Por hipótese temos que a locução do poema lhe
confere ganhos de sentido. Desse modo, faz-se necessário uma gama maior de análises e
de padrões prosódicos para análise das emoções expressas na fala. Os parâmetros aqui
estudados privilegiam a expressão de certas emoções, uma vez que afecções como
tristeza são evidenciadas pela análise acústica e estatística. A dinâmica do f0 indicia
dramaticidade e ganhos de sentido à locução do poema. Desse modo, novos estudos são
necessários para confirmar os indícios encontrados neste trabalho
|