1. Visual prosody in speech-driven facial animation: elicitation, prediction, and perceptual evaluation
Zavala Chmelicka, Marco Enrique. 29 August 2005
Facial animations capable of articulating accurate movements in synchrony with a
speech track have become a subject of much research during the past decade. Most of
these efforts have focused on articulation of lip and tongue movements, since these are
the primary sources of information in speech reading. However, a wealth of
paralinguistic information is implicitly conveyed through visual prosody (e.g., head and
eyebrow movements). In contrast with lip and tongue movements, for which the articulation rules are fairly well known (i.e., viseme-phoneme mappings, coarticulation), little is known about the generation of visual prosody.
The objective of this thesis is to explore the perceptual contributions of visual prosody in
speech-driven facial avatars. Our main hypothesis is that visual prosody driven by the acoustics of the speech signal, as opposed to random or no visual prosody, results in more realistic, coherent, and convincing facial animations. To test this hypothesis, we
have developed an audio-visual system capable of capturing synchronized speech and
facial motion from a speaker using infrared illumination and retro-reflective markers. In
order to elicit natural visual prosody, a story-telling experiment was designed in which
the actors were shown a short cartoon video, and subsequently asked to narrate the
episode. From these audio-visual data, four facial animations were generated, articulating, respectively, no visual prosody, Perlin-noise movements, speech-driven movements, and ground-truth movements. The speech-driven movements were predicted from acoustic features of the speech
signal (e.g., fundamental frequency and energy) using rule-based heuristics and
autoregressive models. A pairwise perceptual evaluation shows that subjects can clearly discriminate among the four visual-prosody animations. It also shows that speech-driven movements and Perlin noise, in that order, approach the performance of the veridical motion. These results are quite promising and suggest that speech-driven motion could outperform Perlin noise if more powerful motion-prediction models are used. In addition, our results show that exaggeration can bias viewers to perceive the motion of a computer-generated character as more realistic.
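To make the prediction step concrete, here is a minimal sketch of the kind of speech-driven predictor described above: a linear ARX (autoregressive with exogenous input) model that maps frame-level F0 and energy to head-rotation angles. This is not the thesis code; the model order, feature set, pose parameterization, and least-squares fit are all illustrative assumptions.

```python
import numpy as np

def fit_arx(acoustic, pose, p=3, q=3):
    """Least-squares ARX fit: pose[t] is regressed on the previous p pose
    frames and the previous q acoustic frames (columns = e.g. [f0, energy])."""
    k = max(p, q)
    rows = [np.concatenate([pose[t - p:t].ravel(),
                            acoustic[t - q:t].ravel(), [1.0]])  # 1.0 = bias
            for t in range(k, len(pose))]
    W, *_ = np.linalg.lstsq(np.asarray(rows), pose[k:], rcond=None)
    return W

def synthesize(acoustic, W, pose_dim=3, p=3, q=3):
    """Roll the fitted model forward from a neutral pose, generating a
    head-rotation trajectory (e.g., pitch/yaw/roll) from speech features."""
    k = max(p, q)
    pose = np.zeros((len(acoustic), pose_dim))
    for t in range(k, len(acoustic)):
        x = np.concatenate([pose[t - p:t].ravel(),
                            acoustic[t - q:t].ravel(), [1.0]])
        pose[t] = x @ W
    return pose

# Toy usage: 2 acoustic features (f0, energy) driving 3 rotation angles;
# random arrays stand in for real features and motion-capture ground truth.
rng = np.random.default_rng(0)
acoustic = rng.standard_normal((500, 2))
pose_gt = rng.standard_normal((500, 3))
trajectory = synthesize(acoustic, fit_arx(acoustic, pose_gt))
```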
2. Génération de la prosodie audio-visuelle pour les acteurs virtuels expressifs / Generation of audio-visual prosody for expressive virtual actors
Barbulescu, Adela. 23 November 2015
The work presented in this thesis addresses the problem of generating expressive audio-visual performances for virtual actors. A virtual actor is represented by a 3D talking head, and an audio-visual performance comprises facial expressions, head movements, gaze direction, and the speech signal. While a substantial amount of work has been dedicated to emotions, we explore here the expressive verbal behaviors that signal mental states, i.e., how speakers feel about what they say. We study the characteristics of these so-called dramatic attitudes and the way they are encoded by speaker-specific prosodic signatures, i.e., mental-state-specific patterns of trajectories of audio-visual prosodic parameters.
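As an illustration of the signature idea just described, the sketch below time-normalizes per-utterance F0 contours of one attitude and averages them into a template trajectory. This is an assumption about the representation rather than the thesis pipeline; the 20-point resampling and the "doubt" contours are invented for the example.

```python
import numpy as np

def resample_contour(contour, n_points=20):
    """Linearly resample a variable-length F0 contour to n_points,
    so utterances of different durations become comparable."""
    contour = np.asarray(contour, dtype=float)
    src = np.linspace(0.0, 1.0, len(contour))
    return np.interp(np.linspace(0.0, 1.0, n_points), src, contour)

def attitude_signature(contours):
    """Average the time-normalized contours of one attitude into a
    speaker-specific template trajectory (a "prosodic signature")."""
    return np.mean([resample_contour(c) for c in contours], axis=0)

# Toy usage: two utterances of a "doubt"-like attitude whose F0 falls
# and then rises; the signature keeps that shared shape.
doubt = [np.r_[np.linspace(220, 180, 30), np.linspace(180, 260, 15)],
         np.r_[np.linspace(210, 175, 40), np.linspace(175, 250, 20)]]
signature = attitude_signature(doubt)
```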
3. Gestualidade vocal e visual, expressão de emoções e comunicação falada / Vocal and visual gestuality, expression of emotions, and spoken communication
Fontes, Mario Augusto de Souza. 29 April 2014
This thesis examines the vocal and visual links involved in the expression of emotion through a perceptual and acoustic experiment whose main objective is to investigate the functions of gestural prosody (vocal and visual) in the appraisal of seven basic emotions (anger, distaste, fear, happiness, joy, sadness, and shame) and of valence (positive, neutral, and negative). Its specific objectives are twofold: to investigate the role of vocal and visual gestures in identifying the basic emotions, and to discuss the interaction between the visual, vocal, and semantic dimensions in the evaluation of the utterances constituting the research corpus, so as to explore the links between gestural prosody and emotive expression. The research hypotheses are that interpretation will differ when visual or vocal cues are considered in isolation rather than together, and that, depending on the nature of the emotion or the semantic load of the utterance, the visual or the vocal aspects will carry more weight in perceptual judgments. The relations between gestural prosody and emotional expression are investigated with the following analytical procedures and methods: acoustic analysis by means of measures extracted with the ExpressionEvaluator script developed by Barbosa (2009); perceptual evaluation tests applied to a group of 34 judges using GTrace, developed by McKeown et al. (2011), to rate valence and identify the seven basic emotions; visual gestural description based on a profile that takes into account facial movements and their directionality; and description of voice-quality settings by means of the VPAS, developed by Laver and Mackenzie (2007) and adapted to Brazilian Portuguese (VPAS-PB) by Camargo and Madureira (2008). The correlations among the variables were assessed by non-parametric tests applying the FAMD, MCA, PCA, HCPC, and MFA methods. The results indicate that the VPAS settings raised larynx and high pitch and the acoustic measures mednf0 and quan995f0 carried the most weight in the correlations between vocal gestures and emotional expression. On the evidence of the variables that emerged with the greatest significance and exerted their force in the formation of utterance clusters, vocal gestuality was interpreted in terms of metaphors of the emotions expressed in speech. The analysis of the role of vocal and visual gestures in the identification of valence and emotions showed that the VPAS variables and the ExpressionEvaluator measures were highly influential, with the VPAS proving strongest in representing the vector space of the variables under study. Our hypotheses concerning the integration of the semantic, vocal, and visual dimensions were confirmed: both visual and vocal prosody interact with the semantic domain, emphasizing or altering the semantic load of the utterances, and interpretation differed when visual or vocal cues were considered in isolation rather than together. It was also found that the identification of emotions and valence varied depending on the emotion category and on the specificity attributed to the valence. In conclusion, the contributions of this work to evaluating the communicative relevance of voice quality in the expression of emotions in speech are discussed, the utterances that best represented the analyzed emotions are presented, and the path taken in building up the thesis is retraced.
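For concreteness, here is a small sketch of two of the acoustic measures named above, assuming mednf0 is the median and quan995f0 the 99.5% quantile of the fundamental-frequency track over voiced frames; the exact definitions in Barbosa's ExpressionEvaluator script may differ.

```python
import numpy as np

def f0_descriptors(f0_track):
    """Median and 99.5% quantile of F0 over voiced frames.
    `f0_track` is in Hz, with 0 (or NaN) marking unvoiced frames."""
    f0 = np.asarray(f0_track, dtype=float)
    voiced = f0[np.isfinite(f0) & (f0 > 0)]
    return {"mednf0": np.median(voiced),
            "quan995f0": np.quantile(voiced, 0.995)}

# Toy usage on a synthetic rising contour flanked by unvoiced gaps.
track = np.r_[np.zeros(10), np.linspace(120, 280, 200), np.zeros(10)]
print(f0_descriptors(track))  # mednf0 = 200.0, quan995f0 close to 280
```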
4. Etude contrastive de la prosodie audio-visuelle des affects sociaux en chinois mandarin vs. français : vers une application pour l'apprentissage de la langue étrangère ou seconde / Contrastive study of audio-visual prosody of social affects in Mandarin Chinese vs. French: towards an application for foreign or second language learning
Lu, Yan. 22 January 2015
In human face-to-face interaction, social affects are to be distinguished from emotional expressions, which are innate and triggered by involuntary controls of the speaker: social affects emerge voluntarily and intentionally, are largely conveyed by audio-visual prosody, and play an important role in the realization of speech acts. They also put into circulation, between the interlocutors, information about the dynamics of the dialogue, the situation of utterance, and the speakers' social relationship.
Prosody is a main vector of social affects, and its cross-language variability is a challenge for language description as well as for foreign language teaching. The cultural and linguistic specificities of socio-affective prosody in oral communication can thus be a difficulty, even a risk of misunderstanding, for foreign language (FL) and second language (L2) learners. This thesis is dedicated to intra- and intercultural studies on the perception of the prosody of 19 social affects in Mandarin Chinese and in French, on their cognitive representations, and on the learning of Chinese and French socio-affective prosody by FL and L2 learners.
The first task of the thesis concerns the construction of a large audio-visual corpus of Chinese social affects: 152 sentences varying in length, tone location, and syntactic structure were each produced with the 19 social affects. This corpus is used to examine the identification and perceptual confusion of these Chinese social affects by native and non-native listeners, as well as the effect of lexical tone on non-native listeners' identification. Experimental results reveal that the majority of social affects are perceived similarly by native and non-native subjects, although some differences are also observed. Lexical tones cause perceptual problems for Vietnamese listeners (speakers of a tonal language) as much as for French listeners (speakers of a non-tonal language). In parallel, an acoustic analysis investigates the production side of prosodic socio-affects in Mandarin Chinese, highlighting the most prominent patterns of acoustic variation and corroborating the perceptual results obtained on the same expressions.
Next, a study of the conceptual and psycho-acoustic distances between social affects is carried out with Chinese and French subjects. The main results indicate that all subjects share, to a very large extent, the same knowledge about these 19 social affects, regardless of their mother tongue, their gender, or how the social affects are presented (as concepts or as acoustic realizations). Finally, the last chapter of the thesis is devoted to a contrastive study of the perception of 11 Chinese social affects expressed in different modalities (audio only, video only, and audio-visual) by French learners and native subjects, as well as of the perception of the same French socio-affects by Chinese learners and native subjects. According to the results, the recognition of affective expressions depends closely on the expressions themselves and on their presentation modality, whereas the subjects' proficiency level in the target language (beginner or intermediate) has no significant effect on recognition, within the limits of the levels studied.
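As a sketch of how such identification results are typically analyzed, the code below tallies (intended, perceived) response pairs into a confusion matrix and derives per-affect identification rates. The affect labels and trial format are illustrative, not the protocol of the thesis.

```python
import numpy as np

AFFECTS = ["declaration", "doubt", "irony", "politeness"]  # illustrative subset

def confusion_matrix(trials):
    """Tally (intended, perceived) label pairs into a matrix whose rows are
    intended affects and whose columns are listeners' responses."""
    idx = {a: i for i, a in enumerate(AFFECTS)}
    cm = np.zeros((len(AFFECTS), len(AFFECTS)), dtype=int)
    for intended, perceived in trials:
        cm[idx[intended], idx[perceived]] += 1
    return cm

# Toy usage: five listener responses, one "doubt" item heard as "irony".
trials = [("doubt", "doubt"), ("doubt", "irony"), ("irony", "irony"),
          ("politeness", "politeness"), ("declaration", "declaration")]
cm = confusion_matrix(trials)
rates = cm.diagonal() / cm.sum(axis=1).clip(min=1)  # per-affect accuracy
```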