  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A facial animation model for expressive audio-visual speech

Somasundaram, Arunachalam 21 September 2006
No description available.
2

Perceptual Evaluation of Video-Realistic Speech

Geiger, Gadi, Ezzat, Tony, Poggio, Tomaso 28 February 2003
With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image sequences of the same utterances ("Turing tests"), and b) gauging visual speech recognition by comparing lip-reading performance on real and synthetic image sequences of the same utterances ("intelligibility tests"). Subjects who were presented randomly with either real or synthetic image sequences could not tell the synthetic from the real sequences above chance level. When asked to lip-read the utterances from the same image sequences, however, the same subjects recognized speech from real image sequences significantly better than from synthetic ones. Performance for both real and synthetic sequences was nevertheless at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing the percept of a talking head, but that additional effort is required to improve the animation for lip-reading purposes such as rehabilitation and language learning. These two tasks can also be considered explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image sequence by detecting a possible difference between the synthetic and the real image sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech in real and synthetic image sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discriminating between synthetic and real image sequences than explicit perceptual discrimination.
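The explicit "Turing test" above boils down to asking whether subjects' real-versus-synthetic accuracy exceeds chance. The sketch below, with made-up trial counts and an exact one-sided binomial test rather than the authors' own analysis, shows one way such a check can be run.

```python
from math import comb

def binomial_p_value(successes: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided exact binomial test: probability of observing at least
    `successes` correct answers out of `trials` if subjects guess at chance."""
    return sum(comb(trials, k) * p_chance**k * (1 - p_chance)**(trials - k)
               for k in range(successes, trials + 1))

# Hypothetical numbers for illustration only: 54 correct judgements
# out of 100 real-versus-synthetic trials.
correct, trials = 54, 100
p = binomial_p_value(correct, trials)
print(f"accuracy = {correct / trials:.2f}, one-sided p = {p:.3f}")
# A large p value (e.g. > 0.05) is consistent with "could not tell
# synthetic from real above chance level".
```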
3

Lietuvių kalbos animavimo technologija taikant trimatį veido modelį / Lithuanian speech animation technology for 3D facial model

Mažonavičiūtė, Ingrida 18 February 2013
Speech animation is widely used in technical devices to give the growing number of hearing-impaired persons, children, and middle-aged and older people equal opportunities to communicate. People are highly sensitive to changes in facial appearance, so speech animation is a complex process in which the acoustic information recognized in speech (phonemes) is visualized using specially modelled facial expressions called visemes. The realism of animated speech depends above all on correctly identifying and modelling the visemes that correspond to the phonemes and on arranging them along the timeline; the accuracy of the chosen speech recognition engine, the naturalness of the visemes, the phoneme-to-viseme mapping, and the coarticulation control model all considerably influence its quality. To ensure natural-looking animation, the influence of visemes on neighbouring phonemes must additionally be analysed, and a coarticulation control model must be created that reflects the phonetic properties of the animated language. Because the phonetics of every language differs, an animation system created for one language is not directly suitable for animating another, so new "talking heads" must be built for new languages. A framework suitable for animating Lithuanian speech, which includes two new models that help to improve the intelligibility of animated Lithuanian speech, is used to create the Lithuanian "talking head" "LIT". The dissertation consists of an introduction, three main chapters, general conclusions, a bibliography, and a list of publications. Chapter 1 analyses existing speech animation technologies: a speech signal is both heard and seen, so its animation is a composite process that depends on the chosen facial modelling technique, the type of speech signal, and the coarticulation control model. Different facial modelling techniques are compared to select the most suitable 3D "talking head" modelling technique for Lithuanian, viseme classification experiments across languages are reviewed, and coarticulation control models are compared to decide which should define the coarticulation of Lithuanian speech. Chapter 2 describes the theoretical framework for Lithuanian speech animation. Translingual visual speech... [to full text]
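Two of the ingredients stressed in this abstract, a phoneme-to-viseme mapping and a coarticulation control model, can be pictured with a small sketch. The Python below blends viseme targets with exponential dominance functions in the spirit of Cohen-Massaro-style coarticulation; the phoneme labels, targets, and weights are invented for illustration and are not the mapping or model proposed in the dissertation.

```python
import math

# Hypothetical phoneme -> viseme targets: each viseme is reduced here to a
# single "lip opening" value in [0, 1] purely for illustration.
VISEME_TARGET = {"a": 0.9, "m": 0.05, "o": 0.7, "s": 0.3}

def dominance(t, center, peak=1.0, rate=8.0):
    """Exponential dominance of a viseme centred at `center` (seconds)."""
    return peak * math.exp(-rate * abs(t - center))

def lip_opening(t, segments):
    """Blend viseme targets of all segments active around time t.
    `segments` is a list of (phoneme, start, end) tuples."""
    num = den = 0.0
    for phoneme, start, end in segments:
        center = 0.5 * (start + end)
        w = dominance(t, center)
        num += w * VISEME_TARGET[phoneme]
        den += w
    return num / den if den else 0.0

# Phoneme timeline for a hypothetical utterance (times in seconds).
timeline = [("m", 0.00, 0.08), ("a", 0.08, 0.22), ("s", 0.30, 0.40), ("o", 0.40, 0.55)]
for frame in range(14):                 # 25 fps -> one value every 40 ms
    t = frame * 0.04
    print(f"t={t:.2f}s  lip opening={lip_opening(t, timeline):.2f}")
```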
4

On the development of an Interactive talking head system based on the use of PDE-based parametric surfaces

Athanasopoulos, Michael, Ugail, Hassan, Gonzalez Castro, Gabriela January 2011
In this work we propose a talking head system for animating facial expressions using a template face generated from Partial Differential Equations (PDEs). It uses a set of preconfigured curves to calculate an internal template surface face. This surface is then used to associate various facial features with a given 3D face object. Motion retargeting is then used to transfer the deformations in these areas from the template to the target object. The procedure is continued until all the expressions in the database are calculated and transferred to the target 3D human face object. Additionally, the system interacts with the user through an artificial intelligence (AI) chatterbot to generate a response from a given text. Speech and facial animation are synchronized using the Microsoft Speech API, where the response from the AI bot is converted to speech.
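The motion-retargeting step, transferring per-vertex deformations from the PDE-generated template face onto a target face, can be illustrated with a minimal sketch. The vertex arrays, correspondence, and scale factor below are assumptions; this is not the paper's actual retargeting procedure.

```python
import numpy as np

def retarget_expression(template_rest, template_expr, target_rest,
                        correspondence, scale=1.0):
    """Transfer an expression from a template face to a target face.

    template_rest, template_expr : (N, 3) template vertices at rest / deformed
    target_rest                  : (M, 3) target vertices at rest
    correspondence               : (M,) index of the template vertex matched
                                   to each target vertex (assumed precomputed)
    """
    displacement = template_expr - template_rest          # per-vertex deltas
    return target_rest + scale * displacement[correspondence]

# Toy data: a 4-vertex template and a 3-vertex target.
template_rest = np.zeros((4, 3))
template_expr = template_rest + np.array([[0.0, 0.01, 0.0]] * 4)  # region moves up 1 cm
target_rest = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]])
correspondence = np.array([0, 1, 3])
print(retarget_expression(template_rest, template_expr, target_rest, correspondence))
```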
5

Anglų kalbos vizemų pritaikymas lietuvių kalbos garsų animacijai / English visemes and Lithuanian phonemes mapping for animation

Mažonavičiūtė, Ingrida 27 June 2008
The thesis investigates the relationship between Lithuanian speech sounds and their visual representation. Animation algorithms for talking-head models are analysed and their problems identified; on this basis, a method for synthesizing visual Lithuanian speech based on the use of English visemes is proposed. Thirty three-dimensional Lithuanian visemes are created and visually compared with the standard visemes of English phonemes, yielding a mapping table between Lithuanian phonemes and English visemes. This table is then used to animate a Lithuanian audio file.
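The central artifact here is a correspondence table from Lithuanian phonemes to reused English visemes. The sketch below shows how such a table might turn a phoneme-aligned audio file into a viseme track; the table entries and viseme names are placeholders, not the thesis's actual mapping.

```python
# Placeholder mapping: Lithuanian phonemes -> English viseme identifiers.
LT_PHONEME_TO_EN_VISEME = {
    "a": "AH", "e": "EH", "i": "IY", "o": "OW", "u": "UW",
    "p": "P_B_M", "b": "P_B_M", "m": "P_B_M", "s": "S_Z", "t": "T_D",
}

def visemes_for_alignment(alignment):
    """Turn a phoneme alignment [(phoneme, start_s, end_s), ...] into a
    viseme track, falling back to a neutral viseme for unknown phonemes."""
    return [(LT_PHONEME_TO_EN_VISEME.get(ph, "NEUTRAL"), start, end)
            for ph, start, end in alignment]

# Hypothetical forced alignment of a short Lithuanian word.
alignment = [("l", 0.00, 0.06), ("a", 0.06, 0.18), ("b", 0.18, 0.25), ("a", 0.25, 0.40)]
print(visemes_for_alignment(alignment))
# [('NEUTRAL', 0.0, 0.06), ('AH', 0.06, 0.18), ('P_B_M', 0.18, 0.25), ('AH', 0.25, 0.4)]
```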
6

Modèle statistique de l'animation expressive de la parole et du rire pour un agent conversationnel animé / Data-driven expressive animation model of speech and laughter for an embodied conversational agent

Ding, Yu 26 September 2014
Our aim is to render expressive multimodal behaviors for embodied conversational agents (ECAs). ECAs are entities endowed with communicative and affective capabilities, and they often have a human-like appearance. When an ECA is speaking or laughing, it can autonomously display multimodal behaviors that enrich and complement the uttered speech and convey qualitative information such as emotion. Our research follows a data-driven approach: it focuses on generating multimodal behaviors for a virtual character speaking with different emotions, and on simulating laughing behavior on an ECA. Our goal is to study and develop human-like animation generators for a speaking and laughing ECA. Building on the relationship between speech prosody and multimodal behaviors, our animation generator takes uttered audio signals as input and outputs multimodal behaviors. The work uses a statistical framework to capture the relationship between the input and output signals; this relationship is then rendered as synthesized 3D animation. In the training step, the statistical framework is trained on joint features composed of the input and output features, so that the relation between input and output signals is captured and characterized by the parameters of the framework. In the synthesis step, the trained framework produces output signals (facial expressions, head and torso movements) from input signals (F0 and speech energy, or pseudo-phonemes of laughter); the relation captured during training is rendered in the output signals. The proposed module is based on variants of the Hidden Markov Model (HMM) called contextual HMMs. This model captures the relationship between human motions and speech (or laughter), and that relationship is then rendered in the animation of the ECA.
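The train-then-synthesize idea described above can be caricatured with a toy Gaussian HMM in NumPy: each state jointly stores prosodic input statistics and an associated output pose, states are Viterbi-decoded from the input features alone, and each decoded state emits its mean output pose. All parameters below are invented, and this simplification omits the contextual-HMM machinery of the thesis.

```python
import numpy as np

# Toy 2-state model: state 0 ~ "low prosodic activity", state 1 ~ "high".
# Each state stores Gaussian stats over the input features (F0, energy)
# and a mean output pose (head pitch, head yaw) assumed learned jointly.
means_in  = np.array([[110.0, 0.2], [190.0, 0.7]])     # per-state input means
stds_in   = np.array([[ 20.0, 0.1], [ 30.0, 0.2]])     # per-state input std devs
means_out = np.array([[  0.0, 0.0], [  8.0, 3.0]])     # per-state output pose (degrees)
log_pi    = np.log([0.6, 0.4])                          # initial state probabilities
log_A     = np.log([[0.9, 0.1], [0.2, 0.8]])            # transition probabilities

def log_gauss(x, mean, std):
    """Diagonal-covariance Gaussian log-density."""
    return -0.5 * np.sum(((x - mean) / std) ** 2 + np.log(2 * np.pi * std ** 2))

def decode_and_synthesize(inputs):
    """Viterbi-decode the state sequence from prosodic inputs, then emit the
    mean output pose of each decoded state."""
    T, S = len(inputs), len(log_pi)
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    for s in range(S):
        delta[0, s] = log_pi[s] + log_gauss(inputs[0], means_in[s], stds_in[s])
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + log_A[:, s]
            back[t, s] = np.argmax(scores)
            delta[t, s] = scores[back[t, s]] + log_gauss(inputs[t], means_in[s], stds_in[s])
    states = np.zeros(T, dtype=int)
    states[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states, means_out[states]

# Hypothetical frame-level (F0 in Hz, normalized energy) input sequence.
prosody = np.array([[105, 0.15], [120, 0.25], [200, 0.80], [185, 0.70], [115, 0.20]])
states, poses = decode_and_synthesize(prosody)
print("states:", states)
print("head poses:\n", poses)
```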
