Global ETD Search

1	Towards Man-Machine Interfaces: Combining Top-down Constraints with Bottom-up Learning in Facial Analysis Kumar, Vinay P. 01 September 2002 This thesis proposes a methodology for designing man-machine interfaces by combining top-down and bottom-up processes in vision. From a computational perspective, we propose that the scientific-cognitive question of how to combine top-down and bottom-up knowledge is similar to the engineering question of how to label a training set in a supervised learning problem. We investigate these questions in the realm of facial analysis. We propose using a linear morphable model (LMM) to represent top-down structure, and we use it to model facial variations such as mouth shape and expression, facial pose, and visual speech (visemes). We apply a supervised learning method based on support vector machine (SVM) regression to estimate the parameters of LMMs directly from pixel-based representations of faces. We combine these methods to design new, more self-contained systems for recognizing facial expressions, estimating facial pose, and recognizing visemes. AI, Facial Expression Recognition, Pose Estimation, Viseme Recognition, SVM
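The record above describes representing facial variation with a linear morphable model (LMM) whose parameters are then estimated from pixels by SVM regression. As a minimal sketch of only the synthesis direction (parameters to shape), assuming a mean-plus-deviations parameterization of 2-D landmark shapes that are already in point correspondence, an LMM can be written as follows. The function `lmm_synthesize` and the toy mouth data are hypothetical, and the SVM regression step is not shown.

```python
from typing import List

Shape = List[float]  # flattened (x1, y1, x2, y2, ...) landmark coordinates


def lmm_synthesize(mean: Shape, basis: List[Shape], coeffs: List[float]) -> Shape:
    """Return mean + sum_i coeffs[i] * basis[i] for shapes already in point correspondence."""
    if len(basis) != len(coeffs):
        raise ValueError("need exactly one coefficient per basis shape")
    out = list(mean)
    for direction, c in zip(basis, coeffs):
        if len(direction) != len(mean):
            raise ValueError("basis shape dimension does not match the mean shape")
        for k, value in enumerate(direction):
            out[k] += c * value  # add the scaled deviation coordinate by coordinate
    return out


# Toy mouth with four landmarks: left corner, right corner, upper lip, lower lip.
mean_mouth = [-1.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0, -0.5]
widen = [-1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # moves the corners apart
open_ = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, -1.0]  # moves the lips apart

print(lmm_synthesize(mean_mouth, [widen, open_], [0.5, 1.0]))
# [-1.5, 0.0, 1.5, 0.0, 0.0, 1.5, 0.0, -1.5]
```

In this framing, a regressor such as the SVM described in the abstract would predict the coefficient vector (here `[0.5, 1.0]`) from image pixels, and the LMM turns that vector back into a shape.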
2	Lietuvių šnekos vizemų vizualizavimas / Lithuanian speech visemes visualization Zailskas, Vytautas 15 June 2011 This Master's thesis analyzes the problem of visualizing Lithuanian speech visemes. It examines the features of Lithuanian viseme visualization, the possibility of coding visemes with SAMPA codes, and software development methods and algorithms, and it presents an algorithm that addresses the visualization problem. Vector graphics was chosen as the type of computer graphics best suited to the software's objectives. Two transformation methods for vector graphics were developed, and their differences and practical usability were analyzed. The thesis describes software that lets the user create visemes, transform them, tune their display duration and the duration of the transition between visemes with various coefficients, and run the animation with the selected transformation method; its main purpose is to animate Lithuanian speech. Five visemes and animations of two Lithuanian words were created and used in a study that shows the quality and adaptability of the implemented methods and software. Informatics, Lithuanian speech, Visemes, Viseme, Visualization, Vector transformation
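The abstract does not say how its two vector transformation methods work. As a minimal sketch of one plausible method, assuming that two vector visemes share the same number of matching control points, a transition can be produced by linear interpolation, with a duration coefficient scaling the number of frames. The names `interpolate_viseme`, `transition_frames`, `base_frames`, and `coefficient` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_viseme(a: List[Point], b: List[Point], t: float) -> List[Point]:
    """Linearly blend two vector visemes whose control points correspond one-to-one."""
    if len(a) != len(b):
        raise ValueError("visemes must have the same number of control points")
    t = min(max(t, 0.0), 1.0)  # clamp the blend factor to [0, 1]
    return [(ax + (bx - ax) * t, ay + (by - ay) * t) for (ax, ay), (bx, by) in zip(a, b)]


def transition_frames(a: List[Point], b: List[Point], base_frames: int,
                      coefficient: float) -> List[List[Point]]:
    """Frames of an a -> b transition; the duration coefficient scales the number of steps."""
    steps = max(1, round(base_frames * coefficient))  # at least one step; round() rounds halves to even
    return [interpolate_viseme(a, b, i / steps) for i in range(steps + 1)]


closed = [(0.0, 0.0), (2.0, 0.0)]  # toy two-point lip contour
opened = [(0.0, 1.0), (2.0, 1.0)]
for frame in transition_frames(closed, opened, base_frames=4, coefficient=0.5):
    print(frame)  # three frames; the middle one is [(0.0, 0.5), (2.0, 0.5)]
```

A coefficient above 1 lengthens the transition (more in-between frames), and one below 1 shortens it, which is one simple way to tune transition duration.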
3	Banco para avaliar linguagem, controlando: univocidade de figuras, familiaridade e decifrabilidade de escrita; cifrabilidade de fala ouvida; e legibilidade, audibilizabilidade e cifrabilidade de fala vista / Language assessment sourcebook with control upon degree of picture univocity, print recognizability and decodibility, audible speech encodibility, and visible speech legibility, audibilizability and encodibility Jacote, Andréa 24 April 2015 This master's thesis presents a new sourcebook of pictures and words aimed at increasing the validity and precision of language assessment tools, as well as the efficacy of instructional materials for language development. The sourcebook contains 971 lexical entries, each consisting of a picture and its corresponding written name. The picture is analyzed in terms of its univocity (i.e., picture-naming agreement). The picture name is analyzed separately in three forms: the visual written word, the auditory spoken word, and the visual spoken word (i.e., as speechread). The visual written word is made of graphemes. It is analyzed in terms of both its familiarity or recognizability (i.e., the degree to which it can be read via the lexical reading route) and its decodibility (i.e., the degree to which it can be read via the perilexical, or phonological, reading route). The auditory spoken word is made of phonemes. It is analyzed in terms of its encodibility (i.e., the degree to which it can be written or spelled via the perilexical spelling route). The visual spoken word is made of visemes (fanerolaliemas). It is analyzed in terms of its speechreadability (i.e., the degree to which it can be understood through vision alone), its audibilizability (i.e., the degree to which the auditory imagery of phonemes can be evoked by mouthshapes, or visemes, during speechreading), and its encodibility (i.e., the degree to which it can be written or spelled correctly via the perilexical route).
Each entry consists of a picture (on the left) and several data fields pertaining to its name (on the right). The right side of the entry is divided into six areas. The first area gives the picture name in two alphabets, the Roman alphabet (orthographic form) and the International Phonetic Alphabet, together with the semantic category to which the word belongs. The second area gives the picture number (indexing all 971 pictures in the sourcebook). The third area gives the picture's univocity on a 0-100 scale, separately for children aged 2, 3, 4, 5, 6, and 7 to 10 years, as well as for adults.
The fourth area gives the familiarity or recognizability of the visual written word (i.e., the degree to which it can be read via the lexical reading route) on a 1-9 point scale, separately for children in 5th, 4th, 3rd, 2nd, and 1st grade. On this scale, 5 corresponds to the mean, 6 to the mean plus 1 standard error, 7 to the mean plus 2 standard errors, and so on up to 9; likewise, 4 corresponds to the mean minus 1 standard error, 3 to the mean minus 2 standard errors, and so on down to 1, which corresponds to the mean minus 4 standard errors.
The fifth area consists of four rows, each divided into four columns. The first row gives the decodibility of the visual written word (grapheme-to-phoneme decoding, i.e., the degree to which it can be read via the perilexical reading route) on a 0-1 scale. The second row gives the encodibility of the auditory spoken word (i.e., the degree to which it can be written or spelled via the perilexical spelling route) on a 0-1 scale. The third row gives the audibilizability of the visual spoken word (i.e., the degree to which its sequence of visemes can be converted into a sequence of phonemes) on a 0-1 scale. The fourth row gives the encodibility of the visual spoken word (i.e., the degree to which its sequence of visemes can be converted into a sequence of graphemes) on a 0-1 scale.
Each column presents the row's data in one of four forms. Columns 1 and 2 give the mean of the ratios regardless of incidence, whereas columns 3 and 4 give the mean of the ratios weighted by differential incidence. Columns 1 and 3 give the mean of the ratios regardless of the tonicity (stress) of speech, whether auditory or visual, in pronunciation, whereas columns 2 and 4 give the mean weighted by differential tonicity. For instance, in the first row (decodibility), column 1 is a plain mean of the ratios that ignores both incidence and tonicity; column 2 ignores incidence but is weighted by tonicity; column 3 is weighted by incidence but ignores tonicity; and column 4 is weighted by both incidence and tonicity.
The sixth area gives the speechreadability of the visual spoken word (i.e., the degree to which it can be understood via speechreading) on a 0-1 scale, presented in four forms. Columns 1 and 2 are calculated according to Dória's model, and columns 3 and 4 according to a phonetic-articulatory model. Columns 1 and 3 are calculated regardless of tonicity in pronunciation, whereas columns 2 and 4 are weighted by differential tonicity in pronunciation. Decoding, Encoding, Familiarity, Fanerolaliema, Grapheme, Orthography, Phoneme, Univocity, Viseme
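The record defines the 1-9 familiarity scale and the four column variants in words, but it does not say what each per-segment ratio is or how incidence and tonicity weights combine. The sketch below is a minimal illustration under loudly stated assumptions: scores are rounded to the nearest whole standard-error step, the combined weight is the product of the incidence and tonicity weights, and the names `familiarity_score`, `mean_of_ratios`, and `four_columns` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def familiarity_score(raw: float, mean: float, se: float) -> int:
    """Place a raw familiarity value on the 1-9 scale: 5 = mean, each step = one standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    steps = round((raw - mean) / se)  # assumption: nearest whole SE step (halves round to even)
    return max(1, min(9, 5 + steps))  # 9 = mean + 4 SE or more, 1 = mean - 4 SE or less


def mean_of_ratios(ratios: Sequence[float],
                   incidence: Optional[Sequence[float]] = None,
                   tonicity: Optional[Sequence[float]] = None) -> float:
    """Mean of per-segment ratios, optionally weighted by incidence and/or tonicity."""
    n = len(ratios)
    inc = incidence if incidence is not None else [1.0] * n
    ton = tonicity if tonicity is not None else [1.0] * n
    if len(inc) != n or len(ton) != n:
        raise ValueError("weights must have one entry per ratio")
    weights = [i * t for i, t in zip(inc, ton)]  # assumption: combined weight is the product
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, ratios)) / total


def four_columns(ratios: Sequence[float], incidence: Sequence[float],
                 tonicity: Sequence[float]) -> Tuple[float, float, float, float]:
    """Columns 1-4: plain, tonicity-weighted, incidence-weighted, weighted by both."""
    return (
        mean_of_ratios(ratios),
        mean_of_ratios(ratios, tonicity=tonicity),
        mean_of_ratios(ratios, incidence=incidence),
        mean_of_ratios(ratios, incidence=incidence, tonicity=tonicity),
    )


print(familiarity_score(60, mean=50, se=5))  # 7 (two standard errors above the mean)
print(four_columns([1.0, 0.5], incidence=[3, 1], tonicity=[1, 2]))
# (0.75, 0.6666666666666666, 0.875, 0.8)
```

Making weights default to 1 lets a single function cover all four columns, so the only difference between columns is which weight vectors are passed in.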
5	Lietuvių kalbos animavimo technologija taikant trimatį veido modelį / Lithuanian speech animation technology for 3D facial model Mažonavičiūtė, Ingrida 18 February 2013 Speech animation is widely used in technical devices to give the growing number of deaf and hearing-impaired people, as well as children and middle-aged and elderly people, equal opportunities to take part in communication. Because people are very sensitive to changes in facial appearance, speech animation is a complex process: the acoustic information recognized in speech (phonemes) is visualized with specially modeled facial expressions called visemes. Speech animation systems ("talking heads") are therefore driven by speech phonetics and its visual representation, visemes. The realism of the animation depends most on correctly identifying and modeling the visemes that correspond to phonemes and ordering them on the timeline; the accuracy of the chosen speech recognition engine, natural-looking visemes, the phoneme-to-viseme mapping, and the coarticulation control model all considerably influence the quality of animated speech. To make the animation natural, the influence of visemes on neighboring phonemes must also be analyzed, and a coarticulation control model must be built that accounts for the phonetic properties of the animated language.
Because the phonetics of every language differ, a system built to animate one language is not directly suitable for another, so new talking heads must be created for different languages. A framework suitable for animating Lithuanian speech, which includes two new models that help improve the intelligibility of animated Lithuanian speech, is used to create the Lithuanian talking head "LIT". The dissertation consists of an introduction, three main chapters, general conclusions, a list of references, and a list of publications. Chapter 1 analyzes existing speech animation technologies. Because the speech signal is both audible and visible, its animation is a compound process that depends on the chosen facial modeling technique, the type of speech signal, and the coarticulation control model. Different facial modeling techniques are analyzed to identify the most suitable 3D talking-head modeling technique for Lithuanian, viseme classification experiments across different languages are reviewed to identify the variety of viseme classification methods, and coarticulation control models are compared to decide which one should be used to model coarticulation in Lithuanian speech. Chapter 2 describes the theoretical framework for Lithuanian speech animation. Translingual visual speech... [to full text] Informatics Engineering, Speech animation, Talking head, Phoneme, Viseme
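The record states that coarticulation control models were compared, but not which one the "LIT" talking head adopted. As an illustration only, the sketch below implements the widely cited dominance-function model of Cohen and Massaro: each segment exerts a dominance that decays exponentially with distance in time from its center, and an articulatory parameter is the dominance-weighted average of the segment targets. The class `Segment`, the functions `dominance` and `blended_parameter`, and all default parameter values are hypothetical choices for demonstration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One speech segment's influence on a single articulatory parameter (e.g., lip opening)."""
    center: float              # time of the segment's peak, in seconds
    target: float              # target value of the parameter for this segment's viseme
    magnitude: float = 1.0     # alpha: overall strength of the segment's dominance
    rate_before: float = 10.0  # theta for anticipatory influence (t before center)
    rate_after: float = 10.0   # theta for carry-over influence (t after center)
    power: float = 1.0         # c: controls the shape of the decay


def dominance(seg: Segment, t: float) -> float:
    """Dominance D(t) = alpha * exp(-theta * |t - center| ** c)."""
    tau = t - seg.center
    theta = seg.rate_before if tau < 0 else seg.rate_after
    return seg.magnitude * math.exp(-theta * abs(tau) ** seg.power)


def blended_parameter(segments: List[Segment], t: float) -> float:
    """Dominance-weighted average of segment targets at time t."""
    weights = [dominance(s, t) for s in segments]
    return sum(w * s.target for w, s in zip(weights, segments)) / sum(weights)


# Two toy segments: closed lips (0.0) centered at 0.0 s, open lips (1.0) centered at 0.2 s.
segs = [Segment(center=0.0, target=0.0), Segment(center=0.2, target=1.0)]
print(round(blended_parameter(segs, 0.1), 3))  # equal dominances halfway -> 0.5
print(round(blended_parameter(segs, 0.0), 3))  # closed-lip target dominates: about 0.119
```

Separate anticipatory and carry-over rates let a segment influence its neighbors asymmetrically, which is the kind of language-specific tuning the abstract argues a Lithuanian model needs.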
6	Anglų kalbos vizemų pritaikymas lietuvių kalbos garsų animacijai / English visemes and Lithuanian phonemes mapping for animation Mažonavičiūtė, Ingrida 27 June 2008 This thesis analyzes the connection between Lithuanian speech sounds and their visual aspect. It analyzes talking-head animation algorithms and their open problems and, on that basis, proposes a method for synthesizing Lithuanian speech using English visemes. Thirty three-dimensional Lithuanian visemes are created. After a visual comparison of the 3D Lithuanian visemes with the standard English phoneme visemes, a table mapping Lithuanian phonemes to English visemes is compiled. The table is used to animate a Lithuanian sound file. Informatics Engineering, Talking head, Speech animation technologies, Viseme, Phoneme
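The record describes using a Lithuanian-phoneme-to-English-viseme table to animate a sound file, but the table itself is not reproduced in the abstract. The sketch below only shows how such a table could be applied, assuming the sound file has already been segmented into timed phonemes: each phoneme is looked up, and adjacent intervals that map to the same viseme are merged into one keyframe span. The table entries, viseme labels, and the function `to_viseme_track` are hypothetical placeholders, not the thesis's mapping.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, float, float]  # (label, start in seconds, end in seconds)

# Hypothetical placeholder entries; this is NOT the table compiled in the thesis.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "V_BILABIAL", "b": "V_BILABIAL", "m": "V_BILABIAL",
    "s": "V_SIBILANT", "z": "V_SIBILANT",
    "a": "V_OPEN", "o": "V_ROUND", "u": "V_ROUND",
}


def to_viseme_track(phones: List[Interval], table: Dict[str, str],
                    default: str = "V_REST") -> List[Interval]:
    """Map timed phonemes to timed visemes, merging adjacent intervals that share a viseme."""
    track: List[Interval] = []
    for phone, start, end in phones:
        viseme = table.get(phone, default)  # unmapped phonemes fall back to a neutral viseme
        if track and track[-1][0] == viseme and track[-1][2] == start:
            track[-1] = (viseme, track[-1][1], end)  # extend the previous interval
        else:
            track.append((viseme, start, end))
    return track


phones = [("s", 0.0, 0.1), ("z", 0.1, 0.2), ("a", 0.2, 0.4)]
print(to_viseme_track(phones, PHONEME_TO_VISEME))
# [('V_SIBILANT', 0.0, 0.2), ('V_OPEN', 0.2, 0.4)]
```

Merging repeated visemes avoids redundant keyframes when several phonemes share one mouth shape, which is common when a many-to-one phoneme-to-viseme table is used.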
