About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Real Time Speech Driven Face Animation / Talstyrd Ansiktsanimering i Realtid

Axelsson, Andreas; Björhäll, Erik. January 2003.
The goal of this project is to implement a system that analyses an audio signal containing speech and produces a classification of lip-shape categories (visemes) in order to synchronize the lips of a computer-generated face with the speech. The thesis describes the work to derive a method that maps speech to lip movements, on an animated face model, in real time. The method is implemented in C++ on the PC/Windows platform. The program reads speech from pre-recorded audio files and continuously performs spectral analysis of the speech. Neural networks are used to classify the speech into a sequence of phonemes, and the corresponding visemes are shown on the screen. Some time delay between input speech and the visualization could not be avoided, but the overall visual impression is that sound and animation are synchronized.
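The abstract describes a frame-by-frame pipeline: spectral analysis of the audio, neural-network classification into phonemes, and a phoneme-to-viseme lookup for display. The following is a rough Python sketch of that flow only, not the thesis's implementation; the phoneme set, viseme labels, frame sizes, and the stand-in classifier are all invented for illustration.

```python
import numpy as np

# Hypothetical phoneme-to-viseme grouping; labels are invented, not from the thesis.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_teeth",   "v": "lip_teeth",
    "a": "open",        "o": "rounded",     "s": "spread",
}
PHONEMES = list(PHONEME_TO_VISEME)

def spectral_features(frame, n_bands=16):
    """Log energy in evenly spaced spectral bands of one windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bands = np.array_split(spectrum, n_bands)
    return np.log1p(np.array([band.sum() for band in bands]))

def speech_to_visemes(samples, classify, frame_len=512, hop=256):
    """Slide over the signal, classify each frame, map phoneme to viseme."""
    visemes = []
    for start in range(0, len(samples) - frame_len, hop):
        feats = spectral_features(samples[start:start + frame_len])
        visemes.append(PHONEME_TO_VISEME[classify(feats)])
    return visemes

# Toy run: a random stand-in for the trained network, 1 s of noise at 16 kHz.
rng = np.random.default_rng(0)
classify = lambda feats: PHONEMES[int(rng.integers(len(PHONEMES)))]
print(speech_to_visemes(rng.normal(size=16000), classify)[:5])
```

A real system would replace `classify` with the trained network and drive the face model from the returned viseme sequence.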
2

Lietuvių šnekos vizemų vizualizavimas / Lithuanian speech visemes visualization

Zailskas, Vytautas. 15 June 2011.
This Master's thesis analyzes the problem of visualizing Lithuanian speech visemes: it examines the visual features of Lithuanian visemes, the possibility of encoding them with SAMPA codes, and the relevant software design methods and algorithms. An algorithm solving the Lithuanian viseme visualization problem was developed, and vector graphics was chosen as the type of computer graphics suited to the stated goals. Two vector transformation methods were created, and their differences and practical usability were analyzed. The resulting software lets the user create visemes, transform them, tune their display duration and the duration of transitions between them with various coefficients, and run the animation with the selected transformation method. Five visemes and animations of two Lithuanian words were produced, and a study based on them demonstrates the quality and applicability of the implemented methods.
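The abstract does not spell out the two vector transformation methods. One plausible baseline is linear interpolation of vector-graphic control points between viseme key shapes, with a duration coefficient scaling the transition length; the sketch below assumes exactly that, and the mouth outlines are made up.

```python
import numpy as np

def viseme_transition(src, dst, n_frames):
    """Linearly morph vector-graphic control points from one viseme to the next."""
    return [(1.0 - t) * src + t * dst for t in np.linspace(0.0, 1.0, n_frames)]

# Two invented mouth outlines as (x, y) control points.
closed_mouth = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]])
open_mouth   = np.array([[0.0, 0.0], [1.0, 0.8], [2.0, 0.0]])

# A per-transition duration coefficient stretches or compresses the morph.
duration_coeff = 2.0
frames = viseme_transition(closed_mouth, open_mouth, n_frames=int(10 * duration_coeff))
print(len(frames), frames[0].tolist(), frames[-1].tolist())
```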
3

Bimodal Automatic Speech Segmentation And Boundary Refinement Techniques

Akdemir, Eren. 01 March 2010.
Automatic segmentation of speech is essential for building the large speech databases used in speech processing applications. This study proposes a bimodal automatic speech segmentation system that uses either articulator motion information (AMI) or visual information obtained by a camera, in combination with auditory information. The visual modality has been shown to be very beneficial in speech recognition applications, improving the performance and noise robustness of those systems. In this dissertation a significant increase in the performance of the automatic speech segmentation system is achieved by using a bimodal approach. Automatic speech segmentation systems face a tradeoff between precision and the resulting number of gross errors. Boundary refinement techniques are used to increase the precision of these systems without decreasing overall performance. Two novel boundary refinement techniques are proposed in this thesis: a hidden Markov model (HMM) based fine-tuning system and an inverse filtering based fine-tuning system. The segment boundaries obtained by the bimodal speech segmentation system are improved further by these techniques. To fulfill these goals, a complete two-stage automatic speech segmentation system is produced and tested on two different databases. A phonetically rich Turkish audiovisual speech database, containing acoustic data and camera recordings of 1600 Turkish sentences uttered by a male speaker, is built from scratch for the experiments. The visual features of the recordings are extracted, and a manual phonetic alignment of the database is produced as ground truth for the performance tests of the automatic speech segmentation systems.
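Neither the HMM-based nor the inverse-filtering fine-tuning is detailed in the abstract. Purely to illustrate what boundary refinement does, here is a generic stand-in that snaps a coarse boundary to the point of largest spectral change in a small search window; it is not the thesis's method.

```python
import numpy as np

def refine_boundary(frames, coarse_idx, window=5):
    """Snap a coarse phone boundary to the frame of largest spectral change
    within +/- `window` frames of the initial estimate."""
    lo = max(coarse_idx - window, 1)
    hi = min(coarse_idx + window, len(frames) - 1)
    deltas = [np.linalg.norm(frames[i] - frames[i - 1]) for i in range(lo, hi + 1)]
    return lo + int(np.argmax(deltas))

# Toy data: feature frames that change abruptly at index 12; coarse guess is 10.
frames = np.vstack([np.zeros((12, 8)), np.ones((8, 8))])
print(refine_boundary(frames, coarse_idx=10))   # -> 12
```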
4

Audiovisual integration for perception of speech produced by nonnative speakers

Yi, Han-Gyol. 12 September 2014.
Speech often occurs in challenging listening environments, such as masking noise. Visual cues have been found to enhance speech intelligibility in noise. Although the facilitatory role of audiovisual integration has been established for native speech, it is relatively unclear whether it also holds for speech produced by nonnative speakers. Native listeners were presented with English sentences produced by native English and native Korean speakers, in either audio-only or audiovisual conditions. Korean speakers were rated as more accented in the audiovisual condition than in the audio-only condition. Visual cues enhanced speech intelligibility in noise for native English speech but less so for nonnative speech. Reduced intelligibility of audiovisual nonnative speech was associated with implicit Asian-Foreign association, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for perception of speech produced by nonnative speakers.
5

Síntesis Audiovisual Realista Personalizable / Realistic Personalizable Audiovisual Synthesis

Melenchón Maldonado, Javier. 13 July 2007.
A single framework is presented for realistic, personalizable audiovisual synthesis and analysis of talking-head sequences and of visual finger-spelling sequences, aimed at the domestic setting. In the first case the animation is fully synchronized to a text or speech source; in the second, words are spelled letter by letter with the hand. The personalization capabilities make it easy for non-expert users to create audiovisual sequences. Possible applications range from realistic virtual characters for natural interaction or video games, to very-low-bandwidth video conferencing and visual telephony for the hard of hearing, including pronunciation and communication aids for that same group. The system can process long sequences with very modest resource consumption, especially in storage, thanks to a new incremental procedure for computing the singular value decomposition with updating of the mean. This procedure is complemented by three others: decremental, split, and composite.
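The incremental singular value decomposition with mean updating is the thesis's own contribution. The sketch below shows only the standard incremental step it builds on, a Brand-style rank-one column append without the mean update, on invented toy data.

```python
import numpy as np

def svd_append_column(U, S, Vt, c, rank=None):
    """Append one data column to an existing thin SVD  M ~= U @ diag(S) @ Vt.
    Standard incremental update; the thesis's variant additionally tracks
    the data mean, which is omitted here."""
    r = S.size
    p = U.T @ c                      # projection onto the current subspace
    resid = c - U @ p                # component outside the subspace
    rho = np.linalg.norm(resid)
    j = resid / rho if rho > 1e-12 else np.zeros_like(c)
    # Small (r+1) x (r+1) core matrix whose SVD rotates the old factors.
    K = np.zeros((r + 1, r + 1))
    K[:r, :r] = np.diag(S)
    K[:r, r] = p
    K[r, r] = rho
    Uk, Sk, Vkt = np.linalg.svd(K)
    U_new = np.hstack([U, j[:, None]]) @ Uk
    Vt_new = Vkt @ np.block([[Vt, np.zeros((r, 1))],
                             [np.zeros((1, Vt.shape[1])), np.ones((1, 1))]])
    if rank is not None:             # optional truncation keeps cost bounded
        U_new, Sk, Vt_new = U_new[:, :rank], Sk[:rank], Vt_new[:rank]
    return U_new, Sk, Vt_new

# Toy check: build an SVD column by column and compare to the batch result.
rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))
U, S, Vt = np.linalg.svd(M[:, :1], full_matrices=False)
for k in range(1, 4):
    U, S, Vt = svd_append_column(U, S, Vt, M[:, k])
print(np.allclose(U @ np.diag(S) @ Vt, M))   # True
```

Processing long sequences column by column this way avoids ever storing the full data matrix, which matches the storage savings the abstract claims.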
