Spelling suggestions: "subject:"speechrecognition"" "subject:"breedsrecognition""
651 |
Aplicação de um robô humanoide autônomo por meio de reconhecimento de imagem e voz em sessões pedagógicas interativas / Application of an autonomous humanoid robot by image and voice recognition in interactive pedagogical sessionsTozadore, Daniel Carnieto 03 March 2016 (has links)
A Robótica Educacional consiste na utilização de robôs para aplicação prática dos conteúdos teóricos discutidos em sala de aula. Porém, os robôs mais usados apresentam uma carência de interação com os usuários, a qual pode ser melhorada com a inserção de robôs humanoides. Esta dissertação tem como objetivo a combinação de técnicas de visão computacional, robótica social e reconhecimento e síntese de fala para a construção de um sistema interativo que auxilie em sessões pedagógicas por meio de um robô humanoide. Diferentes conteúdos podem ser abordados pelos robôs de forma autônoma. Sua aplicação visa o uso do sistema como ferramenta de auxílio no ensino de matemática para crianças. Para uma primeira abordagem, o sistema foi treinado para interagir com crianças e reconhecer figuras geométricas 3D. O esquema proposto é baseado em módulos, no qual cada módulo é responsável por uma função específica e contém um grupo de funcionalidades. No total são 4 módulos: Módulo Central, Módulo de Diálogo, Módulo de Visão e Módulo Motor. O robô escolhido é o humanoide NAO. Para visão computacional, foram comparados a rede LEGION e o sistema VOCUS2 para detecção de objetos e SVM e MLP para classificação de imagens. O reconhecedor de fala Google Speech Recognition e o sintetizador de voz do NAOqi API são empregados para interações sonoras. Também foi conduzido um estudo de interação, por meio da técnica de Mágico-de-Oz, para analisar o comportamento das crianças e adequar os métodos para melhores resultados da aplicação. Testes do sistema completo mostraram que pequenas calibrações são suficientes para uma sessão de interação com poucos erros. Os resultados mostraram que crianças que tiveram contato com uma maior interatividade com o robô se sentiram mais engajadas e confortáveis nas interações, tanto nos experimentos quanto no estudo em casa para as próximas sessões, comparadas às crianças que tiveram contato com menor nível de interatividade. Intercalar comportamentos desafiadores e comportamentos incentivadores do robô trouxeram melhores resultados na interação com as crianças do que um comportamento constante. / Educational Robotics is a growing area that uses robots to apply theoretical concepts discussed in class. However, robots usually present a lack of interaction with users that can be improved with humanoid robots. This dissertation presents a project that combines computer vision techniques, social robotics and speech synthesis and recognition to build an interactive system which leads educational sessions through a humanoid robot. This system can be trained with different content to be addressed autonomously to users by a robot. Its application covers the use of the system as a tool in the mathematics teaching for children. For a first approach, the system has been trained to interact with children and recognize 3D geometric figures. The proposed scheme is based on modules, wherein each module is responsible for a specific function and includes a group of features for this purpose. In total there are 4 modules: Central Module, Dialog Module, Vision Module and Motor Module. The chosen robot was the humanoid NAO. For the Vision Module, LEGION network and VOCUS2 system were compared for object detection and SVM and MLP for image classification. The Google Speech Recognition speech recognizer and Voice Synthesizer Naoqi API are used for sound interactions. An interaction study was conducted by Wizard-of-Oz technique to analyze the behavior of children and adapt the methods for better application results. Full system testing showed that small calibrations are sufficient for an interactive session with few errors. Children who had experienced greater interaction degrees from the robot felt more engaged and comfortable during interactions, both in the experiments and studying at home for the next sessions, compared to children who had contact with a lower level of interactivity. Interim challenging behaviors and support behaviors brought better results in interaction than a constant behavior.
|
652 |
Um framework para desenvolvimento de interfaces multimodais em aplicações de computação ubíqua / A framework for multimodal interfaces development in ubiquitous computing applicationsValter dos Reis Inacio Junior 26 April 2007 (has links)
Interfaces multimodais processam vários tipos de entrada do usuário, tais como voz, gestos e interação com caneta, de uma maneira combinada e coordenada com a saída multimídia do sistema. Aplicações que suportam a multimodalidade provêem um modo mais natural e flexível para a execução de tarefas em computadores, uma vez que permitem que usuários com diferentes níveis de habilidades escolham o modo de interação que melhor se adequa às suas necessidades. O uso de interfaces que fogem do estilo convencional de interação baseado em teclado e mouse vai de encontro ao conceito de computação ubíqua, que tem se estabelecido como uma área de pesquisa que estuda os aspectos tecnológicos e sociais decorrentes da integração de sistemas e dispositivos computacionais à ambientes. Nesse contexto, o trabalho aqui reportado visou investigar a implementação de interfaces multimodais em aplicações de computação ubíqua, por meio da construção de um framework de software para integração de modalidades de escrita e voz / Multimodal interfaces process several types of user inputs, such as voice, gestures and pen interaction, in a combined and coordinated manner with the system?s multimedia output. Applications which support multimodality provide a more natural and flexible way for executing tasks with computers, since they allow users with different levels of abilities to choose the mode of interaction that best fits their needs. The use of interfaces that run away from the conventional style of interaction, based in keyboard and mouse, comes together with the concept of ubiquitous computing, which has been established as a research area that studies the social and technological aspects decurrent from the integration os systems and devices into the environments. In this context, the work reported here aimed to investigate the implementation of multimodal interfaces in ubiquitous computing applications, by means of the building of a software framework used for integrating handwriting and speech modalities
|
653 |
Developing Java Programs on Android Mobile Phones Using Speech RecognitionGande, Santhrushna 01 September 2015 (has links)
Nowadays Android operating system based mobile phones and tablets are widely used and had millions of users around the world. The popularity of this operating system is due to its multi-tasking, ease of access and diverse device options. “Java Programming Speech Recognition Application” is an Android application used for handicapped individuals who are not able or have difficultation to type on a keyboard. This application allows the user to write a compute program (in Java Language) by dictating the words and without using a keyboard. The user needs to speak out the commands and symbols required for his/her program. The program has been designed to pick up the Java constant keywords (such as ‘boolean’, ‘break’, ‘if’ and ‘else’), similar to the word received by the speech recognizer system in the application. The “Java Programming Speech Recognition Application” contains external plug-ins such as programming editor and a speech recognizer to record and write the program. These plug-ins come in the form of libraries and pre-coded folders which have to be attached to the main program by the developer.
|
654 |
Video Analysis of Mouth Movement Using Motion Templates for Computer-based Lip-ReadingYau, Wai Chee, waichee@ieee.org January 2008 (has links)
This thesis presents a novel lip-reading approach to classifying utterances from video data, without evaluating voice signals. This work addresses two important issues which are the efficient representation of mouth movement for visual speech recognition the temporal segmentation of utterances from video. The first part of the thesis describes a robust movement-based technique used to identify mouth movement patterns while uttering phonemes. This method temporally integrates the video data of each phoneme into a 2-D grayscale image named as a motion template (MT). This is a view-based approach that implicitly encodes the temporal component of an image sequence into a scalar-valued MT. The data size was reduced by extracting image descriptors such as Zernike moments (ZM) and discrete cosine transform (DCT) coefficients from MT. Support vector machine (SVM) and hidden Markov model (HMM) were used to classify the feature descriptors. A video speech corpus of 2800 utterances was collected for evaluating the efficacy of MT for lip-reading. The experimental results demonstrate the promising performance of MT in mouth movement representation. The advantages and limitations of MT for visual speech recognition were identified and validated through experiments. A comparison between ZM and DCT features indicates that th e accuracy of classification for both methods is very comparable when there is no relative motion between the camera and the mouth. Nevertheless, ZM is resilient to rotation of the camera and continues to give good results despite rotation but DCT is sensitive to rotation. DCT features are demonstrated to have better tolerance to image noise than ZM. The results also demonstrate a slight improvement of 5% using SVM as compared to HMM. The second part of this thesis describes a video-based, temporal segmentation framework to detect key frames corresponding to the start and stop of utterances from an image sequence, without using the acoustic signals. This segmentation technique integrates mouth movement and appearance information. The efficacy of this technique was tested through experimental evaluation and satisfactory performance was achieved. This segmentation method has been demonstrated to perform efficiently for utterances separated with short pauses. Potential applications for lip-reading technologies include human computer interface (HCI) for mobility-impaired users, defense applications that require voice-less communication, lip-reading mobile phones, in-vehicle systems, and improvement of speech-based computer control in noisy environments.
|
655 |
Alternativa metoder för att kontrollera ett användargränsnitt i en browser för teknisk dokumentation / Alternative methods for controlling the user interface in a browser for technical documentationSvensson, Cecilia January 2003 (has links)
<p>When searching for better and more practical interfaces between users and their computers, additional or alternative modes of communication between the two parties would be of great use. This thesis handles the possibilities of using eye and head movements as well as voice input as these alternative modes of communication. </p><p>One part of this project is devoted to find possible interaction techniques when navigating in a computer interface with movements of the eye or the head. The result of this part is four different controls of an interface, adapted to suit this kind of navigation, combined together in a demo application. </p><p>Another part of the project is devoted to the development of an application, with voice control as primary input method. The application developed is a simplified version of the application ActiViewer., developed by AerotechTelub Information&Media AB.</p>
|
656 |
Automatic Subtitle Generation for Sound in VideosGuenebaut, Boris January 2009 (has links)
<p>The last ten years have been the witnesses of the emergence of any kind of video content. Moreover, the appearance of dedicated websites for this phenomenon has increased the importance the public gives to it. In the same time, certain individuals are deaf and occasionally cannot understand the meanings of such videos because there is not any text transcription available. Therefore, it is necessary to find solutions for the purpose of making these media artefacts accessible for most people. Several software propose utilities to create subtitles for videos but all require an extensive participation of the user. Thence, a more automated concept is envisaged. This thesis report indicates a way to generate subtitles following standards by using speech recognition. Three parts are distinguished. The first one consists in separating audio from video and converting the audio in suitable format if necessary. The second phase proceeds to the recognition of speech contained in the audio. The ultimate stage generates a subtitle file from the recognition results of the previous step. Directions of implementation have been proposed for the three distinct modules. The experiment results have not done enough satisfaction and adjustments have to be realized for further work. Decoding parallelization, use of well trained models, and punctuation insertion are some of the improvements to be done.</p>
|
657 |
Ellection markup language (EML) based tele-voting systemGong, XiangQi January 2009 (has links)
Elections are one of the most fundamental activities of a democratic society. As is the case in any other aspect of life, developments in technology have resulted changes in the voting procedure from using the traditional paper-based voting to voting by use of electronic means, or e-voting. E-voting involves using different forms of electronic means like / voting machines, voting via the Internet, telephone, SMS and digital interactive television. This thesis concerns voting by telephone, or televoting, it starts by giving a brief overview and evaluation of various models and technologies that are implemented within such systems. The aspects of televoting that have been investigated are technologies that provide a voice interface to the voter and conduct the voting process, namely the Election Markup Language (EML), Automated Speech Recognition (ASR) and Text-to-Speech (TTS).
|
658 |
Single-Channel Multiple Regression for In-Car Speech EnhancementITAKURA, Fumitada, TAKEDA, Kazuya, ITOU, Katsunobu, LI, Weifeng 01 March 2006 (has links)
No description available.
|
659 |
Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech EnhancementITAKURA, Fumitada, TAKEDA, Kazuya, DAT, Tran Huy 01 March 2006 (has links)
No description available.
|
660 |
Physiologically Motivated Methods For Audio Pattern ClassificationRavindran, Sourabh 20 November 2006 (has links)
Human-like performance by machines in tasks of speech and audio processing has remained an elusive goal. In an attempt to bridge the gap in performance between humans and machines there has been an increased effort to study and model physiological processes. However, the widespread use of biologically inspired features proposed in the past has been hampered mainly by either the lack of robustness across a range of signal-to-noise ratios or the formidable computational costs. In physiological systems, sensor processing occurs in several stages. It is likely the case that signal features and biological processing techniques evolved together and are complementary or well matched. It is precisely for this reason that modeling the feature extraction processes should go hand in hand with modeling of the processes that use these features. This research presents a front-end feature extraction method for audio signals inspired by the human peripheral auditory system. New developments in the field of machine learning are leveraged to build classifiers to maximize the performance gains afforded by these features. The structure of the classification system is similar to what might be expected in physiological processing. Further, the feature extraction and classification algorithms can be efficiently implemented using the low-power cooperative analog-digital signal processing platform. The usefulness of the features is demonstrated for tasks of audio classification, speech versus non-speech discrimination, and speech recognition. The low-power nature of the classification system makes it ideal for use in applications such as hearing aids, hand-held devices, and surveillance through acoustic scene monitoring
|
Page generated in 0.0643 seconds