551

In-vehicle Multimodal Interaction

January 2015 (has links)
abstract: Despite the many driver assistance systems and electronics available, the threat to the lives of drivers, passengers, and other people on the road persists. As technology grows, in-vehicle devices offer an increasing plethora of buttons and features, resulting in increased distraction. Speech recognition has recently emerged as an alternative input mode with the potential to reduce this distraction. However, because the automotive environment is dynamic and noisy, distraction may arise not from the manual interaction itself but from cognitive load, so speech recognition alone cannot be a reliable mode of communication. This thesis proposes a simultaneous multimodal approach to designing the interface between driver and vehicle, with the goal of enabling the driver to stay attentive to driving tasks and spend less time on distracting secondary tasks. By analyzing human-human multimodal interaction techniques, new modes especially suitable for the automotive context were identified and tested: touch, speech, graphics, voice-tip, and text-tip. These modes are intended to work together to make the interaction more intuitive and natural. To obtain a minimalist, user-centered design for the center stack, design principles such as the 80/20 rule, contour bias, affordance, and the flexibility-usability trade-off were applied to the prototypes. The prototype was developed on the Android platform using the Dragon software development kit for speech recognition. Driver behavior was investigated in an experiment conducted on the DriveSafety DS-600s driving simulator. Twelve volunteers drove the simulator under two conditions: (1) accessing the center-stack applications using touch only and (2) accessing the applications using speech with an offered text-tip. The duration for which the user looked away from the road (eyes-off-road time) was measured manually for each scenario. The results showed that eyes-off-road time was lower in the second scenario. The minimalist design of 8-10 icons per screen proved effective, as all readings were within the driver distraction recommendations (eyes-off-road time < 2 s per screen) defined by NHTSA. / Dissertation/Thesis / Masters Thesis Computer Science 2015
552

Approximate Neural Networks for Speech Applications in Resource-Constrained Environments

January 2016 (has links)
abstract: Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementations of these systems achieve very good performance, their large memory and compute requirements make deployment on a mobile device quite challenging. This thesis presents techniques to reduce the memory and computation cost of keyword detection and speech recognition DNNs. The first technique represents all weights and biases with a small number of bits and maps all nodal computations to fixed point, with minimal degradation in accuracy. Experiments on the Resource Management (RM) database show that, for the keyword detection network, representing the weights with 5 bits yields a 6-fold reduction in memory compared to a floating-point implementation, with very little loss in performance. Similarly, for the speech recognition network, representing the weights with 6 bits yields a 5-fold reduction in memory while maintaining an error rate similar to a floating-point implementation. Additional memory reduction is achieved by weight pruning, in which weights are classified as sensitive or insensitive and only the sensitive weights are represented with higher precision. A combination of these two techniques reduces the memory footprint by 81-84% for the speech recognition and keyword detection networks, respectively. Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. This technique, termed coarse-grain sparsification, introduces hardware-aware sparsity during DNN training, leading to efficient weight memory compression and a significant reduction in the number of computations during classification without loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision reduced the weight memory requirement by ~95% compared to a fully connected network with double precision, while showing similar keyword detection accuracy and word error rate. / Dissertation/Thesis / Masters Thesis Computer Science 2016
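The fixed-point weight representation described in this abstract can be sketched generically. The snippet below is a minimal uniform quantizer under simple assumptions, not the thesis's exact scheme (which additionally separates sensitive from insensitive weights); it shows how mapping float weights to n-bit signed integers bounds the rounding error and where the memory saving comes from.

```python
import random

def quantize(weights, n_bits):
    """Uniformly quantize float weights to n-bit signed fixed point.

    Returns the de-quantized weights and the scale (step size) used.
    """
    qmax = 2 ** (n_bits - 1) - 1                # e.g. 15 levels per side for 5 bits
    scale = max(abs(w) for w in weights) / qmax  # map the largest weight to qmax
    return [round(w / scale) * scale for w in weights], scale

random.seed(0)
weights = [random.gauss(0, 0.1) for _ in range(10_000)]  # toy weight matrix

quantized, scale = quantize(weights, 5)

# Worst-case rounding error is half a quantization step.
max_err = max(abs(w - q) for w, q in zip(weights, quantized))
assert max_err <= scale / 2 + 1e-12

# Memory: 5 bits per weight instead of a 32-bit float -> 32/5 = 6.4x smaller,
# in line with the ~6-fold reduction reported for the keyword-detection network.
print(f"compression vs float32: {32 / 5:.1f}x")
```

In practice the integer codes (not the de-quantized floats) would be stored, together with one scale per layer.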
553

Automatic speech recognition for ageing voices in the context of assisted living

Aman, Frédéric 09 December 2014 (has links)
In the context of an aging population, the aim of this thesis is to equip the homes of elderly people with an automatic speech recognition (ASR) system able to recognize distress calls and alert the emergency services. The acoustic models of ASR systems are usually trained on non-elderly speech, delivered in a neutral, read style. Our context is far from these ideal conditions (aged and emotional voices), so the system must be adapted to the task. Our work builds on corpora of elderly voices and distress calls that we recorded. From these corpora, a study of the differences between young and aged voices on the one hand, and between neutral and emotional speech on the other, allowed us to develop an ASR system adapted to the task. This system was then evaluated on data from an experiment in a realistic situation, including falls acted out by volunteers.
554

The Value of Two Ears for Sound Source Localization and Speech Understanding in Complex Listening Environments: Two Cochlear Implants vs. Two Partially Hearing Ears and One Cochlear Implant

January 2013 (has links)
abstract: Two groups of cochlear implant (CI) listeners were tested for sound source localization and for speech recognition in complex listening environments. One group (n=11) wore bilateral CIs and, potentially, had access to interaural level difference (ILD) cues but not interaural timing difference (ITD) cues. The second group (n=12) wore a single CI and had low-frequency acoustic hearing both in the ear contralateral to the CI and in the implanted ear. These 'hearing preservation' listeners, potentially, had access to ITD cues but not ILD cues. At issue in this dissertation was the value of these two types of information about sound sources, ITDs and ILDs, for localization and for speech perception when speech and noise sources were separated in space. In Experiment 1, normal hearing (NH) listeners and the two groups of CI listeners were tested for sound source localization using a 13-loudspeaker array. The mean RMS localization error was 7 degrees for the NH listeners, 20 degrees for the bilateral CI listeners, and 23 degrees for the hearing preservation listeners. The scores for the two CI groups did not differ significantly; both groups showed equivalent, but poorer than normal, localization. This outcome, obtained with filtered noise bands for the normal hearing listeners, suggests that ILD and ITD cues can support equivalent levels of localization. In Experiment 2, the two groups of CI listeners were tested for speech recognition in noise with noise sources and targets spatially separated, in a simulated 'restaurant' environment and in two versions of a 'cocktail party' environment. At issue was whether either CI group would show a benefit from binaural hearing, i.e., better performance when the noise and targets were separated in space. Neither CI group showed spatial release from masking. However, both groups showed a significant binaural advantage (a combination of squelch and summation) when the target and noise remained separated, indicating the presence of some binaural processing or 'unmasking' of speech in noise. Finally, localization ability in Experiment 1 was not correlated with binaural advantage in Experiment 2. / Dissertation/Thesis / Ph.D. Speech and Hearing Science 2013
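The RMS localization error reported in Experiment 1 is a standard summary of pointing accuracy over trials. A minimal sketch follows; the loudspeaker azimuths and listener responses below are hypothetical illustrations, not data from the study.

```python
import math

def rms_error(targets_deg, responses_deg):
    """Root-mean-square localization error in degrees across trials."""
    errs = [(r - t) ** 2 for t, r in zip(targets_deg, responses_deg)]
    return math.sqrt(sum(errs) / len(errs))

# Hypothetical trials: loudspeaker azimuth vs. the azimuth the listener chose.
targets   = [-60, -30, 0, 30, 60]
responses = [-50, -35, 5, 20, 75]

print(round(rms_error(targets, responses), 1))  # 9.7
```

A normal-hearing listener in the study averaged about 7 degrees on this metric; the CI groups averaged 20-23 degrees.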
555

Continuous Speech Recognition for Portuguese Using Hidden Markov Models

SIDNEY CERQUEIRA BISPO DOS SANTOS 24 May 2006 (has links)
This work presents several contributions to the improvement of CDHMM-based continuous speech recognition (CSR) systems, most of them specific to the Portuguese language. Two reduced sets of phonetic units, based on characteristics of Brazilian Portuguese, are proposed. Several initialization procedures are analyzed, and a fast and effective method of model initialization is proposed, together with a segmentation method, a scheme for concatenating unit models into word and sentence models, and an efficient training algorithm. Simulation results show that the performance of the two sets is comparable when a bigram grammar is used, while the number of units is greatly reduced compared to widely used context-dependent units such as diphones and triphones. Since the performance of continuous speech recognizers depends strongly on the speech features used, several feature sets are compared on speaker-independent Portuguese recognition tasks; PLP coefficients with their first and second derivatives yield the best performance. A recognition system for automatic dial-up telephone calls is then proposed, using syntactic knowledge of the Portuguese language together with task-dependent knowledge. The system can decode not only digits but also natural numbers, a user-friendly feature that gives callers wide freedom in how they place a call. Based on the finite-state machines proposed for implementing the recognizer, two decoding algorithms are analyzed, Level Building and One Pass, and a new algorithm is proposed, derived from modifications to One Pass, that makes more efficient use of the syntactic and task-dependent knowledge sources. The Portuguese CSR system is also analyzed using syllables as phonetic units, in speaker-dependent and speaker-independent tests. Syllables are shown to be viable units for Portuguese-based continuous speech recognition, in contrast with their poor performance for English. Finally, the influence of function words on recognizer performance is analyzed for Portuguese. Although function words play a critical role in English-based CSR, this is shown not to be true for the Portuguese language.
556

Speech recognition availability

Eriksson, Mattias January 2004 (has links)
This project investigates the importance of availability in the scope of dictation programs. Using speech recognition technology for dictation has not reached the general public, and that may well be a result of poor availability in today's technical solutions. I constructed a persona, Johanna, who personifies the target user, and developed a solution that streams audio to a speech recognition server and sends back the interpreted text. Evaluated against the Johanna persona, the solution was successful in theory. I then recruited test users who tried the solution in practice. Half of them claim that their usage has increased, and will continue to increase, thanks to the new level of availability.
557

Automatic Transcript Generator for Podcast Files

Holst, Andy January 2010 (has links)
In the modern world, the Internet has become a popular place, yet people with hearing disabilities, as well as search engines, cannot access the speech content of podcast files. To solve this problem partially, Sphinx decoders such as Sphinx-3 and Sphinx-4 can be used to implement an automatic transcript generator application, either by coupling an existing large acoustic model, language model, and dictionary, or by training your own large acoustic model and language model and creating your own dictionary, to support a continuous, speaker-independent speech recognition system.
558

Sentence Recognition in Quiet and in Noise, in Free Field, in Individuals with Moderate Sensorineural Hearing Loss

Padilha, Cristiane Bertolazi 14 July 2008 (has links)
In the clinical routine of an audiologist, complaints about difficulty understanding speech in noisy environments are becoming more frequent. Audiological tests that use sentences as stimuli have been the object of research because, besides assessing the patient's real auditory abilities, they closely approximate everyday communicative situations and provide information that guides the most appropriate management for patients with hearing complaints. The aim of this study was to determine sentence recognition thresholds in free field, in the presence and absence of competing noise, in a group of participants with moderate sensorineural hearing loss. Fifty participants were examined, 27 men and 23 women, aged between 45 and 76. First, anamnesis, meatoscopy, pure-tone threshold audiometry, and the SRT and SRPI tests were carried out. Next, using the Portuguese Sentence Lists test (PSL, 1998), sentence recognition thresholds in quiet (SRTQ) and in noise (SRTN) were measured, with a fixed noise level of 65 dB A. The average SRTQ was 60.90 dB A, the average SRTN in the same group was 68.20 dB A, and the average S/N ratio was +3.20 dB. Including free-field tests that use sentences as stimuli, with and without competing noise, after the basic audiological evaluation of a patient with hearing complaints yields answers that go beyond the ability to detect pure tones and recognize isolated words. These tests assess the patient as a whole, simulating communicative situations and providing data about each person's abilities and limitations, which determine his or her communication capacity.
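The reported mean S/N ratio follows directly from the study's fixed noise level and the measured mean threshold in noise: the threshold expressed relative to the noise. A one-line check using the numbers from the abstract:

```python
noise_level_dBA = 65.0   # fixed competing-noise level used in the test
mean_srtn_dBA   = 68.20  # mean sentence recognition threshold in noise

# S/N ratio at threshold = threshold level minus noise level.
snr_dB = mean_srtn_dBA - noise_level_dBA
print(f"{snr_dB:+.2f} dB")  # +3.20 dB, matching the reported mean S/N ratio
```

A positive value means listeners needed the sentences presented above the noise level to reach threshold.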
559

Application of voice recognition input to decision support systems

Drake, Robert Gervase 12 1900 (has links)
Approved for public release; distribution is unlimited / The goal of this study is to provide a single source of data that enables the selection of an appropriate voice recognition (VR) application for a decision support system (DSS), as well as for other computer applications. A brief background of both voice recognition systems and decision support systems is provided, with special emphasis on the dialog component of DSS. The categories of voice recognition discussed are human factors, environmental factors, situational factors, quantitative factors, training factors, host computer factors, and experiments and research. Each of these areas is analyzed individually, with specific references to the applicable literature. The study also includes appendices containing: a glossary (with definitions) of phrases specific to both decision support systems and voice recognition systems; keywords applicable to this study; an annotated bibliography (alphabetical and by topic) of current VR systems literature with over 200 references; an index of publishers; and a complete listing of currently commercially available VR systems. / http://archive.org/details/applicationofvoi00drak / Lieutenant, United States Navy
560

Comparing speech recognition and touch tone as input modalities for technologically unsophisticated users

Kafidi, Petrus L 13 June 2005 (has links)
Using an automated service to access information via the telephone has become an important productivity enhancer in the developed world. However, such automated services are generally quite inaccessible to users who have had little exposure to technology. There has been a widespread belief that speech recognition technology can bridge this gap, but little objective evidence for this belief has been produced. To address this situation, two interfaces, touch-tone and speech-based, were designed and implemented as input modalities to a system that gives technologically unsophisticated users access to an informational/transactional service. These interfaces were optimised and compared using transaction completion rates, time taken to complete tasks, error rates, and user satisfaction. The speech-based interface outperformed the touch-tone interface in completion rate, error rate, and user satisfaction. The data on time taken to complete tasks could not be compared, as the DTMF interface data were heavily influenced by participants who were not technologically unsophisticated. These results confirm that speech-based interfaces are more effective and more satisfying, and can therefore enhance information dissemination to people who have had little exposure to the technology. / Dissertation (MSc)--University of Pretoria, 2006. / Computer Science / unrestricted
