  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Talker identification is not improved by lexical access in the absence of familiar phonology

McLaughlin, Deirdre 06 June 2017
Listeners identify talkers more accurately when they are familiar with both the sounds and words of the language being spoken. It is unknown whether lexical information alone can facilitate talker identification in the absence of familiar phonology. To dissociate the roles of familiar words and phonology, we developed English-Mandarin “hybrid” sentences, spoken in Mandarin, which can be convincingly coerced to sound like English when presented with corresponding subtitles (e.g., “wei4 gou3 chi1 kao3 li2 zhi1” becomes “we go to college”). Across two experiments, listeners learned to identify talkers in three conditions: listeners' native language (English), an unfamiliar, foreign language (Mandarin), and a foreign language paired with subtitles that primed native language lexical access (subtitled Mandarin). In Experiment 1 listeners underwent a single session of talker identity training; in Experiment 2 listeners completed three days of training. Talkers in a foreign language were identified no better when native language lexical representations were primed (subtitled Mandarin) than from foreign-language speech alone, regardless of whether they had received one or three days of talker identity training. These results suggest that the facilitatory effect of lexical access on talker identification depends on the availability of familiar phonological forms.
12

Transcrição em tempo real de textos utilizando um dicionário fonético / Real-time text transcription using a phonetic dictionary

Gilza Paim Mandelman 03 September 2011
Seeking a technique that facilitates automatic speech recognition for real-time text transcription using a phonetic dictionary, this work adopts a proposal nicknamed brazilês, combined with the use of syllables in the transcription process, looking for possible improvements in automation, especially in systems focused on accessibility or in support of interactivity. The degree of improvement obtained with the technique was assessed, especially in the response of the interactive process, in the reduction of the number of programmable routines, in the interpretation of syllables in Portuguese as spoken in Brazil, and in the ease it brings to accessibility processes. This work thus makes it possible to adapt the Portuguese language for use in computer systems, using natural language and proposing a simplified routine for voice recognition software that improves on current routines, which rely on methods ranging from neural networks to other techniques that produce the expected interaction. To demonstrate the advantages of the technique, the brazilês proposal was studied in depth and propositions were defined around the basic idea of simplification; forms of automatic voice recognition (AVR) were studied; and a program was developed that shows the formation of syllables in Portuguese and analyzes the spelling of phonemes in the two encodings of the written language, Portuguese and brazilês.
13

Acionamento a distância de circuitos eletropneumáticos por reconhecimento de voz / Remote triggering of electro-pneumatic circuits by voice recognition

Vaslei Gil Balmant 19 March 2011
This work presents the initial steps toward integrating voice recognition technology with a process automation technology, specifically electro-pneumatics, so that the system can be commanded remotely by voice. The voice recognition system was implemented in an electrical control circuit so that electro-pneumatic valves can be actuated at a distance, offering a new command option for automation systems in the industrial sector. Basically, the signals of the commands trained by the voice recognition module are sent to a digital display, where they are intercepted by an electronic circuit called the transmitter. The intercepted parallel signal is converted to serial and sent by radio frequency to another electronic circuit, called the receiver. The serial signal received by the receiver is converted back to parallel. These digital signals (commands) replace the conventional manually operated commands of the electro-pneumatic circuit. Because the electrical control circuit and the voice recognition module have specific functional characteristics, the traditional electro-pneumatic circuit must be adapted to ensure that the sequence of operations of the working elements is carried out correctly. These adaptations consist basically of ensuring that only predetermined commands trigger specific actions and that words not recognized by the module, or error codes, do not interfere with the design. To evaluate the system, tests were performed on a simulation bench for electro-pneumatic circuits, using a voice recognition module to issue the commands. The results obtained after validation of the project were fully satisfactory.
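The parallel-to-serial-to-parallel round trip described in this abstract can be illustrated with a minimal sketch. The 8-bit word width, the command codes and the function names are all assumptions made for illustration; the abstract does not give the actual signal format or command set.

```python
# Illustrative sketch only: word width, command codes and names are
# hypothetical and not taken from the thesis.

def parallel_to_serial(word: int, width: int = 8) -> list[int]:
    """Shift a parallel word out as a list of bits, most significant bit first."""
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


def serial_to_parallel(bits: list[int]) -> int:
    """Rebuild the parallel word from bits received most significant bit first."""
    word = 0
    for bit in bits:
        word = (word << 1) | (bit & 1)
    return word


# Hypothetical table of predetermined commands (code -> valve action).
COMMANDS = {
    0x01: "extend cylinder A",
    0x02: "retract cylinder A",
}


def dispatch(word: int):
    """Return the action for a known command code, or None for anything else,
    so unrecognized words and error codes trigger no action."""
    return COMMANDS.get(word)
```

Returning `None` for unknown codes mirrors the requirement that only predetermined commands may trigger an action.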
14

Application of voice recognition input to decision support systems

Drake, Robert Gervase 12 1900
Approved for public release; distribution is unlimited / The goal of this study is to provide a single source of data that enables the selection of an appropriate voice recognition (VR) application for a decision support system (DSS) as well as for other computer applications. A brief background of both voice recognition systems and decision support systems is provided, with special emphasis given to the dialog component of DSS. The categories of voice recognition discussed are human factors, environmental factors, situational factors, quantitative factors, training factors, host computer factors, and experiments and research. Each of these areas of voice recognition is individually analyzed, and specific references to applicable literature are included. This study also includes appendices that contain: a glossary (including definitions) of phrases specific to both decision support systems and voice recognition systems, keywords applicable to this study, an annotated bibliography (alphabetically and by specific topics) of current VR systems literature containing over 200 references, an index of publishers, and a complete listing of current commercially available VR systems. / http://archive.org/details/applicationofvoi00drak / Lieutenant, United States Navy
15

Translators in the Loop: Observing and Analyzing the Translator Experience with Multimodal Interfaces for Interactive Translation Dictation Environment Design

Zapata Rojas, Julian January 2016
This thesis explores interactive translation dictation (ITD), a translation technique that involves interaction with multimodal interfaces equipped with voice recognition (VR) technology throughout the entire translation process. Its main objective is to provide a solid theoretical background and an analysis of empirical qualitative and quantitative data that demonstrate ITD’s advantages and challenges, with a view to integrating this technique into the translation profession. Many empirical studies in human-computer interaction have strived to demonstrate the efficiency of voice input versus keyboard input. Although it was implicit in the earliest works that voice input was expected to completely replace—rather than complement—text-input devices, it was soon proposed that VR often performed better in combination with other input modes. This study introduces multimodal interaction to translation, taking advantage of the unparallelled robustness of commercially available voice-and-touch-enabled multimodal interfaces such as touch-screen computers and tablets. To that end, an experiment was carried out with 14 professional English-to-French translators, who performed a translation task either with the physical prototype of an ITD environment, or with a traditional keyboard-and-mouse environment. The hypothesis was that the prototypical environment would consistently provide translators with a better translator experience (TX) than the traditional environment, considering the translation process as a whole. The notion of TX as introduced in this work is defined as a translator’s perceptions of and responses to the use or anticipated use of a product, system or service. Both quantitative and qualitative data were collected using different methods, such as video and screen recording, input logging and semi-structured interviews. 
The combined analysis of objective and subjective usability measures suggests a better TX with the experimental environment versus the keyboard-and-mouse workstation, but significant challenges still need to be overcome for ITD to be fully integrated into the profession. Thus, this doctoral study provides a basis for better-grounded research in translator-computer interaction and translator-information interaction and, more specifically, for the design and development of an ITD environment, which is expected to support professional translators’ cognitive functions, performance and well-being. Lastly, this research aims to demonstrate that translation studies research, and translation technology in particular, needs to be more considerate of the translator, the TX, and the changing realities of the interaction between humans, computers and information in the twenty-first century.
16

CLASSIFYING ANXIETY BASED ON A VOICE RECORDING USING LEARNING ALGORITHMS

Sherlock, Oscar, Rönnbäck, Olle January 2022
Anxiety is becoming more and more common. Seeking help to evaluate your anxiety can, first of all, take a long time; secondly, many of the tests are self-report assessments that could produce incorrect results. It has been shown that several voice characteristics are affected in people with anxiety. Knowing this, we got the idea that an algorithm could be developed to classify the amount of anxiety based on a person's voice. Our goal is that the developed algorithm can be used in collaboration with today's evaluation methods to increase the validity of anxiety evaluation. The algorithm would, in our opinion, give a more objective result than self-report assessments. In this thesis we answer questions such as “Is it possible to classify anxiety based on a speech recording?”, as well as whether deep learning algorithms perform better than machine learning algorithms on such a task. To answer the research questions we compiled a data set containing samples of people speaking with a varying degree of anxiety applied to their voice. We then implemented two algorithms able to classify the samples from our data set. One of the algorithms was a machine learning algorithm (ANN) with manual feature extraction, and the other was a deep learning model (CNN) with automatic feature extraction. The performance of the two models was compared, and it was concluded that the ANN was the better algorithm. When evaluating the models, 5-fold cross-validation was used with a data split of 80/20. Every fold comprises 100 epochs, meaning we train both models for a total of 500 epochs. For every fold the accuracy, precision, and recall are calculated. From these metrics we then calculated other metrics such as sensitivity and specificity to compare the models. The ANN model performed a lot better than the CNN model on every single metric that was measured: accuracy, sensitivity, precision, f1-score, recall and specificity.
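As a rough illustration of the evaluation described in this abstract, the sketch below splits sample indices into five folds (each an 80/20 train/test split) and derives the listed metrics from binary confusion counts. The function names and the binary labelling (1 = anxious, 0 = not anxious) are assumptions; the thesis's own code and label scheme are not given here. Note that in a binary setting sensitivity is the same quantity as recall.

```python
# Minimal sketch, assuming binary labels; not the authors' implementation.

def k_fold_indices(n: int, k: int = 5):
    """Yield (train, test) index lists; with k=5 each test fold holds about 20%."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive the metrics named in the abstract from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }
```

In a full evaluation, the metrics would be computed once per fold and then averaged across the five folds.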
17

Machine learning for identifying how much women and men talk in meetings

Wellander, Matilda, Sintorn, Vera January 2022
For quite some time, it has been discussed that women are underrepresented on company boards. Furthermore, when they are members of a board, they tend to hold lower positions than men, meaning they have less power. One way to start solving the problem is to have more women on company boards and ensure they too hold high positions. However, only having more women present might not be a complete solution. They also need space to speak in order to share their competence, ideas, and thoughts. Yet people tend to perceive women as more talkative than they actually are. For example, if a woman and a man speak for the same amount of time, the woman is often perceived as having talked more than the man. To identify this problem, this study aimed to train a machine learning model that takes a recording of a meeting as input and calculates the share of speaking time, in percent, taken by women and by men. The training data was based on 1266 episodes of the radio show “Sommar och Vinter i P1”, where each episode contained one speaker, different each time. 633 episodes contained female speakers and 633 contained male speakers; all speakers spoke Swedish in the recordings. Four different models were trained using different training data, and logistic regression was the best-performing algorithm for all four. The four models were evaluated using evaluation data and did not differ significantly in performance. The subsequently chosen model was tested on two recordings with both male and female speakers, where the resulting accuracy was 83.5% and 83.1%. The application developed in this study can help identify the speaking space given to women in the workplace. However, how this tool could be used to achieve a more equal workplace still needs further research.
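The final step described in this abstract, turning per-segment predictions into speaking-time percentages, might look like the sketch below. It assumes the classifier emits one label per fixed-length audio frame and that non-speech frames carry some other label; both the label names and the framing are illustrative assumptions, not details from the thesis.

```python
# Minimal sketch: assumes one predicted label per equal-length audio frame.

def speaking_share(frame_labels):
    """Return the percentage of speech frames labelled 'female' and 'male'.

    Frames with any other label (for example 'silence') are ignored.
    """
    speech = [label for label in frame_labels if label in ("female", "male")]
    if not speech:
        return {"female": 0.0, "male": 0.0}
    female = speech.count("female")
    return {
        "female": 100.0 * female / len(speech),
        "male": 100.0 * (len(speech) - female) / len(speech),
    }
```

Because every frame has the same duration, the ratio of frame counts equals the ratio of speaking time.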
18

Design Extractor: A ML-based Tool for Capturing Software Design Decisions

Söderström, Petrus January 2023
Context: The success of a software project involving a larger group of individuals relies on efficient team communication. Part of efficient communication is avoiding miscommunication, misunderstandings, and loss of knowledge. These consequences of poor communication can lead to negative repercussions such as loss of time, money, and customer approval. Much effort has been put into creating tools and systems to aid software engineers in retaining knowledge and decisions made during meetings, but many existing solutions require additional manual intervention on the part of software meeting participants. The objective of this thesis is to explore and develop a tool called Design Extractor (DE), which creates concise summaries of design meetings from recorded voice conversations. These summaries include both the design decisions made during a meeting and the rationale behind them. This thesis used readily available Python frameworks for machine learning to train two transformer models based on DistilBert and Google’s BERT. Fine-tuning these models with data sourced from six different software design meetings found that the best base model was DistilBert, which resulted in a fine-tuned model reporting an F1 score of 82.63%. This study created a simple Python tool, built upon many publicly available Python frameworks and the fine-tuned transformer model, that takes in voice recordings and outputs sentence-label pairs that can be used to quickly annotate a design meeting. Short summaries are also provided by the tool through the use of pre-existing text summarisation machine learning models such as BART. Design Extractor therefore provides a simple, quick way to review longer meeting recordings in the context of software engineering decisions.
19

Michelangelo speaks : Voice controlled CNC plotter / Michelangelos verk : Röststyrd CNC-ritrobot

Karlsson, Marcus, Maroof, Havan January 2022
CNC machines offer numerous advantages over conventional machining. They can be implemented in several ways, and one such implementation is a drawing machine. In this bachelor thesis a voice-controlled CNC plotter was designed, constructed and programmed. To build a better understanding of CNC and voice recognition, research questions were established and studied. The questions mainly concerned drawing speed as well as the quality and accuracy of the voice recognition. The plotter's hardware was built mostly from 3D-printed parts, with stepper motors, threaded rods and couplers for the movement system. The software consisted of Arduino code, in which instructions were written to make, for instance, the appropriate motor move. Tests were executed to gather data that were later analysed. The analysis showed that the stepper motors and couplers had the greatest impact on drawing speed, and that quality decreased as speed increased. Furthermore, the analysis showed that the voice recognition module achieved a high level of accuracy, but only when males spoke, as it could not detect female voices.
20

Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation

Parihar, Naveen 13 December 2003
Over the past few years, speech recognition technology performance on tasks ranging from isolated digit recognition to conversational speech has dramatically improved. Performance on limited recognition tasks in noise-free environments is comparable to that achieved by human transcribers. This advancement in automatic speech recognition technology, along with an increase in the computing power of mobile devices, standardization of communication protocols, and the explosion in the popularity of mobile devices, has created an interest in flexible voice interfaces for mobile devices. However, speech recognition performance degrades dramatically in mobile environments, which are inherently noisy. In the recent past, a great amount of effort has been spent on the development of front ends based on advanced noise-robust approaches. The primary objective of this thesis was to analyze the performance of two advanced front ends, referred to as the QIO and MFA front ends, on a speech recognition task based on the Wall Street Journal database. Though the advanced front ends are shown to achieve a significant improvement over an industry-standard baseline front end, this improvement is not operationally significant. Further, we show that the results of this evaluation were not significantly impacted by suboptimal recognition system parameter settings. Without any front end-specific tuning, the MFA front end outperforms the QIO front end by 9.6% relative. With tuning, the relative performance gap increases to 15.8%. Finally, we also show that mismatched microphone and additive noise evaluation conditions resulted in a significant degradation in performance for both front ends.
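For readers unfamiliar with "relative" comparisons such as the 9.6% figure above, the sketch below shows the usual arithmetic, assuming the underlying metric is an error rate where lower is better (for example word error rate). The error rates in the usage note are hypothetical values chosen only to reproduce a 9.6% relative gap; the abstract does not report the absolute figures.

```python
# Minimal sketch of a relative improvement between two error rates.
# Assumes a lower-is-better metric such as word error rate.

def relative_improvement(reference_error: float, improved_error: float) -> float:
    """Percentage by which improved_error is lower than reference_error."""
    if reference_error <= 0:
        raise ValueError("reference error must be positive")
    return 100.0 * (reference_error - improved_error) / reference_error
```

With hypothetical error rates of 50.0% for one front end and 45.2% for the other, `relative_improvement(50.0, 45.2)` comes to about 9.6, even though the absolute gap is only 4.8 percentage points.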
