71

Homograph Disambiguation and Diacritization for Arabic Text-to-Speech Using Neural Networks / Homografdisambiguering och diakritisering för arabiska text-till-talsystem med hjälp av neurala nätverk

Lameris, Harm January 2021 (has links)
Pre-processing Arabic text for Text-to-Speech (TTS) systems poses major challenges, as Arabic omits short vowels in writing. This omission produces a large number of homographs, so Arabic text must be diacritized to disambiguate them and match each word to its intended pronunciation. Diacritization has generally been performed with rule-based, statistical, or hybrid methods that combine the two. Recently, diacritization methods based on deep learning have shown promise in reducing error rates, but they are not yet commonly used in TTS engines. To examine neural diacritization methods for use in TTS engines, we normalized and pre-processed a version of the Tashkeela corpus, a large diacritized corpus consisting largely of Classical Arabic texts, for TTS purposes. We then trained and tested three state-of-the-art Recurrent-Neural-Network-based models on this data set. Additionally, we tested these models on the Wiki News corpus, a test set of Modern Standard Arabic (MSA) news articles that more closely resembles typical TTS queries. The models were evaluated by comparing the Diacritic Error Rate (DER) and Word Error Rate (WER) achieved on each data set to one another and to the DER and WER reported in the original papers. Moreover, per-diacritic accuracy was examined, and a manual evaluation was performed. On the Tashkeela corpus, all models achieved a lower DER and WER than reported in the original papers, largely as a result of using more training data in addition to the TTS pre-processing steps performed on the data. On the Wiki News corpus, the error rates were higher, largely due to the domain gap between the data sets. We found that for both data sets the models overfit on common patterns and the most common diacritic, and that on the Wiki News corpus they struggled with named entities and loanwords. Purely neural models generally outperformed the model that combined deep learning with rule-based and statistical corrections. These findings highlight the usability of deep learning methods for Arabic diacritization in TTS engines, as well as the need for diacritized corpora that are more representative of Modern Standard Arabic.
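For readers unfamiliar with the metrics, the sketch below shows one way to compute DER and WER for diacritization output. It is an illustrative reconstruction under the assumption that reference and hypothesis share the same undiacritized skeleton, not the evaluation code used in the thesis.

```python
# Sketch of Diacritic Error Rate (DER) and Word Error Rate (WER) for
# diacritization, assuming reference and hypothesis share the same base
# characters and differ only in diacritics. Real DER is usually
# restricted to Arabic letters; this simplified version counts every
# base character. Illustrative only.

ARABIC_DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")

def split_diacritics(text):
    """Pair each base character with the diacritics that follow it."""
    pairs = []
    for ch in text:
        if ch in ARABIC_DIACRITICS and pairs:
            pairs[-1] = (pairs[-1][0], pairs[-1][1] + ch)
        else:
            pairs.append((ch, ""))
    return pairs

def der(reference, hypothesis):
    """Fraction of base characters whose attached diacritics disagree."""
    ref, hyp = split_diacritics(reference), split_diacritics(hypothesis)
    assert [b for b, _ in ref] == [b for b, _ in hyp], "same skeleton assumed"
    return sum(r != h for (_, r), (_, h) in zip(ref, hyp)) / len(ref)

def wer(reference, hypothesis):
    """Fraction of words containing at least one diacritic error."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return sum(r != h for r, h in zip(ref_words, hyp_words)) / len(ref_words)
```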
72

Discussion On Effective Restoration Of Oral Speech Using Voice Conversion Techniques Based On Gaussian Mixture Modeling

Alverio, Gustavo 01 January 2007 (has links)
Today's world offers many ways to communicate information, and one of the most effective is speech. Unfortunately, many people lose the ability to converse, which carries a large negative psychological impact; skills such as lecturing and singing must then be restored by other means. Text-to-speech synthesis, which converts text into speech, has been a popular way of restoring oral speech. Although text-to-speech systems are useful, they offer only a few default voices, none of which represents the user's own. To achieve full restoration, voice conversion must be introduced. Voice conversion is a method that adjusts a source voice to sound like a target voice, and it consists of a training process and a conversion process. Training is conducted by composing a speech corpus, covering a variety of speech sounds, to be spoken by both the source and target voices. Once training is finished, the conversion function is employed to transform the source voice into the target voice. Effectively, voice conversion allows a speaker to sound like any other person, so it can be applied to alter the voice output of a text-to-speech system to produce the target voice. This thesis investigates how one approach, voice conversion based on Gaussian mixture modeling, can be applied to alter the voice output of a text-to-speech synthesis system. Acceptable results were obtained with these methods. Although voice conversion and text-to-speech synthesis are effective in restoring a voice, a sample of the speaker recorded before voice loss must be used during training; it is therefore vital that voice samples be made before voice loss occurs.
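A compact sketch of the joint-density GMM mapping that this family of voice-conversion methods relies on: fit a GMM on stacked, time-aligned source/target spectral features, then convert each source frame with the conditional expectation E[y | x]. This is a standard formulation of the technique, offered as an illustration rather than the author's implementation.

```python
# Sketch of joint-density GMM voice conversion: fit a GMM on stacked
# source/target feature vectors [x; y], then map a source frame x to
# the target space via the responsibility-weighted conditional means
# E[y | x, k]. Illustrative, not thesis code.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(X_src, Y_tgt, n_components=8):
    """X_src, Y_tgt: (frames, dim) time-aligned spectral features."""
    Z = np.hstack([X_src, Y_tgt])  # joint vectors [x; y]
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=0)
    gmm.fit(Z)
    return gmm

def convert(gmm, X, dim):
    """Map source frames X (frames, dim) to target space via E[y | x]."""
    mu_x, mu_y = gmm.means_[:, :dim], gmm.means_[:, dim:]
    S_xx = gmm.covariances_[:, :dim, :dim]
    S_yx = gmm.covariances_[:, dim:, :dim]

    # Responsibilities p(k | x) under the x-marginal of each component.
    log_p = np.stack([
        multivariate_normal.logpdf(X, mu_x[k], S_xx[k])
        for k in range(gmm.n_components)
    ], axis=1) + np.log(gmm.weights_)
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)

    # Blend per-component conditional means by responsibility.
    Y_hat = np.zeros((X.shape[0], dim))
    for k in range(gmm.n_components):
        A = S_yx[k] @ np.linalg.inv(S_xx[k])
        Y_hat += resp[:, [k]] * (mu_y[k] + (X - mu_x[k]) @ A.T)
    return Y_hat
```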
73

A Toolkit for Multimodal Interface Design: An Empirical Investigation

Rigas, Dimitrios I., Alsuraihi, M. January 2007 (has links)
This paper introduces a comparative multi-group study carried out to investigate the use of multimodal interaction metaphors (visual, oral, and aural) for improving the learnability (usability from first-time use) of interface-design environments. An initial survey gathered views on the effectiveness of, and satisfaction with, employing speech and speech recognition to solve common usability problems. The investigation was then carried out empirically by testing the usability parameters (efficiency, effectiveness, and satisfaction) of three design toolkits (TVOID, OFVOID, and MMID) built especially for the study. TVOID and OFVOID interacted with the user visually only, using typical and time-saving interaction metaphors; the third environment, MMID, added another modality through vocal and aural interaction. The results showed that using vocal commands and the mouse concurrently to complete tasks from first-time use was more efficient and more effective than using visual-only interaction metaphors.
74

Identification and Classification of TTS Intelligibility Errors Using ASR : A Method for Automatic Evaluation of Speech Intelligibility / Identifiering och klassifiering av fel relaterade till begriplighet inom talsyntes. : Ett förslag på en metod för automatisk utvärdering av begriplighet av tal.

Henriksson, Erik January 2023 (has links)
In recent years, applications using synthesized speech have become more numerous and publicly available. As the area grows, so does the need to deliver high-quality, intelligible speech, and with it the need for effective methods of assessing the intelligibility of synthesized speech. The common method of evaluating speech with human listeners has the disadvantages of being costly and time-inefficient. Because of this, alternative methods of evaluating speech automatically, using automatic speech recognition (ASR) models, have been introduced. This thesis presents an evaluation system that analyses the intelligibility of synthesized speech using automatic speech recognition and attempts to identify and categorize the intelligibility errors present in the speech. The system is evaluated in two experiments: the first uses publicly available sentences and corresponding synthesized speech, and the second uses publicly available models to synthesize speech for evaluation. Additionally, a survey is conducted in which human transcriptions are used instead of automatic speech recognition, and the resulting intelligibility evaluations are compared with those based on ASR transcriptions. Results show that the system can be used to evaluate the intelligibility of a model and to identify and classify intelligibility errors. A combination of ASR models is shown to lead to more robust and reliable evaluations, and reference human recordings can be used to further increase confidence. The evaluation scores correlate well with human evaluations, with certain ASR models showing a stronger correlation than others. This research shows that automatic speech recognition can be used to produce a reliable and detailed analysis of text-to-speech intelligibility, which could make text-to-speech (TTS) improvement more efficient and allow better TTS models to be delivered at a faster rate.
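The core loop of such an ASR-based evaluation can be sketched briefly. Whisper and jiwer below are stand-in components (the thesis does not necessarily use them), and `synthesize` is a hypothetical TTS hook.

```python
# Sketch of ASR-based intelligibility scoring for a TTS model:
# synthesize each test sentence, transcribe it with ASR, and score the
# transcript against the input text. Whisper and jiwer are stand-ins;
# `synthesize` is a hypothetical TTS interface, not the thesis setup.
import jiwer
import whisper

asr = whisper.load_model("base")

def intelligibility_report(sentences, synthesize):
    """synthesize(text) -> path to a WAV file (hypothetical TTS hook)."""
    results = []
    for text in sentences:
        wav_path = synthesize(text)
        hyp = asr.transcribe(wav_path)["text"]
        score = jiwer.wer(text.lower(), hyp.lower())
        results.append((score, text, hyp))
    # High-WER sentences point at likely intelligibility errors and can
    # then be inspected and classified (substitutions, omissions, ...).
    return sorted(results, reverse=True)
```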
75

Synthèse de parole expressive au delà du niveau de la phrase : le cas du conte pour enfant : conception et analyse de corpus de contes pour la synthèse de parole expressive / Expressive speech synthesis beyond the level of the sentence : the children tale usecase : tale corpora design and analysis for expressive speech synthesis

Doukhan, David 20 September 2013 (has links)
The aim of this thesis is to propose methods for improving the expressiveness of speech synthesis systems. One of the central propositions of this work is to define, use, and measure the impact of linguistic structures operating beyond the sentence level, as opposed to approaches that operate on sentences isolated from their context. The scope of the study is restricted to the reading of children's tales. Tales have the particularity of having been the subject of a number of studies aiming to identify a narrative structure, and of involving a number of stereotypical characters (hero, villain, fairy) whose speech is often reported. These particular characteristics are exploited to model the prosodic properties of tales beyond the sentence level. The oral transmission of tales has often been associated with musical practice (songs, instruments), and their reading remains associated with very rich melodic properties whose reproduction is still a challenge for modern speech synthesizers. To address these issues, a first corpus of written tales was collected and annotated with information on the narrative structure of the tales, the identification and attribution of direct quotations, and references to character mentions as well as named entities and extended enumerations. The annotated corpus is described in terms of coverage and inter-annotator agreement. It is used to build systems for segmenting tales into episodes and for detecting direct quotations, dialogue acts, and modes of communication. A second corpus, of tales read by a professional speaker, is also presented. The speech is aligned with the lexical and phonetic transcriptions, the annotations of the text corpus, and meta-information describing the characteristics of the characters appearing in the tale. The relations between the linguistic annotations and the prosodic properties observed in the speech corpus are described and modeled. Finally, a prototype controlling the expressive parameters of the Acapela unit-selection synthesizer was built. The prototype generates prosodic instructions operating beyond the sentence level, in particular using information related to the structure of the tale and the distinction between direct and reported speech. The control prototype was validated in a perceptual experiment, which showed a significant improvement in the quality of the synthesis.
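As a toy illustration of one annotation subtask, direct-quotation detection, a naive baseline can flag quoted spans and look for a nearby speech verb for attribution. The systems described in the thesis are corpus-trained and considerably more sophisticated; this is only a hedged sketch.

```python
# Naive baseline for direct-quotation detection in tale text: find
# spans inside quotation marks and look for a speech verb in a small
# context window to attribute the quote. A toy sketch, far simpler
# than the corpus-trained models of the thesis.
import re

SPEECH_VERBS = {"said", "cried", "asked", "answered", "whispered", "shouted"}
QUOTE_RE = re.compile(r'[«"\u201c](.+?)[»"\u201d]', re.S)

def detect_quotes(text, window=40):
    quotes = []
    for m in QUOTE_RE.finditer(text):
        before = text[max(0, m.start() - window):m.start()].lower()
        after = text[m.end():m.end() + window].lower()
        verb = next((v for v in SPEECH_VERBS
                     if v in before.split() or v in after.split()), None)
        quotes.append({"quote": m.group(1), "speech_verb": verb})
    return quotes

print(detect_quotes('"I will find you," said the wolf.'))
# -> [{'quote': 'I will find you,', 'speech_verb': 'said'}]
```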
76

Using phonetic knowledge in tools and resources for Natural Language Processing and Pronunciation Evaluation / Utilizando conhecimento fonético em ferramentas e recursos de Processamento de Língua Natural e Treino de Pronúncia

Almeida, Gustavo Augusto de Mendonça 21 March 2016 (has links)
This thesis presents tools and resources for the development of applications in Natural Language Processing and Pronunciation Training. There are four main contributions. First, a hybrid grapheme-to-phoneme converter for Brazilian Portuguese, named Aeiouadô, which uses both manual transcription rules and Classification and Regression Trees (CART) to infer the phone transcription. Second, a spelling-correction system based on machine learning, which uses the transcriptions produced by Aeiouadô and is capable of handling phonologically motivated errors as well as contextual errors. Third, a method for the extraction of phonetically rich sentences, based on greedy algorithms. Fourth, a prototype system for automatic pronunciation assessment, designed especially for Brazilian-accented English.
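The greedy extraction of phonetically rich sentences admits a very short sketch: repeatedly pick the sentence that adds the most not-yet-covered diphones. The `phones` function below is an assumed grapheme-to-phoneme hook (e.g., a converter such as Aeiouadô); this is an illustration, not the thesis code.

```python
# Sketch of greedy selection of phonetically rich sentences: at each
# step take the sentence contributing the most uncovered diphones.
# `phones(sentence)` is an assumed grapheme-to-phoneme function
# returning a list of phone symbols. Illustrative only.

def diphones(phone_seq):
    return {(a, b) for a, b in zip(phone_seq, phone_seq[1:])}

def greedy_select(sentences, phones, target_size):
    units = {s: diphones(phones(s)) for s in sentences}
    covered, selected = set(), []
    pool = set(sentences)
    while pool and len(selected) < target_size:
        best = max(pool, key=lambda s: len(units[s] - covered))
        if not units[best] - covered:  # nothing new left to gain
            break
        selected.append(best)
        covered |= units[best]
        pool.remove(best)
    return selected, covered
```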
78

Implementing and Improving a Speech Synthesis System

Beněk, Tomáš January 2014 (has links)
This thesis deals with text-to-speech synthesis and provides a basic theoretical introduction to the field. The work builds on the MARY TTS system, which allows existing modules to be reused to create a custom text-to-speech system, and on speech synthesis using hidden Markov models trained on a purpose-built speech database. Several simple programs easing the creation of the database were written, and the addition of a new language and voice to the MARY TTS system was demonstrated. A module and a voice for the Czech language were created and published. An algorithm for grapheme-to-phoneme transcription was described and implemented.
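Because Czech orthography is close to phonemic, the grapheme-to-phoneme step can be approximated by ordered rewrite rules plus word-final obstruent devoicing. The sketch below is a heavily simplified illustration of that idea, not the algorithm implemented in the thesis.

```python
# Heavily simplified sketch of rule-based Czech grapheme-to-phoneme
# conversion: digraph and context rewrites applied first, then
# single-character mappings, then word-final devoicing. Illustrative
# only; the real rule set is much larger.

DIGRAPHS = {"ch": "x", "dž": "ʤ", "dz": "ʣ"}
CONTEXT = {"dě": "ɟe", "tě": "ce", "ně": "ɲe", "bě": "bje",
           "pě": "pje", "vě": "vje", "mě": "mɲe"}
SINGLE = {"c": "ʦ", "č": "ʧ", "š": "ʃ", "ž": "ʒ", "ř": "r̝",
          "ď": "ɟ", "ť": "c", "ň": "ɲ", "y": "i", "ý": "iː",
          "á": "aː", "é": "eː", "í": "iː", "ó": "oː", "ú": "uː", "ů": "uː"}
DEVOICE = {"b": "p", "d": "t", "ɟ": "c", "g": "k", "v": "f",
           "z": "s", "ʒ": "ʃ", "ʤ": "ʧ"}

def g2p(word):
    w = word.lower()
    for src, tgt in {**CONTEXT, **DIGRAPHS}.items():  # multi-char rules first
        w = w.replace(src, tgt)
    phones = [SINGLE.get(ch, ch) for ch in w]
    if phones and phones[-1] in DEVOICE:              # final devoicing
        phones[-1] = DEVOICE[phones[-1]]
    return phones

print(g2p("hrad"))  # -> ['h', 'r', 'a', 't']
```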
79

Ett digitalt läromedel för barn med lässvårigheter / A Digital Teaching Material for Children with Reading Difficulties

Eriksson, Ruth, Galaz Miranda, Luis January 2016 (has links)
The digital age is changing society. New technology provides opportunities to produce and organize knowledge in new ways, and the technology available in schools today can also be used to optimize literacy training for students with reading difficulties. This thesis examines how digital teaching material for literacy training for children with reading difficulties can be designed and implemented, and shows that this is feasible. Digital teaching material of good quality must be based on a scientifically accepted method of literacy training. This thesis builds on Gunnel Wendick's training model, which is already used by many special education teachers, though so far in its original form: paper word lists, without computers, tablets, or the like. We analyze Wendick's model and apply it, in a creative way, to design a digital equivalent of the original way of working. Our goal is to create digital teaching material that implements Wendick's model and thus makes it possible to use the model on various smart devices. With this we hope to ease the work of both special education teachers and children with reading difficulties, and to make the routines more appealing and creative. In our study, we examine various technical possibilities for implementing Wendick's model, and we choose to create a prototype of a web application with suitable functionality for administrators, special education teachers, and students. The prototype's functionality can be divided into two parts: an administrative part, covering the user interface and functionality for managing students and other relevant data, and an exercise part, comprising the exercise views and their functionality. The exercises are intended to train the auditory channel, phonological decoding (with the goal of reading accurately), and orthographic decoding (with the goal that students automate their decoding, that is, perceive words as images). In developing the digital teaching material we used proven software-engineering principles and implementation techniques: compiling high-level requirements, modeling the domain, and defining appropriate use cases. The application was implemented using the Java EE platform, the Web Speech API, PrimeFaces, and other technologies. Our prototype is a good starting point for further development, with the hope that a complete web application will be created and change the way our schools work.
80

Förbättrat informationsflöde med hjälp av Augmented Reality / Improved Information Flow Using Augmented Reality

Almqvist, Daniel, Jansson, Magnus January 2015 (has links)
Augmented Reality is a technique for augmenting reality, in which digital objects are placed over images or similar media using the camera of a mobile device. Since there are several ways of using Augmented Reality technology, the field has been surveyed and researched. One area where the technology can be applied is advertising: something everyone encounters daily, yet often perceived as boring or simply not noticed. Through an Augmented Reality prototype, users can register patterns or speech and fetch the necessary data from a database. The prototype then creates an interactive event that presents the information in a unique way, so that everyone, including people with disabilities, can access information they would otherwise miss. This interactive event also brings life to previously tedious advertising and information posters. The result of this report is a prototype for the Android mobile platform that uses Augmented Reality technology and offers many features. It supports voice recognition, registering what is spoken, and, given specific keywords, provides information about the keyword. Testing shows that users are positive about the prototype and see it as an interesting way of obtaining information; the testers could imagine using it themselves to present their own advertising in a unique and appealing way.
