61
Grapheme-to-phoneme conversion and its application to transliteration. Jiampojamarn, Sittichai, 06 1900 (has links)
Grapheme-to-phoneme conversion (G2P) is the task of converting a word, represented by a sequence of graphemes, to its pronunciation, represented by a sequence of phonemes. The G2P task plays a crucial role in speech synthesis systems, and is an important part of other applications, including spelling correction and speech-to-speech machine translation. G2P conversion is a complex task, for which a number of diverse solutions have been proposed. In general, the problem is challenging because the source string does not unambiguously specify the target representation. In addition, the training data include only example word
pairs without the structural information of subword alignments.
In this thesis, I introduce several novel approaches for G2P conversion. My contributions can be categorized into (1) new alignment models and (2) new output generation models. With respect to alignment models, I present techniques including many-to-many alignment, phonetic-based alignment, alignment by integer linear programming and alignment-by-aggregation. Many-to-many alignment is designed to replace the one-to-one
alignment that has been used almost exclusively in the past. The new many-to-many alignments are more precise and accurate in expressing grapheme-phoneme relationships. The other proposed alignment approaches attempt to advance the training method beyond the use of Expectation-Maximization (EM). With respect to generation models, I first describe a framework for integrating many-to-many alignments and language models for grapheme classification. I then propose joint processing for G2P using online discriminative training. I integrate a generative joint n-gram model into the discriminative framework. Finally, I apply the proposed G2P systems to name transliteration generation and mining tasks. Experiments show that the proposed system achieves state-of-the-art performance in both the G2P and name transliteration tasks.
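As a rough illustration of how many-to-many grapheme-phoneme mappings can drive generation, the sketch below decodes a word by dynamic programming over grapheme chunks of length one or two. The substitution table and probabilities are invented for illustration only; the thesis's actual models learn such alignments with EM or the other training methods described above.

```python
import math

# Toy many-to-many mapping probabilities: grapheme chunk -> {phoneme chunk: P}.
# Illustrative values only; a real system would learn them from aligned data.
SUBS = {
    "ph": {"F": 0.9},
    "p":  {"P": 0.95},
    "h":  {"HH": 0.6, "": 0.4},
    "o":  {"OW": 0.5, "AA": 0.3},
    "n":  {"N": 0.95},
    "e":  {"IY": 0.4, "EH": 0.3, "": 0.3},
    "ne": {"N": 0.7},
}
MAX_CHUNK = 2  # grapheme chunks of length 1 or 2 (many-to-many on the source side)

def g2p(word):
    """Best-scoring phoneme sequence via dynamic programming over chunkings."""
    n = len(word)
    best = [(-math.inf, [])] * (n + 1)   # best[i] = (log-score, phones) covering word[:i]
    best[0] = (0.0, [])
    for i in range(n):
        score_i, phones_i = best[i]
        if score_i == -math.inf:
            continue
        for j in range(i + 1, min(i + MAX_CHUNK, n) + 1):
            chunk = word[i:j]
            for phone, prob in SUBS.get(chunk, {}).items():
                cand = score_i + math.log(prob)
                if cand > best[j][0]:
                    best[j] = (cand, phones_i + ([phone] if phone else []))
    return best[n][1]

print(g2p("phone"))   # ['F', 'OW', 'N'] with the toy table above
```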
62
Elever med läs- och skrivsvårigheter och deras olika uppfattningar om användande av talsyntes / Students with reading and writing difficulties and their perceptions of the use of text-to-speech. Stengel, Marie, January 2013 (has links)
The aim of the study is to examine the different ways in which students with reading and writing difficulties perceive the use of text-to-speech. Qualitative interviews were conducted with nine students in grades three to nine of compulsory school. The study was based on a phenomenographic approach. The results show six distinct conceptions of the use of text-to-speech. The six categories are: text-to-speech in use, significant others, autonomy and independence, learning, participation and change, and commitment and attitude. The majority of the students experience the use of text-to-speech positively. The study indicates that text-to-speech increases learning, motivation and participation for the vast majority of the students. Students with reading and writing difficulties are a heterogeneous group with different needs depending on what causes their difficulties, so the importance and uses of text-to-speech may vary. The study shows that it is important that text-to-speech is introduced in dialogue with the student and that he or she has ample opportunity to decide when, how and where it is used. The results also show that good scaffolding at the beginning of use is important.
63
Grapheme-to-phoneme conversion and its application to transliteration. Jiampojamarn, Sittichai, Unknown Date
No description available.
64
Automatic speech segmentation with limited data / by D.R. van Niekerk. Van Niekerk, Daniel Rudolph, January 2009 (has links)
The rapid development of corpus-based speech systems, such as concatenative synthesis systems for under-resourced languages, requires an efficient, consistent and accurate solution with regard to phonetic speech segmentation. Manual development of phonetically annotated corpora is a time-consuming and expensive process which suffers from challenges regarding consistency and reproducibility, while automation of this process has only been satisfactorily demonstrated on large corpora of a select few languages by employing techniques requiring extensive and specialised resources.

In this work we considered the problem of phonetic segmentation in the context of developing small prototypical speech synthesis corpora for new under-resourced languages. This was done through an empirical evaluation of existing segmentation techniques on typical speech corpora in three South African languages. In this process, the performance of these techniques was characterised under different data conditions, and the efficient application of these techniques was investigated in order to improve the accuracy of the resulting phonetic alignments.

We found that the application of baseline speaker-specific Hidden Markov Models results in relatively robust and accurate alignments even under extremely limited data conditions, and we demonstrated how such models can be developed and applied efficiently in this context. The result is segmentation of sufficient quality for synthesis applications, with the quality of alignments comparable to manual segmentation efforts in this context. Finally, possibilities for further automated refinement of phonetic alignments were investigated, and an efficient corpus development strategy was proposed with suggestions for further work in this direction. / Thesis (M.Ing. (Computer Engineering))--North-West University, Potchefstroom Campus, 2009.
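To give a sense of how HMM-based forced alignment assigns phone boundaries, here is a minimal sketch that assumes per-frame, per-phone log-likelihoods are already available (in practice these come from trained acoustic models and an ASR toolkit); it is a toy Viterbi alignment, not the segmentation pipeline evaluated in the thesis.

```python
import numpy as np

def force_align(loglik):
    """
    loglik: (T, P) array, loglik[t, p] = log-likelihood of frame t under phone p,
    where the P phones must appear in order, each covering a contiguous span.
    Returns a list of (start_frame, end_frame) spans, one per phone.
    """
    T, P = loglik.shape
    NEG = -np.inf
    score = np.full((T, P), NEG)
    back = np.zeros((T, P), dtype=int)   # 0 = stayed in phone, 1 = entered from previous phone
    score[0, 0] = loglik[0, 0]
    for t in range(1, T):
        for p in range(P):
            stay = score[t - 1, p]
            enter = score[t - 1, p - 1] if p > 0 else NEG
            if stay >= enter:
                score[t, p], back[t, p] = stay + loglik[t, p], 0
            else:
                score[t, p], back[t, p] = enter + loglik[t, p], 1
    # Trace the phone boundaries back from the final frame of the final phone.
    spans, end, p = [], T - 1, P - 1
    for t in range(T - 1, 0, -1):
        if back[t, p] == 1:
            spans.append((t, end))
            end, p = t - 1, p - 1
    spans.append((0, end))
    return spans[::-1]

# Tiny example: 6 frames, 3 phones with clearly separated likelihoods.
ll = np.log(np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]))
print(force_align(ll))  # expected: [(0, 1), (2, 3), (4, 5)]
```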
66
Hlasem ovládaný elektronický zubní kříž / Voice controlled electronic health record in dentistry. Hippmann, Radek, January 2012 (has links)
Title: Voice controlled electronic health record in dentistry. Author: MUDr. Radek Hippmann. Department: Department of paediatric stomatology, Faculty hospital Motol. Supervisor: Prof. MUDr. Taťjana Dostalová, DrSc., MBA. Supervisor's e-mail: Tatjana.Dostalova@fnmotol.cz

This PhD thesis deals with the development of a comprehensive electronic health record (EHR) for dentistry. The system is enhanced with voice control based on an automatic speech recognition (ASR) system and a text-to-speech (TTS) module for speech synthesis. The first part of the thesis describes the overall problem and defines the particular areas whose combination is essential for creating an EHR system in this field, chiefly the basic delimitation of terms and areas in dentistry. It then addresses the often-neglected problematics of the temporomandibular joint (TMJ) and describes trends in EHR and voice technologies. The methodological part describes the technologies used in creating the EHR system, the voice recognition, and the TMJ disease classification. The following part presents the results, which correspond to the knowledge base in dentistry and TMJ. From this knowledge base originates the graphical user interface DentCross, which serves for dental data...
67
Tradução grafema-fonema para a língua portuguesa baseada em autômatos adaptativos / Grapheme-phoneme translation for Portuguese based on adaptive automata. Danilo Picagli Shibata, 25 March 2008 (has links)
This work presents a study on the use of adaptive devices for text-to-speech translation. It focuses on the development of a grapheme-phoneme translation method for Portuguese based on adaptive automata and on the use of this method in text-to-speech translation software. The presented method mimics human behaviour in handling syllable-separation rules, syllable-stress assignment and the influence that syllables exert on their neighbours. This makes the method easy to adapt to other varieties of Portuguese, since these characteristics are invariant across region and period. Contemporary Portuguese as spoken in the city of São Paulo was chosen as the target for analysis and testing in this work. For this variety the model achieves satisfactory results, exceeding 95% accuracy in grapheme-phoneme translation of words and reaching about 90% accuracy once the disambiguation of words with two possible phonetic representations is taken into account, while producing speech output intelligible to native speakers through syllable-based concatenative synthesis. As results of this work, a model for grapheme-phoneme translation of words based on adaptive automata was created, together with a method for choosing the correct phonetic representation in ambiguous cases and two pieces of software: one for simulating adaptive automata and one for grapheme-phoneme translation of texts using the proposed translation model and disambiguation method. The latter was combined with the synthesizer developed by Koike et al. (2007) to create a text-to-speech translator for Portuguese. The work demonstrates the feasibility of using adaptive automata as the basis of, or as an auxiliary element in, text-to-speech translation for Portuguese.
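The following is a minimal sketch of the adaptive-automaton mechanism itself: a transducer whose transition table can be rewritten at runtime by actions attached to transitions. The toy soft-c rule ("c" read as /s/ before "e"/"i", otherwise /k/) is purely illustrative, a static transducer could encode it equally well, and none of this is the thesis's actual rule set or implementation.

```python
class AdaptiveG2P:
    """Toy transducer whose transition table is rewritten at runtime."""

    def __init__(self):
        # transitions[state][grapheme] = (next_state, output phones, adaptive action or None)
        self.transitions = {"q0": {}, "after_c": {}}
        self.vowel_phone = {"a": "a", "e": "e", "i": "i", "o": "o", "u": "u"}

    def translate(self, word):
        state, phones = "q0", []
        for g in word + "#":                      # '#' marks end of word
            rule = self.transitions[state].get(g)
            if rule is None:                      # fallback: emit g itself, return to q0
                if state == "after_c":
                    phones.append("k")            # hard c before anything not covered
                if g in self.vowel_phone:
                    phones.append(self.vowel_phone[g])
                elif g != "#":
                    phones.append(g)
                state = "q0"
                continue
            nxt, out, action = rule
            phones.extend(out)
            if action is not None:
                action(self)                      # adaptive action mutates the automaton
            state = nxt
        return phones


def install_soft_c(m):
    # Adaptive action: the first time a "c" is read, add the transitions that
    # resolve it as /s/ before "e"/"i"; anything else falls back to /k/ above.
    m.transitions["after_c"].setdefault("e", ("q0", ["s", "e"], None))
    m.transitions["after_c"].setdefault("i", ("q0", ["s", "i"], None))


m = AdaptiveG2P()
m.transitions["q0"]["c"] = ("after_c", [], install_soft_c)
print(m.translate("cidade"))   # ['s', 'i', 'd', 'a', 'd', 'e']
print(m.translate("casa"))     # ['k', 'a', 's', 'a']
```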
68
A Research Bed For Unit Selection Based Text To Speech Synthesis System. Konakanchi, Parthasarathy, 02 1900 (has links) (PDF)
After trying the Festival Speech Synthesis System, we decided to develop our own TTS framework, conducive to performing the research experiments needed to develop good-quality TTS for Indian languages. Most previous attempts at Indian-language TTS include no prosody model, no provision for handling foreign-language words, and no phrase-break prediction that would allow appropriate pauses to be introduced into the synthesized speech. Further, in the Indian context, there is a real need for a bilingual TTS involving English along with the Indian language. In fact, it may be desirable to have a trilingual TTS that can additionally handle Hindi or the language of a neighbouring state. Thus, there is a felt need for a full-fledged TTS development framework which lends itself to experimentation involving all the above issues and more.
This thesis is therefore a serious attempt to develop a modular, unit selection based TTS framework. The developed system has been tested for its effectiveness in creating intelligible speech in Tamil and Kannada. The system has also been used to carry out two research experiments on TTS.
The first part of the work is the design and development of a corpus-based concatenative Tamil speech synthesizer in Matlab and C. A synthesis database has been created with 1027 phonetically rich, pre-recorded sentences, segmented at the phone level. From the sentence to be synthesized, specifications of the required target units are predicted. During synthesis, database units are selected that best match the target specification according to a distance metric and a concatenation quality metric. To accelerate matching, the features of the end frames of the database units have been precomputed and stored. The selected units are concatenated to produce synthetic speech. The high mean opinion scores obtained for the TTS output show that speech synthesized using our TTS is intelligible and acceptably natural, and could be put to commercial use with some additional features. Experiments carried out by others using my TTS framework have shown that, whenever the required phonetic context is not available in the synthesis database, similar phones that are perceptually indistinguishable may be substituted.
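A minimal sketch of the unit-selection idea described above: a Viterbi search over candidate database units that minimises the sum of a target cost and a concatenation (join) cost. The cost functions and toy units below are invented for illustration and are not the distance and concatenation metrics used in the thesis.

```python
import numpy as np

def select_units(targets, candidates, target_cost, join_cost):
    """
    Viterbi search over candidate units, one list of candidates per target position.
    targets:    list of target specifications
    candidates: list of lists; candidates[i] are database units usable at position i
    Returns the sequence of chosen units minimising total cost.
    """
    n = len(targets)
    # cost[i][k]: best cumulative cost ending with candidate k at position i
    cost = [np.array([target_cost(targets[0], c) for c in candidates[0]])]
    back = []
    for i in range(1, n):
        tc = np.array([target_cost(targets[i], c) for c in candidates[i]])
        jc = np.array([[join_cost(prev, cur) for prev in candidates[i - 1]]
                       for cur in candidates[i]])       # shape (cur, prev)
        total = jc + cost[i - 1][None, :]                # add cost of best path to prev
        back.append(total.argmin(axis=1))
        cost.append(tc + total.min(axis=1))
    # Trace back the best path.
    k = int(cost[-1].argmin())
    path = [k]
    for b in reversed(back):
        k = int(b[k])
        path.append(k)
    path.reverse()
    return [candidates[i][k] for i, k in enumerate(path)]

# Toy example: units are (phone, pitch); cost functions are illustrative only.
tcost = lambda tgt, u: abs(tgt[1] - u[1])            # pitch mismatch to target
jcost = lambda a, b: abs(a[1] - b[1]) * 0.5          # pitch jump at the join
targets = [("a", 120), ("n", 118), ("a", 115)]
cands = [[("a", 100), ("a", 119)], [("n", 140), ("n", 117)], [("a", 116), ("a", 90)]]
print(select_units(targets, cands, tcost, jcost))
# -> [('a', 119), ('n', 117), ('a', 116)]
```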
The second part of the work deals with the design and modification of the developed TTS framework to be embedded in mobile phones. Commercial GSM FR, EFR and AMR speech codecs are used to compress our synthesis database. Perception experiments reveal that speech synthesized from a highly compressed database is reasonably natural. This holds promise for reading SMSs and emails in Indian languages on mobile phones in the future. Finally, we observe that incorporating prosody and pause models for Indian-language TTS would further enhance the quality of the synthetic speech. These are some of the potential, unexplored areas for further research in speech synthesis in Indian languages.
69
Evaluating Multi-UAV System with Text to Speech for Situational Awareness and Workload. Lindgren, Viktor, January 2021 (has links)
With improvements in miniaturization technologies, the number of operators required per UAV has become increasingly small, at the cost of increased workload. Workload is an important factor to consider when designing the multi-UAV systems of tomorrow, as too much workload may decrease an operator's performance. This study proposes the use of text to speech combined with an emphasis on a single-screen design as a way of improving situational awareness and perceived workload. A controlled experiment with 18 participants was conducted in a simulator. Their situational awareness and perceived workload were measured using SAGAT and NASA-TLX respectively. The results show that the use of text to speech led to a decrease in situational awareness for all elements of the graphical user interface that were not directly handled by a text-to-speech event. All NASA-TLX measurements showed an improvement in perceived workload except physical demand. Overall, an improvement in perceived workload was observed when text to speech was in use.
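For reference, a NASA-TLX overall workload score is commonly computed as the pairwise-weighted mean of the six subscale ratings. The sketch below shows that standard computation with invented numbers; it is not data from this study.

```python
# Ratings are 0-100 on six subscales; weights come from 15 pairwise comparisons
# (each subscale is weighted by the number of times it was chosen, summing to 15).
SCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def tlx_weighted(ratings, weights):
    """Weighted workload: sum(rating * weight) / 15; weights must sum to 15."""
    assert sum(weights.values()) == 15, "weights come from 15 pairwise choices"
    return sum(ratings[s] * weights[s] for s in SCALES) / 15.0

# Toy numbers, invented for illustration only.
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 40, "effort": 60, "frustration": 35}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
print(round(tlx_weighted(ratings, weights), 1))  # 58.0 with these toy numbers
```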