Global ETD Search

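41	Data Augmentation Approaches for Automatic Speech Recognition Using Text-to-Speech / 音声認識のための音声合成を用いたデータ拡張手法 Ueno, Sei 23 March 2022 Kyoto University / New-system doctorate by coursework / Doctor of Informatics / Degree No. Kō 24027 / Informatics Doctorate No. 783 / 新制||情||133 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Tatsuya Kawahara, Prof. Sadao Kurohashi, Prof. Ko Nishino / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Speech Recognition Data Augmentation Domain Adaptation Text-to-Speech Speech-to-Text 007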
42	A Study of the Efficacy of Literacy-based Assistive Technology for Undergraduate Second Language Learners Yakimchuk, Daniel Thomas 01 January 2010 The goal of this study was to improve English language proficiency of undergraduate second-language learners (SLLs) through the use of literacy-based assistive technology (AT). Both current literature and the Universal Design for Learning (UDL) model suggest that literacy-based AT, while traditionally designed to assist students with learning disabilities, can benefit learners studying in a second language. This study adapted the Time Series Concurrent Differential (TSCD) research methodology to test the efficacy of AT for second language learners. TSCD involves the collection of a series of participant performance measurements both with and without the aid of AT. The difference between the two sets of measurements represents the impact of the AT. Fifty-four participants (32 SLL, 22 non-SLL) enrolled in a cross-section of Cape Breton University's Shannon School of Business courses participated. The adapted TSCD model was applied through a series of structured reading exercises that alternated use of AT with traditional reading over a full academic term. The reading assignments were drawn from course material and accounted for a small percentage of the class mark. In non-intervention exercises, participants read and reviewed assignments directly from printed course material. In intervention exercises, participants read and reviewed digital copies of the required material with the aid of PDF Equalizer. A secure Moodle site facilitated digital material access, performance measurement, and data management. A multivariate analysis of covariance (MANCOVA) determined a significant effect (9%) of the use of screen-reading software on academic performance of SLLs and a positive but statistically non-significant effect (3%) of the use of screen-reading assistive technology on academic performance of non-SLLs.
In addition, more SLL participants reported that the use of screen-reading software improved their reading (84%), listening (75%), and writing (56%) skills as compared to their non-SLL counterparts (36%, 41%, and 27% respectively). The majority of SLLs also reported that the use of the screen-reader had a positive effect on their academic performance (84%), improved their study skills (84%), and increased their confidence (78%) in their English language skills. assistive technology English for Academic Purposes ESL International Students Premier Assistive Technology text-to-speech Computer Sciences
43	Évaluation expérimentale d'un système statistique de synthèse de la parole, HTS, pour la langue française / Experimental evaluation of a statistical speech synthesis system, HTS, for French Le Maguer, Sébastien 05 July 2013 The work presented in this thesis concerns text-to-speech (TTS) synthesis and, more specifically, statistical parametric synthesis for French. We analyze how the linguistic descriptors (contextual factors) used to characterize a speech signal affect the modeling performed by the HTS statistical speech synthesis system. To conduct the experiments, two objective evaluation protocols are proposed. The first uses Gaussian mixture models (GMMs) to represent the acoustic space produced by HTS for a given contextual feature set. By using a constant reference set of natural speech stimuli, the GMMs can be compared with one another, and with them the acoustic spaces generated by the different HTS configurations. The second protocol computes distances between paired acoustic frames of natural speech and synthetic speech generated by HTS, so that the modeling can be evaluated more locally. This second protocol complements the other analyses, notably by controlling the sets of data that are generated and evaluated. Results obtained with both protocols, and confirmed by subjective evaluations, show that using a large, complex set of contextual factors does not necessarily improve the modeling and can be counter-productive for the quality of the synthesized speech. Computer science Speech processing Text-to-Speech synthesis HTS
44	Election markup language (EML) based tele-voting system Gong, XiangQi January 2009 Elections are one of the most fundamental activities of a democratic society. As in any other aspect of life, developments in technology have changed the voting procedure from traditional paper-based voting to voting by electronic means, or e-voting. E-voting uses various electronic means, such as voting machines, voting via the Internet, telephone, SMS and digital interactive television. This thesis concerns voting by telephone, or televoting. It starts by giving a brief overview and evaluation of various models and technologies that are implemented within such systems. The aspects of televoting that have been investigated are technologies that provide a voice interface to the voter and conduct the voting process, namely the Election Markup Language (EML), Automated Speech Recognition (ASR) and Text-to-Speech (TTS).
45	Improving High Quality Concatenative Text-to-Speech Using the Circular Linear Prediction Model Shukla, Sunil Ravindra 10 January 2007 Current high quality text-to-speech (TTS) systems are based on unit selection from a large database that is both contextually and prosodically rich. These systems, albeit capable of natural voice quality, are computationally expensive and require a very large footprint. Their success is attributed to the dramatic reduction of storage costs in recent times. However, for many TTS applications a smaller footprint is becoming a standard requirement. This thesis presents a new method for representing speech segments that can improve the quality and/or reduce the footprint of current concatenative TTS systems. The circular linear prediction (CLP) model is revisited and combined with the constant pitch transform (CPT) to provide a robust representation of speech signals that allows for limited prosodic movements without a perceivable loss in quality. The CLP model assumes that each frame of voiced speech is an infinitely periodic signal. This assumption allows for LPC modeling using the covariance method, with the efficiency of the autocorrelation method. The CPT is combined with this model to provide a database that is uniform in pitch for matching the target prosody during synthesis. With this representation, limited prosody modifications and unit concatenation can be performed without causing audible artifacts. For resolving artifacts caused by pitch modifications in voicing transitions, a method has been introduced for reducing peakiness in the LP spectra by constraining the line spectral frequencies. Two experiments have been conducted to demonstrate the potential of the CLP/CPT method. The first is a listening test to determine the ability of this model to realize prosody modifications without perceivable degradation.
Utterances are resynthesized using the CLP/CPT method with emphasized prosodics to increase intelligibility in harsh environments. The second experiment compares the quality of utterances synthesized by unit-selection based limited-domain TTS against the CLP/CPT method. The results demonstrate that the CLP/CPT representation, applied to current concatenative TTS systems, can reduce the size of the database and increase the prosodic richness without noticeable degradation in voice quality. TTS Text-to-speech Speech synthesis Linear prediction Prosodic analysis (Linguistics) Speech synthesis Signal processing Digital techniques
46	Blind Estimation of Perceptual Quality for Modern Speech Communications Falk, Tiago 05 January 2009 Modern speech communication technologies expose users to perceptual quality degradations that were not experienced earlier with conventional telephone systems. Since perceived speech quality is a major contributor to the end user's perception of quality of service, speech quality estimation has become an important research field. In this dissertation, perceptual quality estimators are proposed for several emerging speech communication applications, in particular for i) wireless communications with noise suppression capabilities, ii) wireless-VoIP communications, iii) far-field hands-free speech communications, and iv) text-to-speech systems. First, a general-purpose speech quality estimator is proposed based on statistical models of normative speech behaviour and on innovative techniques to detect multiple signal distortions. The estimators do not depend on a clean reference signal and hence are termed "blind." Quality meters are then distributed along the network chain to allow for both quality degradations and quality enhancements to be handled. In order to improve estimation performance for wireless communications, statistical models of noise-suppressed speech are also incorporated. Next, a hybrid signal-and-link-parametric quality estimation paradigm is proposed for emerging wireless-VoIP communications. The algorithm uses VoIP connection parameters to estimate a base quality representative of the packet switching network. Signal-based distortions are then detected and quantified in order to adjust the base quality accordingly. The proposed hybrid methodology is shown to overcome the limitations of existing pure signal-based and pure link parametric algorithms. Temporal dynamics information is then investigated for quality diagnosis for hands-free speech communications.
A spectro-temporal signal representation, where speech and reverberation tail components are shown to be separable, is used for blind characterization of room acoustics. In particular, estimators of reverberation time, direct-to-reverberation energy ratio, and reverberant speech quality are developed. Lastly, perceptual quality estimation for text-to-speech systems is addressed. Text- and speaker-independent hidden Markov models, trained on naturally produced speech, are used to capture normative spectral-temporal information. Deviations from the models, computed by means of a log-likelihood measure, are shown to be reliable indicators of multiple quality attributes including naturalness, fluency, and intelligibility. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2008-12-22 14:54:49.28 Quality estimation Gaussian mixture models hidden Markov model modulation spectrum wireless communications wireless-VoIP reverberation text-to-speech
47	Election markup language (EML) based tele-voting system Gong, XiangQi January 2009 Elections are one of the most fundamental activities of a democratic society. As in any other aspect of life, developments in technology have changed the voting procedure from traditional paper-based voting to voting by electronic means, or e-voting. E-voting uses various electronic means, such as voting machines, voting via the Internet, telephone, SMS and digital interactive television. This thesis concerns voting by telephone, or televoting. It starts by giving a brief overview and evaluation of various models and technologies that are implemented within such systems. The aspects of televoting that have been investigated are technologies that provide a voice interface to the voter and conduct the voting process, namely the Election Markup Language (EML), Automated Speech Recognition (ASR) and Text-to-Speech (TTS).
48	The punctuation and intonation of parentheticals Bodenbender, Christel 17 May 2010 From a historical perspective, punctuation marks are often assumed to only represent some of the phonetic structure of the spoken form of that text. It has been argued recently that punctuation today is a linguistic system that not only represents some of the phonetic sentence structure but also syntactic as well as semantic information. One case in point is the observation that the semantic difference in differently punctuated parenthetical phrases is not reflected in the intonation contour. This study provides the acoustic evidence for this observation. Furthermore, this study makes recommendations to achieve natural-sounding text-to-speech output for English parentheticals by incorporating the study's findings with respect to parenthetical intonation. The experiment conducted for this study involved three male and three female native speakers of Canadian English reading aloud a set of 20 sentences with parenthetical and non-parenthetical phrases. These sentences were analyzed with respect to acoustic characteristics due to differences in punctuation as well as due to differences between parenthetical and non-parenthetical phrases.
A number of conclusions were drawn based on the results of the experiment: (1) a difference in punctuation, although entailing a semantic difference, is not reflected in the intonation pattern; (2) in contrast to the general understanding that parenthetical phrases are lower-leveled and narrower in pitch range than the surrounding sentence, this study shows that it is not the parenthetical phrase itself that is implemented differently from its non-parenthetical counterpart; rather, the phrase that precedes the parenthetical exhibits a lower baseline and with that a wider pitch range than the corresponding phrase in a non-parenthetical sentence; (3) sentences with two adjacent parenthetical phrases or one embedded in the other exhibit the same pattern for the parenthetical-preceding phrase as the sentences in (2) above and a narrowed pitch range for the parenthetical phrases that are not in the final position of the sequence of parentheticals; (4) no pausing pattern could be found; (5) the characteristics found for parenthetical phrases can be implemented in synthesized speech through the use of SABLE speech markup as part of the SABLE speech synthesis system. This is the first time that the connection between punctuation and intonation in parenthetical sentences has been investigated; it is also the first look at sentences with more than one parenthetical phrase. This study contributes to our understanding of the intonation of parenthetical phrases in English and their implementation in text-to-speech systems, by providing an analysis of their acoustic characteristics. Intonation Punctuation Parenthetical English Text-to-speech Synthesis Speech Markup Phonetics Linguistics
49	Election markup language (EML) based tele-voting system Gong, XiangQi January 2009 Magister Scientiae - MSc / Elections are one of the most fundamental activities of a democratic society. As in any other aspect of life, developments in technology have changed the voting procedure from traditional paper-based voting to voting by electronic means, or e-voting. E-voting uses various electronic means, such as voting machines, voting via the Internet, telephone, SMS and digital interactive television. This thesis concerns voting by telephone, or televoting. It starts by giving a brief overview and evaluation of various models and technologies that are implemented within such systems. The aspects of televoting that have been investigated are technologies that provide a voice interface to the voter and conduct the voting process, namely the Election Markup Language (EML), Automated Speech Recognition (ASR) and Text-to-Speech (TTS). / South Africa Automatic Speech Recognition (ASR) Text to Speech (TTS) Voting by voice
50	”Ett antal understreck ett antal understreck ett antal understreck” : En komparativ textanalys av en skriftlig respektive talsyntetiserad version av uppgiftsinstruktioner / ”A number of underlines a number of underlines a number of underlines” : A comparative text analysis of textual and linguistic features in a written and synthesized version of assignment instructions Nyberg, Madeleine January 2020 The aim of this study is to compare, through a comparative functional text analysis, textual and linguistic features in a written and a speech-synthesized version of assignment instructions written by teachers of Swedish. The theoretical framework is based on J. Ong's (1991) description of oral and written forms of expression, with particular weight placed on the concept of secondary orality. The study also takes an inductive approach. It is based on qualitative data: the empirical material consisted of assignment instructions written by practising high school teachers of Swedish, all of which were read aloud by, and analyzed through, the text-to-speech program Oribi Speak. The results show that the text-to-speech (TTS) output lacks reading fluency, mainly because of the absence of the full stop as a final punctuation mark. The study concludes that TTS can be seen as laying claim to writing rules other than those generally accepted. High demands are thus placed on the author of the text, who needs to be aware of how a text should be designed to be compatible with text-to-speech programs. Text-to-speech text analysis Swedish high school school task instruction Languages and Literature
