  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Mathematical modelling of some aspects of stressing a Lithuanian text / Kai kurių lietuvių kalbos teksto kirčiavimo aspektų matematinis modeliavimas

Anbinderis, Tomas 02 July 2010
This dissertation deals with one component of a speech synthesizer, the automatic stressing of text, and with two related tasks: the disambiguation of homographs (words that can be stressed in several ways) and the search for clitics (unstressed words that attach to an adjacent word). Lithuanian text is stressed with a method that uses decision trees to find letter sequences that unambiguously determine a word's stress. The decision trees were built from a large corpus of stressed words, and stressing rules were formulated from letter sequences at the beginning, in the middle and at the end of a word. The proposed stressing algorithm reaches an accuracy of about 95.5%. Homographs are disambiguated with methods not previously applied to Lithuanian, based on the usage frequencies of lexemes and morphological features obtained from a corpus of about one million words. The work shows that the frequencies of morphological features matter more than the frequencies of lexemes. The proposed methods select the correct stress variant with an accuracy of 85.01%. Four types of methods are proposed for finding clitics, based on: 1) recognition of combinational forms, 2) the statistical frequency with which a word is stressed or unstressed, 3) certain grammar rules, and 4) the distribution of stress in adjacent words (rhythm). It is explained how to combine all the methods into a single algorithm. On the test data, this algorithm gave an error rate of 4.1% relative to all words and 18.8% relative to unstressed words.
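The frequency-based homograph disambiguation described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Reading` type, the tag names, the placeholder word forms and the toy counts are hypothetical, and ranking by tag frequency first with lexeme frequency as a tie-breaker is only one simple way to reflect the finding that morphological-feature frequencies matter more. The dissertation's actual scoring scheme is not given in the abstract.

```python
from collections import Counter
from typing import NamedTuple


class Reading(NamedTuple):
    """One candidate reading of a homograph (all values are placeholders)."""
    stressed_form: str  # the word with its stress mark
    lexeme: str         # dictionary form the reading belongs to
    morph_tag: str      # morphological features of the reading


def pick_reading(candidates, tag_freq, lexeme_freq):
    """Pick the reading with the most frequent morphological tag.

    Lexeme frequency is used only to break ties between equally
    frequent tags. Unseen tags or lexemes count as frequency 0.
    """
    return max(
        candidates,
        key=lambda r: (tag_freq[r.morph_tag], lexeme_freq[r.lexeme]),
    )


# Hypothetical corpus counts; a real system would estimate them
# from an annotated corpus.
tag_freq = Counter({"noun.nom.sg": 1200, "verb.pres.3": 900})
lexeme_freq = Counter({"LEX_A": 15, "LEX_B": 40})

candidates = [
    Reading("form-1", "LEX_A", "noun.nom.sg"),
    Reading("form-2", "LEX_B", "verb.pres.3"),
]
best = pick_reading(candidates, tag_freq, lexeme_freq)
print(best.stressed_form)
```

Because the key is a tuple, the lexeme count is consulted only when two readings share the same tag frequency, which keeps the two sources of evidence in a fixed priority order.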
93

Advanced natural language processing for improved prosody in text-to-speech synthesis / G. I. Schlünz

Schlünz, Georg Isaac January 2014
Text-to-speech synthesis enables the speech-impeded user of an augmentative and alternative communication system to partake in any conversation on any topic, because it can produce dynamic content. Current synthetic voices do not sound very natural, however, lacking in the areas of emphasis and emotion. These qualities are furthermore important to convey meaning and intent beyond that which can be achieved by the vocabulary of words only. Put differently, speech synthesis requires a more comprehensive analysis of its text input beyond the word level to infer the meaning and intent that elicit emphasis and emotion. The synthesised speech then needs to imitate the effects that these textual factors have on the acoustics of human speech. This research addresses these challenges by commencing with a literature study on the state of the art in the fields of natural language processing, text-to-speech synthesis and speech prosody. It is noted that the higher linguistic levels of discourse, information structure and affect are necessary for the text analysis to shape the prosody appropriately for more natural synthesised speech. Discourse and information structure account for meaning, intent and emphasis, and affect formalises the modelling of emotion. The OCC model is shown to be a suitable point of departure for a new model of affect that can leverage the higher linguistic levels. The audiobook is presented as a text and speech resource for the modelling of discourse, information structure and affect because its narrative structure is prosodically richer than the random constitution of a traditional text-to-speech corpus. A set of audiobooks is selected and phonetically aligned for subsequent investigation. The new model of discourse, information structure and affect, called e-motif, is developed to take advantage of the audiobook text. It is a subjective model that does not specify any particular belief system in order to appraise its emotions, but defines only anonymous affect states. Its cognitive and social features rely heavily on the coreference resolution of the text, but this process is found not to be accurate enough to produce usable feature values. The research concludes with an experimental investigation of the influence of the e-motif features on human speech and synthesised speech. The aligned audiobook speech is inspected for prosodic correlates of the cognitive and social features, revealing that some activity occurs in the intonational domain. However, when the aligned audiobook speech is used in the training of a synthetic voice, the e-motif effects are overshadowed by those of structural features that come standard in the voice building framework. / PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014
94

Unsupervised learning for text-to-speech synthesis

Watts, Oliver Samuel January 2013
This thesis introduces a general method for incorporating the distributional analysis of textual and linguistic objects into text-to-speech (TTS) conversion systems. Conventional TTS conversion uses intermediate layers of representation to bridge the gap between text and speech. Collecting the annotated data needed to produce these intermediate layers is a far from trivial task, possibly prohibitively so for languages in which no such resources are in existence. Distributional analysis, in contrast, proceeds in an unsupervised manner, and so enables the creation of systems using textual data that are not annotated. The method therefore aids the building of systems for languages in which conventional linguistic resources are scarce, but is not restricted to these languages. The distributional analysis proposed here places the textual objects analysed in a continuous-valued space, rather than specifying a hard categorisation of those objects. This space is then partitioned during the training of acoustic models for synthesis, so that the models generalise over objects' surface forms in a way that is acoustically relevant. The method is applied to three levels of textual analysis: to the characterisation of sub-syllabic units, word units and utterances. Entire systems for three languages (English, Finnish and Romanian) are built with no reliance on manually labelled data or language-specific expertise. Results of a subjective evaluation are presented.
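The idea of placing textual objects in a continuous-valued space without annotated data can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which distributional method the thesis uses, so the neighbour-count vectors, the `window` parameter and the cosine comparison below are hypothetical choices, and the toy string stands in for a real text corpus.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(text, window=1):
    """Map each character to counts of the characters seen near it.

    Each character becomes a point in a space whose axes are the
    neighbouring characters, with no labels or categories required.
    """
    vectors = defaultdict(Counter)
    for i, ch in enumerate(text):
        lo = max(0, i - window)
        hi = min(len(text), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[ch][text[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy input; any unannotated text would do.
vecs = context_vectors("banana bandana")
print(round(cosine(vecs["a"], vecs["n"]), 3))
```

Characters that occur in similar contexts end up with similar vectors, so a later stage can group them by distance rather than by a predefined category.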
95

Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

Gritzman, Ashley Daniel January 2016
A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016 / Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR, and focuses on three contributions to colour-based lip segmentation. The first contribution concerns the colour transform to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date by measuring the overlap between lip and skin histograms for 33 different colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves percentage overlap (OL) of 92.23% and segmentation error (SE) of 7.39%. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The first stage of ATO incorporates -SVR to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp or relative improvement of 15.1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications. / MT2017
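The comparison of lip and skin histograms under a colour transform can be sketched as below. This is a minimal illustration, not the thesis's procedure: the abstract does not define how overlap is computed, so histogram intersection is assumed here, and the bin count, the function names and the sample pixel values are all hypothetical.

```python
import colorsys


def hue_histogram(pixels, bins=32):
    """Normalised histogram of HSV hue for (r, g, b) tuples in 0..255."""
    counts = [0] * bins
    total = 0
    for r, g, b in pixels:
        hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # hue lies in [0, 1]; clamp so hue == 1.0 falls in the last bin
        counts[min(int(hue * bins), bins - 1)] += 1
        total += 1
    return [c / total for c in counts] if total else counts


def histogram_overlap(p, q):
    """Histogram intersection: 0.0 for disjoint, 1.0 for identical histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical sample pixels standing in for labelled lip and skin regions.
lip_pixels = [(200, 60, 80), (190, 50, 70), (180, 70, 90)]
skin_pixels = [(220, 170, 140), (210, 160, 130), (200, 150, 120)]

overlap = histogram_overlap(hue_histogram(lip_pixels), hue_histogram(skin_pixels))
print(f"hue overlap: {overlap:.2%}")
```

A lower overlap means the transform separates the two classes better, which is the sense in which the abstract ranks the 33 colour transforms.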
96

Phase estimation with application to speech analysis-synthesis

Quatieri, T. F. (Thomas F.) January 1980
Thesis (Sc.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1980. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Vita. / Includes bibliographical references. / by Thomas F. Quatieri, Jr. / Sc.D.
97

Prosody analysis and modeling for Cantonese text-to-speech.

January 2003
Li Yu Jia. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references. / Abstracts in English and Chinese.
Chapter 1. Introduction / 1.1 TTS Technology / 1.2 Prosody / 1.2.1 What is Prosody / 1.2.2 Prosody from Different Perspectives / 1.2.3 Acoustical Parameters of Prosody / 1.2.4 Prosody in TTS / 1.2.4.1 Analysis / 1.2.4.2 Modeling / 1.2.4.3 Evaluation / 1.3 Thesis Objectives / 1.4 Thesis Outline / Reference
Chapter 2. Cantonese / 2.1 The Cantonese Dialect / 2.1.1 Phonology / 2.1.1.1 Initial / 2.1.1.2 Final / 2.1.1.3 Tone / 2.1.2 Phonological Constraints / 2.2 Tones in Cantonese / 2.2.1 Tone System / 2.2.2 Linguistic Significance / 2.2.3 Acoustical Realization / 2.3 Prosodic Variation in Continuous Cantonese Speech / 2.4 Cantonese Speech Corpus - CUProsody / Reference
Chapter 3. F0 Normalization / 3.1 F0 in Speech Production / 3.2 F0 Extraction / 3.3 Duration-normalized Tone Contour / 3.4 F0 Normalization / 3.4.1 Necessity and Motivation / 3.4.2 F0 Normalization / 3.4.2.1 Methodology / 3.4.2.2 Assumptions / 3.4.2.3 Estimation of Relative Tone Ratios / 3.4.2.4 Derivation of Phrase Curve / 3.4.2.5 Normalization of Absolute F0 Values / 3.4.3 Experiments and Discussion / 3.5 Conclusions / Reference
Chapter 4. Acoustical F0 Analysis / 4.1 Methodology of F0 Analysis / 4.1.1 Analysis-by-Synthesis / 4.1.2 Acoustical Analysis / 4.2 Acoustical F0 Analysis for Cantonese / 4.2.1 Analysis of Phrase Curves / 4.2.2 Analysis of Tone Contours / 4.2.2.1 Context-independent Single-tone Contours / 4.2.2.2 Contextual Variation / 4.2.2.3 Co-articulated Tone Contours of Disyllabic Word / 4.2.2.4 Cross-word Contours / 4.2.2.5 Phrase-initial Tone Contours / 4.3 Summary / Reference
Chapter 5. Prosody Modeling for Cantonese Text-to-Speech / 5.1 Parametric Model and Non-parametric Model / 5.2 Cantonese Text-to-Speech: Baseline System / 5.2.1 Sub-syllable Unit / 5.2.2 Text Analysis Module / 5.2.3 Acoustical Synthesis / 5.2.4 Prosody Module / 5.3 Enhanced Prosody Model / 5.3.1 Modeling Tone Contours / 5.3.1.1 Word-level F0 Contours / 5.3.1.2 Phrase-initial Tone Contours / 5.3.1.3 Tone Contours at Word Boundary / 5.3.2 Modeling Phrase Curves / 5.3.3 Generation of Continuous F0 Contours / 5.4 Summary / Reference
Chapter 6. Performance Evaluation / 6.1 Introduction to Perceptual Test / 6.1.1 Aspects of Evaluation / 6.1.2 Methods of Judgment Test / 6.1.3 Problems in Perceptual Test / 6.2 Perceptual Tests for Cantonese TTS / 6.2.1 Intelligibility Tests / 6.2.1.1 Method / 6.2.1.2 Results / 6.2.1.3 Analysis / 6.2.2 Naturalness Tests / 6.2.2.1 Word-level / 6.2.2.1.1 Method / 6.2.2.1.2 Results / 6.2.2.1.3 Analysis / 6.2.2.2 Sentence-level / 6.2.2.2.1 Method / 6.2.2.2.2 Results / 6.2.2.2.3 Analysis / 6.3 Conclusions / 6.4 Summary / Reference
Chapter 7. Conclusions and Future Work / 7.1 Conclusions / 7.2 Suggested Future Work
Appendix / Appendix 1: Linear Regression / Appendix 2: 36 Templates of Cross-word Contours / Appendix 3: Word List for Word-level Tests / Appendix 4: Syllable Occurrence in Word List of Intelligibility Test / Appendix 5: Wrongly Identified Word List / Appendix 6: Confusion Matrix / Appendix 7: Unintelligible Word List / Appendix 8: Noisy Word List / Appendix 9: Sentence List for Naturalness Test
98

Phase estimation with application to speech analysis-synthesis

January 1979
Thomas F. Quatieri, Jr. / Originally published as thesis (Dept. of Electrical Engineering and Computer Science, Sc.D., 1979). / Bibliography: p. 133-135. / Supported in part by the Advanced Research Projects Agency (monitored by ONR) under Contract N00014-75-C-0951 NR 409-328
99

Talking Heads - Models and Applications for Multimodal Speech Synthesis

Beskow, Jonas January 2003
This thesis presents work in the area of computer-animated talking heads. A system for multimodal speech synthesis has been developed, capable of generating audiovisual speech animations from arbitrary text, using parametrically controlled 3D models of the face and head. A speech-specific direct parameterisation of the movement of the visible articulators (lips, tongue and jaw) is suggested, along with a flexible scheme for parameterising facial surface deformations based on well-defined articulatory targets. To improve the realism and validity of facial and intra-oral speech movements, measurements from real speakers have been incorporated from several types of static and dynamic data sources. These include ultrasound measurements of tongue surface shape, dynamic optical motion tracking of face points in 3D, as well as electromagnetic articulography (EMA) providing dynamic tongue movement data in 2D. Ultrasound data are used to estimate target configurations for a complex tongue model for a number of sustained articulations. Simultaneous optical and electromagnetic measurements are performed and the data are used to resynthesise facial and intra-oral articulation in the model. A robust resynthesis procedure, capable of animating facial geometries that differ in shape from the measured subject, is described. To drive articulation from symbolic (phonetic) input, for example in the context of a text-to-speech system, both rule-based and data-driven articulatory control models have been developed. The rule-based model effectively handles forward and backward coarticulation by target under-specification, while the data-driven model uses ANNs to estimate articulatory parameter trajectories, trained on trajectories resynthesised from optical measurements. The articulatory control models are evaluated and compared against other data-driven models trained on the same data. Experiments with ANNs for driving the articulation of a talking head directly from acoustic speech input are also reported. A flexible strategy for generation of non-verbal facial gestures is presented. It is based on a gesture library organised by communicative function, where each function has multiple alternative realisations. The gestures can be used to signal e.g. turn-taking, back-channelling and prominence when the talking head is employed as output channel in a spoken dialogue system. A device-independent XML-based formalism for non-verbal and verbal output in multimodal dialogue systems is proposed, and it is described how the output specification is interpreted in the context of a talking head and converted into facial animation using the gesture library. Through a series of audiovisual perceptual experiments with noise-degraded audio, it is demonstrated that the animated talking head provides significantly increased intelligibility over the audio-only case, in some cases not significantly below that provided by a natural face. Finally, several projects and applications are presented, where the described talking head technology has been successfully employed. Four different multimodal spoken dialogue systems are outlined, and the role of the talking heads in each of the systems is discussed. A telecommunication application where the talking head functions as an aid for hearing-impaired users is also described, as well as a speech training application where talking heads and language technology are used with the purpose of improving speech production in profoundly deaf children. / QC 20100506
100

Tongue Talking : Studies in Intraoral Speech Synthesis

Engwall, Olov January 2002
QC 20100531
