101

Läsplatta som hjälpmedel för dyslektiker / E-reader as an assistive tool for people with dyslexia

Lange, Josefin, Kodzaga, Amna January 2012
Denna uppsats tar upp frågan om en teknisk artefakt, i detta fall läs- och/eller surfplatta, kan ses som en bra lösning för dyslektiker i deras svårigheter med läsningen. Studien har fokus på läs- och surfplattans funktioner som kan vara en möjlighet att hjälpa personer med läsning. Uppsatsens experiment beprövar en redan etablerad metod, läsa och lyssna samtidigt, genom att sätta samman ett antal funktioner och tekniker som finns idag i en enhet. Vi har undersökt om man kan öka läshastigheten och läsförståelsen hos dyslektiker med hjälp av sådan teknik på en läs/surfplatta. Resultatet visar en trend på att metoden fungerar. Trenden är dock bara svag enligt analysen av observationsdata då undersökningen endast utförts på tio personer. Det innebär att resultatet endast visar att undersökningen är rätt utformad men att den bör utföras i en större utsträckning för att den skall kunna representera populationen dyslektiker. / This paper investigates whether a technological artifact, in this case an e-reader and/or tablet computer, can be seen as a good aid for people with dyslexia in their reading difficulties. The study focuses on the e-reader's and tablet's functions that may help people with reading. The experiment tests an already established method, reading and listening simultaneously, by combining a number of features and technologies that exist today into one device. We have tested experimentally whether such technology on an e-reader or tablet can increase reading speed and reading comprehension in people with dyslexia. The results indicate a trend that the method works. However, the trend is only weak according to the analysis of the observation data, since only ten subjects could be tested. This means that the results show that the study is properly designed, but that it should be conducted on a larger scale in order to represent the population of people with dyslexia.
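As a rough illustration of how a within-subject reading-speed comparison of this kind might be analysed, the sketch below runs a paired t-test on made-up words-per-minute scores for ten participants; the numbers and the choice of test are assumptions for illustration, not the study's actual data or analysis.

```python
from scipy import stats

# Hypothetical words-per-minute scores for ten participants,
# reading silently vs. reading while listening on the tablet.
silent    = [92, 105, 88, 110, 97, 101, 84, 95, 108, 90]
listening = [98, 109, 91, 112, 99, 107, 88, 96, 111, 94]

t, p = stats.ttest_rel(listening, silent)
print(f"t = {t:.2f}, p = {p:.3f}")  # with n = 10, only a strong effect reaches significance
```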
102

A Hidden Markov Model-Based Approach for Emotional Speech Synthesis

Yang, Chih-Yung 30 August 2010
In this thesis, we describe two approaches to automatically synthesizing the emotional speech of a target speaker based on hidden Markov models of his/her neutral speech. In the interpolation-based method, the basic idea is model interpolation between the neutral model of the target speaker and an emotional model selected from a candidate pool. Both the interpolation model selection and the interpolation weight computation are based on a model-distance measure, for which we propose a monophone-based Mahalanobis distance (MBMD). In the parallel model combination (PMC) based method, the basic idea is to model the mismatch between the neutral model and the emotional model. We train a linear regression model to describe this mismatch and then combine the target speaker's neutral model with the linear regression model. We evaluate our approach on synthesized emotional speech for anger, happiness, and sadness with several subjective tests. Experimental results show that the implemented system is able to synthesize speech with the emotional expressiveness of the target speaker.
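As an illustration of the model-distance idea described above, the following sketch computes a Mahalanobis distance between Gaussian mean vectors and turns per-monophone distances into normalised interpolation weights. It is not the thesis's implementation: the shared diagonal covariance, the inverse-distance weighting and the toy dimensions are illustrative assumptions.

```python
import numpy as np

def mahalanobis(mu_a, mu_b, var):
    """Mahalanobis distance between two Gaussian mean vectors,
    using a shared diagonal covariance (an illustrative simplification)."""
    diff = mu_a - mu_b
    return float(np.sqrt(np.sum(diff * diff / var)))

def interpolation_weights(neutral_means, candidate_means, var):
    """Weight each candidate emotional model inversely to its
    average per-monophone distance from the target's neutral model."""
    dists = np.array([
        np.mean([mahalanobis(n, c, var)
                 for n, c in zip(neutral_means, cand)])
        for cand in candidate_means
    ])
    w = 1.0 / (dists + 1e-8)   # closer candidate models receive larger weights
    return w / w.sum()         # normalise so the weights sum to 1

# Toy example: 3 monophones, 2-dimensional means, 2 candidate speakers.
rng = np.random.default_rng(0)
neutral = [rng.normal(size=2) for _ in range(3)]
candidates = [[rng.normal(size=2) for _ in range(3)] for _ in range(2)]
print(interpolation_weights(neutral, candidates, var=np.ones(2)))
```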
103

Using Latin Square Design To Evaluate Model Interpolation And Adaptation Based Emotional Speech Synthesis

Hsu, Chih-Yu 19 July 2012
In this thesis, we use hidden Markov model-based synthesis, which can produce speech of acceptable quality from a small corpus, to implement a speech synthesis system for Chinese. Moreover, emotional speech is synthesized by exploiting the flexibility of the parametric speech representation in this model. We apply model interpolation and model adaptation to synthesize speech ranging from neutral to a particular emotion without any emotional speech from the target speaker. In model interpolation, we use a monophone-based Mahalanobis distance to select, from a pool of speakers, emotional models that are close to the target speaker, and we estimate the interpolation weights to synthesize emotional speech. In model adaptation, we collect abundant data to train an average-voice model for each individual emotion; these models are then adapted to the target speaker's emotion-specific models with the CMLLR method. In addition, we design a Latin-square evaluation to reduce systematic offsets in the subjective tests, making the results more credible and fair. We synthesize emotional speech for happiness, anger, and sadness, and use the Latin-square design to evaluate performance in three respects: similarity, naturalness, and emotional expression. Based on the results, we present a comprehensive comparison of the two methods for emotional speech synthesis and draw conclusions.
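A Latin-square assignment of listeners to test conditions can be generated with a simple cyclic construction, sketched below; the condition labels are hypothetical and the sketch makes no claim about the exact design used in the thesis.

```python
def latin_square(conditions):
    """Cyclic Latin square: each condition appears exactly once
    in every row (listener) and every column (presentation slot)."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]

# Hypothetical conditions: synthesis method x emotion.
conditions = ["interp-happy", "interp-angry", "interp-sad",
              "adapt-happy", "adapt-angry", "adapt-sad"]
for listener, row in enumerate(latin_square(conditions), start=1):
    print(f"listener {listener}: {row}")
```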
104

Lietuvių kalbos priebalsių spektro analizė / Lithuanian language consonant spectrum analysis

Šimkus, Ramūnas, Stumbras, Tomas 03 September 2010
20 amžiaus antrojoje pusėje ypač suaktyvėjo tyrimai kalbančiojo atpažinimo ir kalbos sintezavimo srityje. Jau nuo penktojo dešimtmečio vykdomi tyrimai siekiant sukurti sistemas galinčias atpažinti šnekamąją kalbą. Ypač svarbu šioje srityje yra kokybiškai atskirti kalbos signalus. Aštuntajame dešimtmetyje buvo sukurta eilė požymių išskyrimo metodų. Svarbesni iš jų yra melų skalės kepstras, suvokimu paremta tiesinė prognozė (perceptual linear prediction), delta kepstras ir kiti.[3] Naudojant šiuolaikinę kompiuterinę įrangą, signalų atskyrimo uždavinys gerokai supaprastėja, tačiau vis tiek išlieka labai sudėtingas. Kalbos sintezatorius yra kompiuterinė sistema, kuri gali atpažinti žmogaus balsą bet kokiame tekste. Sistema gali automatiškai sugeneruoti žmogaus balsą. Viena iš perspektyviausių balso technologijų panaudojimo sričių – įvairūs neįgaliems žmonėms skirti taikymai (akliems ir silpnaregiams, nevaikščiojantiems arba turintiems ribotas judėjimo galimybes). Balso technologijų panaudojimas dažnai yra esminis arba net vienintelis tokių žmonių integravimo į visuomenę būdas. Dar yra daugybė tokių sistemų panaudojimo sričių: • telefoninių ryšių centrai, automatiškai aptarnaujantys telefoninius pokalbius, atpažįstantys ir suprantantys, ką skambinantis sako; • automatinės transporto tvarkaraščių užklausimo sistemos; • automobilio mazgų valdymo žmogaus balsu priemonės; • nenutrūkstamos kalbos atpažinimo sistemos darbui teksto redaktoriais; Kalbos signalams analizuoti bei atskirti... [toliau žr. visą tekstą] / In the second half of the 20th century, speech recognition and synthesis became an important area of research. Over the last 50 years a great deal of work has been done on speech recognition, and systems for recognizing and synthesizing the major European languages, such as English, French and the Germanic languages, are now available. One of the most important benefits of this work is for disabled people: it can make their lives more comfortable, support their integration into society, and give them new interfaces for using personal computers. Lithuanian needs its own research because of the uniqueness of the language. The object of this research is the spectrum of Lithuanian consonants. The main method is linear prediction, which is used for finding formants. The principal methods for speech-signal analysis are linear prediction, the Fourier transform and cepstral analysis; several algorithms exist for linear prediction, and we used the Burg algorithm to find the formants. In this work, recordings of words were annotated and analysed with the PRAAT software, which was also used to obtain the formant movements, and the resulting data were processed with MATLAB 6.5. All consonants were divided into groups: voiced and unvoiced, semivowels, plosives and fricatives. The influence of the vowel following each consonant was also analysed. The data obtained are useful for improving the quality of speech recognition and synthesis. The paper includes: 1. An analysis of speech generation. 2. Spectrum analysis methods. 3. Experiment methodology... [to full text]
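The formant-finding step (Burg-method linear prediction followed by root solving) can be sketched in Python as below; the original analysis used PRAAT and MATLAB, and the file name, sampling rate and LPC order here are illustrative assumptions.

```python
import numpy as np
import librosa

# Load a consonant segment (hypothetical file) and compute LPC
# coefficients; librosa.lpc uses Burg's method.
y, sr = librosa.load("consonant.wav", sr=16000)
order = 2 + sr // 1000                 # common rule of thumb for LPC order
a = librosa.lpc(y, order=order)

# Formant candidates are the angles of the complex roots of the LPC polynomial.
roots = np.roots(a)
roots = roots[np.imag(roots) > 0]      # keep one root of each conjugate pair
freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
print("formant candidates (Hz):", np.round(freqs[:4]))
```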
105

Balso atpažinimo programų lietuvinimo galimybių tyrimas / Speech recognition program's Lithuanization possibility survey

Bivainis, Robertas 30 September 2013
Šiame darbe yra analizuojama ir tiriama kaip veikia balso atpažinimo sistema HTK, kokie žingsniai turi būti atlikti norint sėkmingai atpažinti lietuviškai ištartus žodžius. Taip pat apžvelgiamos kokių kalbos technologijų samprata reikalinga norint sukurti balso atpažinimo programą. Balso atpažinime labai svarbu yra kalbos signalų atpažinimo modeliai ir paslėptosios Markovo grandinės, todėl analizėje yra apžvelgiama jų veikimo principai ir algoritmai. / This thesis examines how the HTK speech recognition system operates and what steps have to be taken in order to recognize spoken Lithuanian words. It also reviews the speech-technology concepts needed to create a speech recognition program. Because speech-signal recognition models and hidden Markov models are central to speech recognition, the analysis covers their operating principles and algorithms.
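The decoding algorithm at the heart of HMM-based recognition, the Viterbi algorithm, can be sketched as follows; this is a generic textbook implementation with toy model parameters, not HTK code.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log domain)."""
    n_states = len(start_p)
    T = len(obs)
    logd = np.full((T, n_states), -np.inf)   # best log-probability so far
    back = np.zeros((T, n_states), dtype=int)
    logd[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            scores = logd[t - 1] + np.log(trans_p[:, s])
            back[t, s] = np.argmax(scores)
            logd[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    # Backtrace from the best final state.
    path = [int(np.argmax(logd[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state, 3-symbol model.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2], start, trans, emit))
```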
106

Advanced natural language processing for improved prosody in text-to-speech synthesis / G. I. Schlünz

Schlünz, Georg Isaac January 2014
Text-to-speech synthesis enables the speech-impeded user of an augmentative and alternative communication system to partake in any conversation on any topic, because it can produce dynamic content. Current synthetic voices do not sound very natural, however, lacking in the areas of emphasis and emotion. These qualities are furthermore important to convey meaning and intent beyond that which can be achieved by the vocabulary of words only. Put differently, speech synthesis requires a more comprehensive analysis of its text input beyond the word level to infer the meaning and intent that elicit emphasis and emotion. The synthesised speech then needs to imitate the effects that these textual factors have on the acoustics of human speech. This research addresses these challenges by commencing with a literature study on the state of the art in the fields of natural language processing, text-to-speech synthesis and speech prosody. It is noted that the higher linguistic levels of discourse, information structure and affect are necessary for the text analysis to shape the prosody appropriately for more natural synthesised speech. Discourse and information structure account for meaning, intent and emphasis, and affect formalises the modelling of emotion. The OCC model is shown to be a suitable point of departure for a new model of affect that can leverage the higher linguistic levels. The audiobook is presented as a text and speech resource for the modelling of discourse, information structure and affect because its narrative structure is prosodically richer than the random constitution of a traditional text-to-speech corpus. A set of audiobooks is selected and phonetically aligned for subsequent investigation. The new model of discourse, information structure and affect, called e-motif, is developed to take advantage of the audiobook text. It is a subjective model that does not specify any particular belief system in order to appraise its emotions, but defines only anonymous affect states. Its cognitive and social features rely heavily on the coreference resolution of the text, but this process is found not to be accurate enough to produce usable feature values. The research concludes with an experimental investigation of the influence of the e-motif features on human speech and synthesised speech. The aligned audiobook speech is inspected for prosodic correlates of the cognitive and social features, revealing that some activity occurs in the intonational domain. However, when the aligned audiobook speech is used in the training of a synthetic voice, the e-motif effects are overshadowed by those of the structural features that come standard in the voice-building framework. / PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014
107

The effects of part-of-speech tagging on text-to-speech synthesis for resource-scarce languages / G.I. Schlünz

Schlünz, Georg Isaac January 2010
In the world of human language technology, resource-scarce languages (RSLs) suffer from the problem of little available electronic data and linguistic expertise. The Lwazi project in South Africa is a large-scale endeavour to collect and apply such resources for all eleven of the official South African languages. One of the deliverables of the project is more natural text-to-speech (TTS) voices. Naturalness is primarily determined by prosody, and it is shown that many aspects of prosodic modelling are, in turn, dependent on part-of-speech (POS) information. Solving the POS problem is, therefore, a prudent first step towards meeting the goal of natural TTS voices. In a resource-scarce environment, obtaining and applying the POS information are not trivial. Firstly, an automatic tagger is required to tag the text to be synthesised with POS categories, but state-of-the-art POS taggers are data-driven and thus require large amounts of labelled training data. Secondly, the subsequent processes in TTS that are used to apply the POS information towards prosodic modelling are resource-intensive themselves: some require non-trivial linguistic knowledge; others require labelled data as well. The first problem asks the question of which available POS tagging algorithm will be the most accurate on little training data. This research sets out to answer the question by reviewing the most popular supervised data-driven algorithms. Since the literature to date consists mostly of isolated papers discussing one algorithm at a time, the aim of the review is to consolidate the research into a single point of reference. A subsequent experimental investigation compares the tagging algorithms on small training data sets of English and Afrikaans, and it is shown that the hidden Markov model (HMM) tagger outperforms the rest when using both a comprehensive and a reduced POS tagset. Regarding the second problem, the question arises whether it is perhaps possible to circumvent the traditional approaches to prosodic modelling by learning the latter directly from the speech data using POS information. In other words, does the addition of POS features to the HTS context labels improve the naturalness of a TTS voice? Towards answering this question, HTS voices are trained from English and Afrikaans prosodically rich speech. The voices are compared with and without POS features incorporated into the HTS context labels, both analytically and perceptually. For the analytical experiments, measures of prosody to quantify the comparisons are explored. It is then also noted whether the results of the perceptual experiments correlate with their analytical counterparts. It is found that, when a minimal feature set is used for the HTS context labels, the addition of POS tags does improve the naturalness of the voice. However, the same effect can be accomplished by including segmental counting and positional information instead of the POS tags. / Thesis (M.Sc. Engineering Sciences (Electrical and Electronic Engineering))--North-West University, Potchefstroom Campus, 2011.
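The comparison of data-driven taggers on small training sets can be illustrated with NLTK's HMM tagger; the corpus, the split sizes and the exact calls below are assumptions standing in for the thesis's English and Afrikaans data and tooling.

```python
import nltk
from nltk.tag import hmm

nltk.download("treebank", quiet=True)
tagged = list(nltk.corpus.treebank.tagged_sents())

# Simulate a resource-scarce setting with deliberately small training sets.
test_sents = tagged[:200]
for size in (100, 500, 1000):
    train_sents = tagged[200:200 + size]
    tagger = hmm.HiddenMarkovModelTrainer().train_supervised(train_sents)
    acc = tagger.accuracy(test_sents)
    print(f"{size:5d} training sentences -> accuracy {acc:.3f}")
```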
109

More than words: text-to-speech technology as a matter of self-efficacy, self-advocacy, and choice

Parr, Michelann. January 1900
Thesis (Ph.D.). / Written for the Dept. of Integrated Studies in Education. Title from title page of PDF (viewed 2009/03/30). Includes bibliographical references.
110

Understanding and improving the identification of concurrently presented earcons

McGookin, David Kerr. January 2004
Thesis (Ph.D.) - University of Glasgow, 2004. / Includes bibliographical references. Print version also available.
