  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Parametarska sinteza ekspresivnog govora / Parametric synthesis of expressive speech

Suzić Siniša 12 July 2019
U disertaciji su opisani postupci sinteze ekspresivnog govora korišćenjem parametarskih pristupa. Pokazano je da se korišćenjem dubokih neuronskih mreža dobijaju bolji rezultati nego korišćenjem skrivenih Markovljevih modela. Predložene su tri nove metode za sintezu ekspresivnog govora korišćenjem dubokih neuronskih mreža: metoda kodova stila, metoda dodatne obuke mreže i arhitektura zasnovana na deljenim skrivenim slojevima. Pokazano je da se najbolji rezultati dobijaju korišćenjem metode kodova stila. Takođe je predložena i nova metoda za transplantaciju emocija/stilova bazirana na deljenim skrivenim slojevima. Predložena metoda ocenjena je bolje od referentne metode iz literature. / This thesis presents methods for expressive speech synthesis using parametric approaches. It shows that deep neural networks achieve better results than synthesis based on hidden Markov models. Three new methods for synthesizing expressive speech with deep neural networks are presented: style codes, model re-training, and a shared hidden layer architecture. The style code method gives the best results. A new method for emotion/style transplantation based on the shared hidden layer architecture is also proposed, and it was rated better than a reference method from the literature.
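The abstract does not say how the style codes are built. One common reading, assumed here, is that a one-hot vector identifying the speaking style is appended to every frame of linguistic input features before they reach the network. The style inventory and function names below are hypothetical, and the sketch covers only the input augmentation, not the network itself.

```python
# Hypothetical style inventory; the thesis abstract does not list its styles.
STYLES = ["neutral", "happy", "sad"]


def one_hot_style(style, styles=STYLES):
    """Return a one-hot list marking the position of `style` in `styles`."""
    if style not in styles:
        raise ValueError(f"unknown style: {style!r}")
    return [1.0 if s == style else 0.0 for s in styles]


def add_style_code(frames, style, styles=STYLES):
    """Append the style code to every frame-level feature vector."""
    code = one_hot_style(style, styles)
    return [list(frame) + code for frame in frames]


# Two linguistic features per frame, extended with a 3-dimensional style code.
augmented = add_style_code([[0.1, 0.2], [0.3, 0.4]], "happy")
print(augmented[0])
```

Under this assumption, a single network can be trained on data from all styles, and the style is chosen at synthesis time by changing the code.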
12

Evaluation of how text-to-speech can be adapted for the specific purpose of being an AI psychologist

Rayat, Pooya, Westergård, Hugo January 2023
In this research, our goal was to pinpoint the crucial characteristics that make a voice suitable for an AI psychologist. More importantly, we wanted to explore how Text-To-Speech (TTS) combined with conditional voice control, also known as "prompting", could be used to incorporate these traits into the voice generation process. This approach allowed us to create synthetic voices that were not just effective, but also tailored to the specific needs of an AI psychologist role. We conducted an exploratory survey to identify key traits such as trustworthiness, safety, sympathy, calmness, and firmness. These traits were then used as prompts in the generation of AI voices using Tortoise, a state-of-the-art text-to-speech system. The generated voices were evaluated through a survey study, resulting in a mean opinion score for each category corresponding to the prompts. Our findings showed that while the AI-generated voices did not quite match the quality of a real human voice, they were still quite effective in capturing the essence of the prompts and producing the desired voice characteristics. This suggests that prompting within TTS, or the strategic design of prompts, can significantly enhance the effectiveness of AI voices. In addition, we explored the potential impact of AI on the labor market, considering factors such as job displacement and creation, changes in salaries, and the need for reskilling. Our study highlights that AI will have a significant impact on the job market, but the exact nature of this impact remains uncertain. Our findings offer valuable insights into the potential of AI in psychology and highlight the importance of tailoring voice synthesis to specific applications. They lay a solid foundation for future research in this area, fostering continued innovation at the intersection of AI, psychology, and economic viability. 
/ I den här forskningen var vårt mål att lokalisera de avgörande egenskaperna som gör en röst lämplig för en AI-psykolog. Vi ville även utforska hur ”Text-Till-Tal” (TTS) i kombination med villkorlig röststyrning, också kallat prompting, kan användas för att införliva dessa egenskaper i röstgenereringsprocessen. Detta tillvägagångssätt gjorde det möjligt för oss att skapa syntetiska röster som inte bara var effektiva, utan också skräddarsydda för de specifika behoven hos en roll som AI-psykolog. Vi genomförde en utforskande undersökning för att identifiera nyckelegenskaper som pålitlighet, säkerhet, sympati, lugn och fasthet. Dessa egenskaper användes sedan som uppmaningar i genereringen av AI-röster med hjälp av TorToise, ett modern TTS-system. De genererade rösterna utvärderades genom en enkätstudie, vilket resulterade i en genomsnittlig åsiktspoäng för olika kategorier som motsvarar uppmaningarna. Våra resultat visade att även om de AI-genererade rösterna inte riktigt matchade kvaliteten på en riktig mänsklig röst, var de fortfarande ganska effektiva för att fånga kärnan i uppmaningarna och producera de önskade röstegenskaperna. Detta tyder på att TTS kombinerat med prompting, eller den emotionella styrningen av TTS, avsevärt kan förbättra effektiviteten hos AI-röster. Dessutom undersökte vi den potentiella effekten av AI på arbetsmarknaden, med hänsyn till faktorer som förskjutning och skapande av jobb, förändringar i löner och behovet av ny kompetens. Vår studie visar att AI kommer att ha en betydande inverkan på arbetsmarknaden, men den exakta karaktären av denna påverkan är fortfarande osäker. Våra resultat ger värdefulla insikter om potentialen för AI inom psykologi och belyser vikten av att skräddarsy röstsyntes för specifika applikationer. De lägger en solid grund för framtida forskning inom detta område och främjar fortsatt innovation i skärningspunkten mellan AI, psykologi och ekonomisk bärkraft.
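The evaluation above reports a mean opinion score (MOS) per prompt category. As a minimal sketch, a MOS is the arithmetic mean of listener ratings within a category. The 1 to 5 rating scale and the example ratings below are assumptions made for illustration; they are not data from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average listener ratings per category.

    `ratings` is an iterable of (category, score) pairs, where score is
    assumed to lie on a 1-5 opinion scale.
    """
    by_category = defaultdict(list)
    for category, score in ratings:
        by_category[category].append(score)
    return {category: mean(scores) for category, scores in by_category.items()}


# Made-up ratings, for illustration only.
example = [("calmness", 4), ("calmness", 5), ("trustworthiness", 3)]
print(mean_opinion_scores(example))
```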
13

Method for creating phone duration models using very large, multi-speaker, automatically annotated speech corpus / Garsų trukmių modelių kūrimo metodas, naudojant didelės apimties daugelio kalbėtojų garsyną

Norkevičius, Giedrius 01 February 2011
This dissertation addresses two previously unexamined problems. 1. Building a model capable of predicting phone durations in Lithuanian. All existing investigations of Lithuanian phone durations were performed by linguists; they are usually exploratory statistics limited to analyzing a single factor affecting phone duration. In this work, the dependence of phone duration on contextual factors was estimated by a machine learning method and written in explicit form (a decision tree). 2. Constructing a language-independent method for creating phone duration models using a very large, multi-speaker, automatically annotated speech corpus. Most researchers worldwide use speech corpora that are relatively small, single-speaker, and manually annotated or at least validated by experts. The reasons usually given are that multi-speaker corpora are inappropriate because different speakers have different pronunciation manners and speech rates, and that automatically annotated corpora lack accuracy. The method created for phone duration modeling enables the use of such corpora. Its main components are the reduction of noisy data in the speech corpus and the normalization of speaker-specific phone durations using phone type clustering. Listening tests of synthesized speech showed that the perceived naturalness is affected by the underlying phone durations; the use of contextual... [to full text] / Disertacijoje nagrinėjamos dvi iki šiol netyrinėtos problemos: 1. Lietuvių kalbos garsų trukmių prognozavimo modelių kūrimas. Iki šiol visi darbai, kuriuose yra nagrinėjamos lietuvių kalbos garsų trukmės, yra atlikti kalbininkų, tačiau šie tyrimai yra daugiau aprašomosios statistikos pobūdžio ir apsiriboja pavienių požymių įtakos garso trukmei analize. 
Šiame darbe, mašininio mokymo algoritmo pagalba, požymių įtaka garsų trukmei yra išmokstama iš duomenų ir užrašoma sprendimo medžio pavidalu. 2. Nuo kalbos nepriklausomų garsų trukmių prognozavimo modelių kūrimo metodas, naudojant didelės apimties daugelio, kalbėtojų automatiškai, anotuotą garsyną. Dėl skirtingų kalbėtojų tarties specifikos ir dėl automatinio anotavimo netikslumų, kuriant garsų trukmės modelius visame pasaulyje yra apsiribojama vieno kalbėtojo ekspertų anotuotais nedidelės apimties garsynais. Darbe pasiūlyti skirtingų kalbėtojų tarties ypatybių normalizavimo ir garsyno duomenų triukšmo atmetimo algoritmai leidžia garsų trukmių modelių kūrimui naudoti didelės apimties, daugelio kalbėtojų automatiškai anotuotus garsynus. Darbo metu atliktas audicinis tyrimas, kurio pagalba parodoma, kad šnekos signalą sudarančių garsų trukmės turi įtakos klausytojų/respondentų suvokiamam šnekos signalo natūralumui; kontekstinės informacijos panaudojimas garsų trukmių prognozavimo uždavinio sprendime yra svarbus faktorius įtakojantis sintezuotos šnekos natūralumą; natūralaus šnekos signalo atžvilgiu, geriausiai vertinamas yra... [toliau žr. visą tekstą]
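The abstract names normalization of speaker-specific phone durations by phone type clustering but gives no formula. The sketch below shows one plausible form, assumed here: durations are converted to z-scores within each (speaker, phone class) group, so that fast and slow speakers become comparable. The phone-to-class mapping and the choice of z-scores are illustrative assumptions, not the dissertation's actual procedure.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical phone-to-class mapping; a real system would cluster phone types.
PHONE_CLASS = {"a": "vowel", "i": "vowel", "s": "fricative", "t": "stop"}


def normalize_durations(records):
    """Z-score durations within each (speaker, phone class) group.

    `records` is a list of (speaker, phone, duration_ms) tuples.
    Returns a list of (speaker, phone, z_score) tuples in the same order.
    """
    groups = defaultdict(list)
    for speaker, phone, duration in records:
        groups[(speaker, PHONE_CLASS[phone])].append(duration)

    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    normalized = []
    for speaker, phone, duration in records:
        mu, sigma = stats[(speaker, PHONE_CLASS[phone])]
        # A group with no spread carries no duration information; map it to 0.
        z = (duration - mu) / sigma if sigma > 0 else 0.0
        normalized.append((speaker, phone, z))
    return normalized


# Speaker B talks twice as fast as speaker A, yet both normalize alike.
data = [("A", "a", 100), ("A", "i", 120), ("B", "a", 50), ("B", "i", 60)]
for row in normalize_durations(data):
    print(row)
```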
14

A Research Bed For Unit Selection Based Text To Speech Synthesis System

Konakanchi, Parthasarathy 02 1900
After trying the Festival Speech Synthesis System, we decided to develop our own TTS framework, suited to the research experiments needed to develop good-quality TTS for Indian languages. Most attempts at Indian-language TTS have no prosody model, no provision for handling foreign-language words, and no phrase break prediction that would allow appropriate pauses to be introduced in the synthesized speech. Further, in the Indian context, there is a real need for a bilingual TTS involving English along with the Indian language. In fact, it may also be desirable to have a trilingual TTS that additionally handles the language of the neighboring state or Hindi. Thus, there is a need for a full-fledged TTS development framework that lends itself to experimentation involving all the above issues and more. This thesis is a serious attempt to develop such a modular, unit selection based TTS framework. The developed system has been tested for its effectiveness in creating intelligible speech in Tamil and Kannada. It has also been used to carry out two research experiments on TTS. The first part of the work is the design and development of a corpus-based concatenative Tamil speech synthesizer in Matlab and C. A synthesis database has been created from 1027 phonetically rich, pre-recorded sentences, segmented at the phone level. From the sentence to be synthesized, specifications of the required target units are predicted. During synthesis, database units are selected that best match the target specification according to a distance metric and a concatenation quality metric. To accelerate matching, the features of the end frames of the database units have been precomputed and stored. The selected units are concatenated to produce synthetic speech. 
The high mean opinion scores obtained for the TTS output show that speech synthesized using our TTS is intelligible and acceptably natural, and could possibly be put to commercial use with some additional features. Experiments carried out by others using this TTS framework have shown that, whenever the required phonetic context is not available in the synthesis database, similar phones that are perceptually indistinguishable may be substituted. The second part of the work deals with the design and modification of the developed TTS framework to be embedded in mobile phones. Commercial GSM FR, EFR and AMR speech codecs are used for compressing our synthesis database. Perception experiments reveal that speech synthesized using a highly compressed database is reasonably natural. This holds promise for reading SMSs and emails aloud on mobile phones in Indian languages in the future. Finally, we observe that incorporating prosody and pause models for Indian language TTS would further enhance the quality of the synthetic speech. These are some of the potential, unexplored areas ahead for research in speech synthesis in Indian languages.
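The selection step described in this abstract (units chosen by a distance metric to the target plus a concatenation quality metric) is commonly solved as a dynamic-programming search. The sketch below is an illustration under stated assumptions, not the thesis's Matlab/C implementation: each unit is reduced to a hypothetical (start_feature, end_feature) pair of numbers, and both cost functions are toy absolute differences.

```python
def target_cost(target, unit):
    """Toy distance metric: how far the unit's start feature is from the target."""
    return abs(target - unit[0])


def join_cost(prev_unit, unit):
    """Toy concatenation metric: mismatch between adjacent end and start frames."""
    return abs(prev_unit[1] - unit[0])


def select_units(targets, candidates):
    """Choose one candidate per target, minimizing total target + join cost.

    `candidates[i]` lists the database units available for `targets[i]`.
    Uses a Viterbi-style search over the candidate lattice.
    """
    # best[i][j] = (cheapest cumulative cost ending in candidate j, backpointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, unit), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], unit), back))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


targets = [1.0, 2.0]
candidates = [[(1.0, 5.0), (1.2, 2.0)], [(2.0, 3.0), (2.1, 3.0)]]
print(select_units(targets, candidates))
```

In this toy lattice the first candidate matches its target exactly but joins badly, so the search prefers the slightly worse match with the smooth join. Storing end-frame features with each unit, as the abstract mentions, is what keeps the join cost cheap to evaluate.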
