  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Enkelsybanddemodulasie met behulp van syferseinverwerking [Single-sideband demodulation using digital signal processing]

Kruger, Johannes Petrus 12 June 2014
M.Ing. (Electrical and Electronic Engineering) / The feasibility of modulating and demodulating speech signals within a microprocessor is investigated in this study. Existing modulation and demodulation techniques are reviewed, and new techniques suitable for microprocessor implementation are described. Finally, a single-sideband demodulator was built using the TMS32010 microprocessor, with results comparable to or better than those of existing analog techniques.
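The coherent single-sideband demodulation the thesis implements on the TMS32010 can be sketched in a few lines of NumPy. This is a toy illustration, not the author's implementation: the test tone, sample rate, and moving-average low-pass filter are all invented for the example.

```python
import numpy as np

fs = 8000.0                      # sample rate, Hz
fc = 2000.0                      # carrier frequency, Hz
t = np.arange(0, 0.1, 1.0 / fs)

# Baseband stand-in for speech: a 300 Hz tone and its Hilbert transform.
m = np.cos(2 * np.pi * 300 * t)
m_h = np.sin(2 * np.pi * 300 * t)

# Upper-sideband signal: m(t)cos(2*pi*fc*t) - m_hat(t)sin(2*pi*fc*t).
usb = m * np.cos(2 * np.pi * fc * t) - m_h * np.sin(2 * np.pi * fc * t)

# Coherent demodulation: mix back down to baseband, then low-pass filter.
mixed = usb * np.cos(2 * np.pi * fc * t)
kernel = np.ones(8) / 8.0        # crude moving-average low-pass FIR
recovered = 2.0 * np.convolve(mixed, kernel, mode="same")

# Away from the filter edges the tone should be recovered almost exactly.
steady = slice(200, 600)
corr = np.corrcoef(recovered[steady], m[steady])[0, 1]
print(corr > 0.9)
```

Mixing by the carrier shifts the sideband back to baseband plus an image near twice the carrier, which the low-pass filter removes; a DSP chip like the TMS32010 performs exactly this multiply-and-filter loop sample by sample.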
72

The development of an automatic pronunciation assistant

Sefara, Tshephisho Joseph January 2019
Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2019 / The pronunciation of words and phrases in any language involves careful manipulation of linguistic features. Factors such as age, motivation, accent, phonetics, stress and intonation sometimes cause inappropriate or incorrect pronunciation of words from non-native languages. Pronouncing words under different phonological rules tends to change their meaning. This study presents the development of an automatic pronunciation assistant system for under-resourced languages of Limpopo Province, namely Sepedi, Xitsonga, Tshivenda and isiNdebele. The aim of the proposed system is to help non-native speakers learn appropriate and correct pronunciation of words and phrases in these under-resourced languages. The system is composed of a language identification module on the front end and a speech synthesis module on the back end. A support vector machine was compared to a baseline multinomial naive Bayes classifier to build the language identification module. The language identification phase performs supervised multiclass text classification to predict a person's first language from input text, before the speech synthesis phase addresses pronunciation in the identified language. The back-end speech synthesis phase is composed of four baseline text-to-speech synthesis systems in the selected target languages, built using the hidden Markov model approach. Subjective listening tests using a mean opinion score test were conducted to evaluate the quality of the synthesised speech, with good results across all target languages for naturalness, pronunciation, pleasantness, understandability, intelligibility, overall system quality and user acceptance.
The developed system has been deployed on a live production web server for performance evaluation and stability testing using live data.
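The language-identification front end described above performs supervised multiclass text classification; the thesis's baseline is a multinomial naive Bayes classifier. A minimal from-scratch sketch over character trigrams is below. The training strings are placeholder English/French examples, not the Limpopo-language corpora used in the thesis.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-grams, a common feature for language identification."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class MultinomialNB:
    """Add-one-smoothed multinomial naive Bayes over n-gram counts."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, lab in zip(docs, labels):
            self.counts[lab].update(char_ngrams(doc))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        def log_posterior(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.priors[c])
            for g in char_ngrams(doc):
                score += math.log((self.counts[c][g] + 1) / total)
            return score
        return max(self.classes, key=log_posterior)

# Toy training data (placeholder strings, not the thesis corpora).
docs = ["the cat sat on the mat", "a dog in the fog",
        "le chat est sur le tapis", "un chien dans le brouillard"]
labels = ["english", "english", "french", "french"]

clf = MultinomialNB().fit(docs, labels)
print(clf.predict("the dog sat"))      # expected: english
print(clf.predict("le chien est la"))  # expected: french
```

The predicted label would then select which of the four back-end synthesisers handles the pronunciation request.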
73

A rule-based system to automatically segment and label continuous speech of known text

Boissonneault, Paul G. January 1984
No description available.
74

La synthèse par ordinateur du français montréalais [Computer synthesis of Montreal French]

Bernardi, Dave A. W. January 1985
No description available.
75

Formant-based synthesis of Chinese speech

Wang, Min January 1986
No description available.
76

Multi-platform implementation of speech APIs

Manukyan, Karen 21 August 2008
No description available.
77

Improving High Quality Concatenative Text-to-Speech Using the Circular Linear Prediction Model

Shukla, Sunil Ravindra 10 January 2007
Current high quality text-to-speech (TTS) systems are based on unit selection from a large database that is both contextually and prosodically rich. These systems, albeit capable of natural voice quality, are computationally expensive and require a very large footprint. Their success is attributed to the dramatic reduction of storage costs in recent times. However, for many TTS applications a smaller footprint is becoming a standard requirement. This thesis presents a new method for representing speech segments that can improve the quality and/or reduce the footprint of current concatenative TTS systems. The circular linear prediction (CLP) model is revisited and combined with the constant pitch transform (CPT) to provide a robust representation of speech signals that allows for limited prosodic movements without a perceivable loss in quality. The CLP model assumes that each frame of voiced speech is an infinitely periodic signal. This assumption allows for LPC modeling using the covariance method, with the efficiency of the autocorrelation method. The CPT is combined with this model to provide a database that is uniform in pitch for matching the target prosody during synthesis. With this representation, limited prosody modifications and unit concatenation can be performed without causing audible artifacts. For resolving artifacts caused by pitch modifications in voicing transitions, a method has been introduced for reducing peakiness in the LP spectra by constraining the line spectral frequencies. Two experiments were conducted to demonstrate the capabilities of the CLP/CPT method. The first is a listening test to determine the ability of this model to realize prosody modifications without perceivable degradation. Utterances are resynthesized using the CLP/CPT method with emphasized prosodics to increase intelligibility in harsh environments.
The second experiment compares the quality of utterances synthesized by unit-selection based limited-domain TTS against the CLP/CPT method. The results demonstrate that the CLP/CPT representation, applied to current concatenative TTS systems, can reduce the size of the database and increase the prosodic richness without noticeable degradation in voice quality.
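The conventional autocorrelation-method LPC analysis whose efficiency the CLP model matches can be sketched with the standard Levinson-Durbin recursion. This is a generic illustration on a synthetic AR(2) signal, not the circular model itself; the AR coefficients and signal length are arbitrary.

```python
import numpy as np

def lpc_autocorr(x, order):
    """LPC via the autocorrelation method (Levinson-Durbin recursion)."""
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction residual.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a

# Synthetic AR(2) signal: x[n] = 0.75 x[n-1] - 0.5 x[n-2] + e[n].
rng = np.random.default_rng(0)
e = rng.standard_normal(4096)
x = np.zeros(4096)
for n in range(2, 4096):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]

a = lpc_autocorr(x, order=2)
print(np.round(a, 2))  # estimates of the true polynomial [1, -0.75, 0.5]
```

The CLP model's infinite-periodicity assumption lets the covariance-method normal equations take the same Toeplitz form solved here, which is what makes this O(order²) recursion applicable.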
78

Visual speech synthesis by learning joint probabilistic models of audio and video

Deena, Salil Prashant January 2012
Visual speech synthesis deals with synthesising facial animation from an audio representation of speech. In the last decade or so, data-driven approaches have gained prominence with the development of Machine Learning techniques that can learn an audio-visual mapping. Many of these Machine Learning approaches learn a generative model of speech production using the framework of probabilistic graphical models, through which efficient inference algorithms can be developed for synthesis. In this work, the audio and visual parameters are assumed to be generated from an underlying latent space that captures the shared information between the two modalities. These latent points evolve through time according to a dynamical mapping and there are mappings from the latent points to the audio and visual spaces respectively. The mappings are modelled using Gaussian processes, which are non-parametric models that can represent a distribution over non-linear functions. The result is a non-linear state-space model. It turns out that the state-space model is not a very accurate generative model of speech production because it assumes a single dynamical model, whereas it is well known that speech involves multiple dynamics (e.g. different syllables) that are generally non-linear. In order to cater for this, the state-space model can be augmented with switching states to represent the multiple dynamics, thus giving a switching state-space model. A key problem is how to infer the switching states so as to model the multiple non-linear dynamics of speech, which we address by learning a variable-order Markov model on a discrete representation of audio speech. Various synthesis methods for predicting visual from audio speech are proposed for both the state-space and switching state-space models.
Quantitative evaluation, involving the use of error and correlation metrics between ground truth and synthetic features, is used to evaluate our proposed method in comparison to other probabilistic models previously applied to the problem. Furthermore, qualitative evaluation with human participants has been conducted to evaluate the realism, perceptual characteristics and intelligibility of the synthesised animations. The results are encouraging and demonstrate that by having a joint probabilistic model of audio and visual speech that caters for the non-linearities in audio-visual mapping, realistic visual speech can be synthesised from audio speech.
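For a single output dimension, the Gaussian-process mappings at the heart of this model reduce to standard GP regression. The minimal sketch below uses toy 1-D data standing in for the latent-to-visual mapping; the squared-exponential kernel, lengthscale, and noise level are arbitrary choices for illustration, not the thesis's settings.

```python
import numpy as np

def rbf(a, b, ell=1.0, var=1.0):
    """Squared-exponential (RBF) kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

# Toy 1-D "latent" inputs and a noisy "visual parameter" output.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 2 * np.pi, 30)
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(30)

noise = 0.05 ** 2
K = rbf(x_train, x_train) + noise * np.eye(30)

# GP posterior mean at a test point: k(x*, X) K^{-1} y.
x_test = np.array([np.pi / 2])
k_star = rbf(x_test, x_train)
mean = k_star @ np.linalg.solve(K, y_train)
print(float(mean[0]))  # close to sin(pi/2) = 1
```

In the thesis's full model such GP mappings connect the latent trajectory to both the audio and visual spaces, and the switching states choose among multiple dynamical regimes rather than the single smooth function fitted here.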
79

Evaluation of how text-to-speech can be adapted for the specific purpose of being an AI psychologist

Rayat, Pooya, Westergård, Hugo January 2023
In this research, our goal was to pinpoint the crucial characteristics that make a voice suitable for an AI psychologist. More importantly, we wanted to explore how Text-To-Speech (TTS) combined with conditional voice controlling, also known as “prompting”, could be used to incorporate these traits into the voice generation process. This approach allowed us to create synthetic voices that were not just effective, but also tailored to the specific needs of an AI psychologist role. We conducted an exploratory survey to identify key traits such as trustworthiness, safety, sympathy, calmness, and firmness. These traits were then used as prompts in the generation of AI voices using Tortoise, a state-of-the-art text-to-speech system. The generated voices were evaluated through a survey study, resulting in a mean opinion score for different categories corresponding to the prompts. Our findings showed that while the AI-generated voices did not quite match the quality of a real human voice, they were still quite effective in capturing the essence of the prompts and producing the desired voice characteristics. This suggests that prompting within TTS, or the strategic design of prompts, can significantly enhance the effectiveness of AI voices. In addition, we explored the potential impact of AI on the labor market, considering factors such as job displacement and creation, changes in salaries, and the need for reskilling. Our study highlights that AI will have a significant impact on the job market, but the exact nature of this impact remains uncertain. Our findings offer valuable insights into the potential of AI in psychology and highlight the importance of tailoring voice synthesis to specific applications. They lay a solid foundation for future research in this area, fostering continued innovation at the intersection of AI, psychology, and economic viability.
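A mean opinion score test like the one described aggregates 1-5 listener ratings per category into a single average per condition. A toy computation follows; the condition names and ratings below are invented for illustration, not the study's data.

```python
import statistics

# Hypothetical 1-5 listener ratings for one prompt category (e.g. calmness).
ratings = {
    "human_reference": [5, 5, 4, 5, 4, 5],
    "tts_prompted":    [4, 3, 4, 4, 5, 3],
}

for condition, scores in ratings.items():
    mos = statistics.mean(scores)
    sd = statistics.stdev(scores)
    print(f"{condition}: MOS = {mos:.2f} (sd {sd:.2f})")
```

Reporting the spread alongside the mean matters with small listener panels, since a gap between conditions smaller than the rating variability may not be meaningful.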
80

Quality differences in male and female vocoded speech.

Christopher, Deborah Kaye. January 1978
Thesis: M.S., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 1978 / Includes bibliographical references.
