About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Discussion On Effective Restoration Of Oral Speech Using Voice Conversion Techniques Based On Gaussian Mixture Modeling

Alverio, Gustavo, 01 January 2007
Today's world offers many ways to communicate information, and speech is one of the most effective. Unfortunately, many people lose the ability to speak, which can have a large negative psychological impact. Skills such as lecturing and singing must then be restored by other means. Text-to-speech (TTS) synthesis, which converts text into speech, has been a popular way to restore the capability for oral speech. Although TTS systems are useful, they offer only a few default voices, none of which represents the user's own voice. Achieving full restoration therefore requires voice conversion, a method that adjusts a source voice to sound like a target voice. Voice conversion consists of a training process and a conversion process. Training uses a speech corpus spoken by both the source and the target speaker; the corpus should cover a variety of speech sounds. Once training is finished, the learned conversion function transforms the source voice into the target voice. In effect, voice conversion allows a speaker to sound like any other person, so it can be applied to the output of a TTS system to produce the target voice. This thesis investigates how one approach, voice conversion based on Gaussian mixture modeling (GMM), can be applied to alter the voice output of a TTS system. Researchers found that acceptable results can be obtained with these methods. Although voice conversion and TTS synthesis are effective in restoring voice, the training process requires samples of the speaker's voice recorded before the voice loss. It is therefore vital to record voice samples in advance as a safeguard against voice loss.
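To make the training/conversion split concrete, here is a minimal sketch of one common GMM formulation, the joint-density GMM: a mixture is fit with EM to stacked source/target frames z_t = (x_t, y_t), and a source frame is converted with the conditional expectation E[y | x] = Σ_m p(m | x) (μ_y,m + Σ_yx,m Σ_xx,m⁻¹ (x − μ_x,m)). It assumes one scalar feature per frame and source/target frames that are already time-aligned (real systems use multidimensional spectral features and an alignment step such as dynamic time warping). The names `train_joint_gmm` and `convert` are hypothetical, and this is not claimed to be the exact method used in the thesis.

```python
import math
from typing import List, Tuple

# One joint-density component: (weight, mx, my, sxx, sxy, syy).
Component = Tuple[float, float, float, float, float, float]


def _log_gauss2(x: float, y: float, c: Component) -> float:
    """Log-density of component c's 2-D Gaussian at z = (x, y)."""
    _, mx, my, sxx, sxy, syy = c
    det = sxx * syy - sxy * sxy
    dx, dy = x - mx, y - my
    # Mahalanobis distance via the closed-form inverse of a 2x2 covariance.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * maha


def _log_gauss1(x: float, mean: float, var: float) -> float:
    """Log-density of a 1-D Gaussian (the source-side marginal of a component)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _normalize_log(logs: List[float]) -> List[float]:
    """Turn unnormalized log-weights into probabilities (log-sum-exp trick)."""
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def train_joint_gmm(xs, ys, n_comp=2, iters=30, reg=1e-4) -> List[Component]:
    """Fit a GMM to joint vectors z_t = (x_t, y_t) with plain EM (hypothetical helper)."""
    n = len(xs)
    order = sorted(range(n), key=lambda t: xs[t])
    # Deterministic init: split frames into n_comp blocks sorted by x, unit covariance.
    comps: List[Component] = []
    for m in range(n_comp):
        idx = order[m * n // n_comp:(m + 1) * n // n_comp]
        mx = sum(xs[t] for t in idx) / len(idx)
        my = sum(ys[t] for t in idx) / len(idx)
        comps.append((1.0 / n_comp, mx, my, 1.0, 0.0, 1.0))
    for _ in range(iters):
        # E-step: responsibilities gamma[t][m] = p(m | x_t, y_t).
        gamma = [_normalize_log([math.log(c[0]) + _log_gauss2(xs[t], ys[t], c)
                                 for c in comps]) for t in range(n)]
        # M-step: re-estimate weight, mean and full 2x2 covariance per component.
        new: List[Component] = []
        for m in range(n_comp):
            g = [gamma[t][m] for t in range(n)]
            nm = sum(g) + 1e-12  # guard against an empty component
            mx = sum(g[t] * xs[t] for t in range(n)) / nm
            my = sum(g[t] * ys[t] for t in range(n)) / nm
            # reg on the diagonal keeps every covariance invertible.
            sxx = sum(g[t] * (xs[t] - mx) ** 2 for t in range(n)) / nm + reg
            syy = sum(g[t] * (ys[t] - my) ** 2 for t in range(n)) / nm + reg
            sxy = sum(g[t] * (xs[t] - mx) * (ys[t] - my) for t in range(n)) / nm
            new.append((nm / n, mx, my, sxx, sxy, syy))
        comps = new
    return comps


def convert(x: float, comps: List[Component]) -> float:
    """Map a source feature to the target space via E[y | x] under the joint GMM."""
    post = _normalize_log([math.log(c[0]) + _log_gauss1(x, c[1], c[3]) for c in comps])
    # Each component contributes its local linear regression my + sxy/sxx * (x - mx).
    return sum(p * (c[2] + c[4] / c[3] * (x - c[1])) for p, c in zip(post, comps))
```

Because each component contributes a local linear regression of y on x, weighted by how likely that component is given x, the overall mapping can bend across regions of the feature space instead of being a single global line.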

LaMOSNet: Latent Mean-Opinion-Score Network for Non-intrusive Speech Quality Assessment : Deep Neural Network for MOS Prediction / LaMOSNet: Latent Mean-Opinion-Score Network för icke-intrusiv ljudkvalitetsbedömning : Djupt neuralt nätverk för MOS prediktion

Cumlin, Fredrik, January 2022
Objective, non-intrusive speech quality assessment that aims to emulate and correlate with human judgement has received increasing attention over the years. It is a difficult problem for three reasons: data scarcity, noisy human judgements, and a potentially uneven distribution of bias in mean opinion scores (MOS). In this work, we introduce the Latent Mean-Opinion-Score Network (LaMOSNet), which leverages individual judges' scores to increase the size of the data, together with new ideas for dealing with both noisy and biased labels. We introduce a methodology called Optimistic Judge Estimation as a straightforward way to reduce bias in MOS. We also implement stochastic gradient noise and mean teacher, ideas from image classification with noisy labels, to further deal with noisy labels and an uneven distribution of label bias. We achieve competitive results on VCC2018 when modeling MOS, and state-of-the-art results when modeling only listener-dependent scores.
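The two ideas borrowed from noisy-label image classification can be sketched on a toy problem. The snippet below trains a hypothetical linear score predictor with full-batch gradient descent, adds annealed Gaussian noise to every gradient using the schedule σ_t² = η / (1 + t)^γ (a common choice from the gradient-noise literature, assumed here because the abstract does not give LaMOSNet's settings), and keeps a teacher whose weights are an exponential moving average of the student's. That EMA is the weight-update half of mean teacher; the consistency loss between student and teacher predictions is omitted. All names and hyperparameters are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def gradient(params: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Full-batch gradient of the mean squared error for a toy model y ≈ w*x + b."""
    w, b = params
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    gw = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    gb = 2.0 * sum(residuals) / n
    return [gw, gb]


def train_with_noise_and_teacher(xs, ys, steps=2000, lr=0.05, eta=0.01, gamma=0.55,
                                 alpha=0.99, seed=0) -> Tuple[List[float], List[float]]:
    """Return (student, teacher) parameters [w, b] after training (hypothetical helper)."""
    rng = random.Random(seed)
    student = [0.0, 0.0]
    teacher = list(student)
    for t in range(steps):
        # Annealed noise: variance eta / (1 + t)^gamma shrinks as training proceeds.
        sigma = (eta / (1.0 + t) ** gamma) ** 0.5
        grad = gradient(student, xs, ys)
        # Stochastic gradient noise: perturb each gradient component with N(0, sigma^2).
        student = [p - lr * (g + rng.gauss(0.0, sigma)) for p, g in zip(student, grad)]
        # Mean-teacher update: teacher weights track an EMA of the student's weights.
        teacher = [alpha * tp + (1.0 - alpha) * sp for tp, sp in zip(teacher, student)]
    return student, teacher
```

The teacher's weights change more smoothly than the student's because the moving average damps the injected noise, which is why a mean-teacher setup may evaluate the teacher rather than the student.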
