Global ETD Search

1	Voice conversion based on the wavelet transform Vieira, Lucimar Sasso 16 April 2007 Among the many voice conversion techniques in use today, those based on wavelet filter banks combined with artificial neural networks have stood out. This work focuses on such techniques, studying which wavelet is best suited to converting particular voice patterns and presenting a detailed analysis of the characteristics that lead to these results. The tests use voices from the TIMIT database of the Linguistic Data Consortium (LDC). Neural networks Voice conversion Wavelets
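The abstract does not say which wavelets or network architecture were compared. As a minimal sketch of what a wavelet filter bank does, the code below uses the orthonormal Haar wavelet (chosen here only for brevity, an assumption) to split a signal into low- and high-frequency sub-bands and reconstruct it exactly. In a conversion system of the kind described, a mapping such as a neural network would transform the sub-band coefficients before synthesis. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_analysis(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split an even-length signal into approximation (low-pass) and
    detail (high-pass) coefficients, each downsampled by 2."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail


def haar_synthesis(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert haar_analysis: upsample and recombine the two sub-bands."""
    x: List[float] = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def multilevel(x: Sequence[float], levels: int) -> List[List[float]]:
    """Dyadic filter bank: repeatedly split the low band.
    Returns [d1, d2, ..., dL, aL] (finest detail first)."""
    bands: List[List[float]] = []
    low = list(x)
    for _ in range(levels):
        low, detail = haar_analysis(low)
        bands.append(detail)
    bands.append(low)
    return bands
```

Because the Haar filters are orthonormal, the analysis preserves signal energy and the synthesis step recovers the input exactly, which is why any audible change in the output comes only from modifications made to the coefficients.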
3	Articulatory-based Speech Processing Methods for Foreign Accent Conversion Felps, Daniel August 2011 The objective of this dissertation is to develop speech processing methods that enable without altering their identity. We envision accent conversion primarily as a tool for pronunciation training, allowing non-native speakers to hear their native-accented selves. With this application in mind, we present two methods of accent conversion. The first assumes that the voice quality/identity of speech resides in the glottal excitation, while the linguistic content is contained in the vocal tract transfer function. Accent conversion is achieved by convolving the glottal excitation of a non-native speaker with the vocal tract transfer function of a native speaker. The result is perceived as 60 percent less accented, but it is no longer identified as the same individual. The second method of accent conversion selects segments of speech from a corpus of non-native speech based on their acoustic or articulatory similarity to segments from a native speaker. We predict that articulatory features provide a more speaker-independent representation of speech and are therefore better gauges of linguistic similarity across speakers. To test this hypothesis, we collected a custom database containing simultaneous recordings of speech and the positions of important articulators (e.g., lips, jaw, tongue) for a native and a non-native speaker. Resequencing speech from a non-native speaker based on articulatory similarity with a native speaker achieved a 20 percent reduction in accent. The approach is particularly appealing for applications in pronunciation training because it modifies speech in a way that produces realistically achievable changes in accent (i.e., because the technique uses sounds already produced by the non-native speaker).
A second contribution of this dissertation is the development of subjective and objective measures to assess the performance of accent conversion systems. This is a difficult problem because, in most cases, no ground truth exists. Subjective evaluation is further complicated by the interconnected relationship between accent and identity, but modifications of the stimuli (i.e., reversed speech and voice disguises) allow the two components to be separated. Algorithms that objectively measure accent, quality, and identity are shown to correlate well with their subjective counterparts. speech processing voice conversion accent conversion
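The second method selects non-native segments by their similarity to native segments, but the abstract does not give the selection criterion. The sketch below assumes a plain greedy nearest-neighbour search with Euclidean distance over per-frame feature vectors; the feature values (standing in for hypothetical lip, jaw, or tongue coordinates) are made up, and a practical system would likely also penalize discontinuities between consecutively selected segments, which this sketch omits.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def resequence(native: Sequence[Sequence[float]],
               corpus: Sequence[Sequence[float]]) -> List[int]:
    """Greedy selection: for every native frame, return the index of the
    closest non-native corpus frame. No concatenation cost is applied."""
    if not corpus:
        raise ValueError("non-native corpus is empty")
    return [min(range(len(corpus)), key=lambda j: euclidean(frame, corpus[j]))
            for frame in native]


# Hypothetical 2-D articulatory features, one vector per frame.
native_frames = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
non_native_corpus = [[0.9, 1.1], [0.1, 0.0], [4.0, 5.0]]
selection = resequence(native_frames, non_native_corpus)
```

The returned indices would then be used to concatenate the corresponding non-native audio segments, so the output contains only sounds the non-native speaker actually produced.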
4	Crosslingual Voice Conversion Machado, Anderson Fraiha 21 May 2013 Voice conversion is an emerging problem in speech and voice processing with growing commercial interest, driven by applications such as Speech-to-Speech Translation (SST) and personalized Text-To-Speech (TTS) systems. A voice conversion system should map the acoustic features of sentences uttered by a source speaker to the corresponding values for a target speaker, so that the processed output is perceived as a sentence uttered by the target speaker.
In the last two decades the number of scientific contributions to the voice conversion problem has grown considerably, and a solid overview of both its history and the proposed techniques is indispensable for anyone wishing to contribute to the field. The goal of this work is to provide a critical survey that combines historical presentation with technical discussion, pointing out the advantages and drawbacks of each technique, and, building on this study, to develop new tools. Contributions of this work include a method for spectral decomposition in terms of radial basis functions, artificial phonetic maps, and frequency warping functions, among others, aimed at implementing a high-quality, text-independent cross-lingual voice conversion system. Cross-Lingual Voice Conversion Voice Conversion
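Frequency warping functions move spectral features such as formants along the frequency axis. The thesis's own warping functions are not described in this abstract; what follows is a generic piecewise-linear sketch in which hypothetical anchor bins in the source spectrum (for instance, estimated formant positions) are mapped to anchor bins in the target, and each output bin reads the input spectrum at the inversely warped position.

```python
from bisect import bisect_right
from typing import List, Sequence


def piecewise_linear(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Evaluate the piecewise-linear curve through (xs[i], ys[i]).
    xs must be strictly increasing; x outside [xs[0], xs[-1]] is clamped."""
    if x <= xs[0]:
        return float(ys[0])
    if x >= xs[-1]:
        return float(ys[-1])
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def warp_spectrum(spectrum: Sequence[float],
                  src_anchors: Sequence[float],
                  tgt_anchors: Sequence[float]) -> List[float]:
    """Move features located at src_anchors (in bins) to tgt_anchors.
    Each output bin k reads the input at the source position that the
    warp sends to k, i.e. the inverse warp evaluated at k."""
    if len(src_anchors) != len(tgt_anchors) or len(src_anchors) < 2:
        raise ValueError("need matching anchor lists with at least two points")
    bins = [float(k) for k in range(len(spectrum))]
    out: List[float] = []
    for k in range(len(spectrum)):
        s = piecewise_linear(float(k), tgt_anchors, src_anchors)  # inverse warp
        out.append(piecewise_linear(s, bins, spectrum))  # interpolate the input
    return out


# A toy spectrum with a single peak at bin 2, moved to bin 4.
spec = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
warped = warp_spectrum(spec, [0, 2, 8], [0, 4, 8])
```

Evaluating the inverse warp per output bin, rather than pushing input bins forward, guarantees that every output bin receives a value and avoids gaps where the warp stretches the axis.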
6	Morphlet: a new wavelet transform family applied for voice conversion process Vieira, Lucimar Sasso 27 January 2012 The objective of this PhD work is the creation of a new family of wavelet transforms, called Morphlets, designed specifically for voice conversion. Before the construction of the Morphlets is explained, a brief literature review is presented covering the Discrete Wavelet Transform, voice conversion procedures, and algorithms for creating new wavelets, among other topics. This is followed by a detailed description of the technique used to create the Morphlets and then by a new voice conversion algorithm based on them. To date, neither the Morphlets nor a voice conversion algorithm based on them has appeared in the literature.
To test the effectiveness of the proposed voice conversion technique, a range of tests, mainly based on perceptual criteria, was performed. The results were encouraging and indicate an advance in the field. Wavelets Voice conversion Morphlets Signal processing
8	New Approaches to Voice Conversion Using Statistical Mapping Functions Mohsen Ahangardarabi (8061824) 05 December 2019 Voice conversion (VC) is the process whereby the speech signal of one speaker (source) is transformed into the voice of another speaker (target). Voice conversion has many applications, including text-to-speech; speaker recognition; noise reduction in speech; neutral-to-emotional speech conversion; and movie, animation, and music industry applications. The features transformed in VC systems are typically the parameters characterizing speech and speaker individuality, including the fundamental frequency, spectral envelope, aperiodicity, and phoneme duration. Among these, the spectral envelope is one of the most significant characteristics of speaker identity. In this thesis, we propose four new approaches for spectral conversion: Mixture Density Network (MDN); Dynamic Multi-band Random Forest (DMRF); State Space Model (SSM) employing the Gaussian Mixture Model (GMM) for state-vector sequence conversion (SSM-GMM); and Sub-band Deep Gaussian Processes (SDGP). These new conversion methods were developed for both speech and singing applications. Experimental results show that the new methods have performance advantages over conventional methods, both subjectively and objectively. voice conversion statistical mapping function
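The abstract does not name the "conventional methods" it compares against. One widely used statistical mapping function for spectral conversion is joint-density GMM regression, in which each Gaussian component contributes a linear regression of target on source, weighted by its posterior probability given the source frame: F(x) = Σ_m P(m | x) [μ_y,m + Σ_yx,m Σ_xx,m⁻¹ (x − μ_x,m)]. It is shown here only as an example of a statistical mapping function, not as one of the four proposed methods, and it is an assumption that it resembles the baselines used. Features are scalar and the parameters are hand-set for readability; in practice they are vectors, and the parameters are fit with EM on time-aligned source/target frames.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Component:
    """One joint-density GMM component for scalar source x and target y."""
    weight: float  # mixture weight
    mu_x: float    # mean of x
    mu_y: float    # mean of y
    var_xx: float  # variance of x
    cov_yx: float  # covariance between y and x


def log_gaussian(x: float, mu: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mu) ** 2 / var + math.log(2.0 * math.pi * var))


def posteriors(x: float, comps: Sequence[Component]) -> List[float]:
    """P(m | x) under the source marginal, computed in the log domain
    so that frames far from every component do not underflow to 0/0."""
    logs = [math.log(c.weight) + log_gaussian(x, c.mu_x, c.var_xx) for c in comps]
    peak = max(logs)
    exps = [math.exp(lg - peak) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x: float, comps: Sequence[Component]) -> float:
    """Posterior-weighted sum of per-component linear regressions of y on x."""
    post = posteriors(x, comps)
    return sum(p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
               for p, c in zip(post, comps))
```

With a single component the mapping reduces to ordinary linear regression; with several, the posteriors blend locally linear mappings, which is what lets the function follow a nonlinear relation between speakers.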
9	Generative Adversarial Networks for Cross-Lingual Voice Conversion Ankaräng, Fredrik January 2021 Speech synthesis is a technology that increasingly influences our daily lives, in the form of smart assistants, advanced translation systems, and similar applications. This thesis explores voice conversion: making one speaker's voice sound like another's without altering the linguistic content of the speech. More specifically, a Cycle-Consistent Adversarial Network (CycleGAN) that has proven to work well in a monolingual setting is evaluated in a multilingual environment. The model is trained to convert voices between native speakers from the Nordic countries. No parallel, transcribed, or aligned speech data is used in the experiments, forcing the model to rely on the raw audio signal alone. The goal of the thesis is to evaluate whether performance degrades in a multilingual environment compared with monolingual voice conversion, and to measure the size of any such drop. Performance is measured in terms of naturalness and of speaker similarity between the generated speech and the target voice. For evaluation, listening tests are conducted, along with objective comparisons of the synthesized speech. The results show that voice conversion between a Swedish and a Norwegian speaker is possible and can be performed without performance degradation relative to Swedish-to-Swedish conversion. Conversion between Finnish and Swedish speakers, and between Danish and Swedish speakers, does show a performance drop in the generated speech. Despite this decrease, the model produces fluent and clearly articulated converted speech in all experiments. These results are noteworthy, especially since the network is trained on less than 15 minutes of non-parallel data per speaker.
This thesis opens up further areas of research, for instance investigating more languages, evaluating more recent Generative Adversarial Network architectures, and devoting more resources to hyperparameter tuning to further optimize the model for multilingual voice conversion. Generative Adversarial Network CycleGAN Cross-Lingual Voice Conversion Speech Synthesis Machine Learning Computer and Information Sciences
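Training without parallel data is possible because a CycleGAN adds a cycle-consistency term to its adversarial losses: converting a source frame to the target domain and back should recover the original, L_cyc = E_x[‖G_YX(G_XY(x)) − x‖₁] + E_y[‖G_XY(G_YX(y)) − y‖₁]. The sketch below computes only this term, with toy hypothetical generators acting on short feature vectors; the adversarial losses, any identity-mapping loss, and the loss weights used in the thesis are not shown.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Generator = Callable[[Sequence[float]], Vector]


def mean_abs_error(u: Sequence[float], v: Sequence[float]) -> float:
    """Per-dimension mean L1 distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def cycle_consistency_loss(xs: Sequence[Sequence[float]],
                           ys: Sequence[Sequence[float]],
                           g_xy: Generator, g_yx: Generator) -> float:
    """Round-trip reconstruction error in both directions:
    x -> G_XY -> G_YX should return x, and y -> G_YX -> G_XY should return y."""
    loss_x = sum(mean_abs_error(g_yx(g_xy(x)), x) for x in xs) / len(xs)
    loss_y = sum(mean_abs_error(g_xy(g_yx(y)), y) for y in ys) / len(ys)
    return loss_x + loss_y


# Toy generators: shift_down exactly inverts shift_up; identity does not.
shift_up: Generator = lambda v: [a + 1.0 for a in v]
shift_down: Generator = lambda v: [a - 1.0 for a in v]
identity: Generator = lambda v: list(v)
```

A pair of generators that invert each other drives the term to zero, while any information lost on the way to the other domain shows up as reconstruction error, which discourages the model from discarding the linguistic content.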
10	Non-Parallel Voice Conversion Brukner, Jan January 2020 The goal of voice conversion (VC) is to convert the voice of a source speaker into the voice of a target speaker. The technique is popular in humorous internet videos, but it also has a number of serious uses, such as dubbing audiovisual material and voice anonymization (for example, for witness protection). Because it can be used to spoof voice identification systems, it is also an important tool for developing spoofing detectors and countermeasures. VC models were previously trained mostly on parallel data (i.e., two speakers reading the same text) and on high-quality audio. The aim of this thesis was to explore VC on non-parallel data and on low-quality signals, in particular from the publicly available VoxCeleb database. The work builds on the modern AutoVC architecture defined by Qian et al., which is based on neural autoencoders that aim to separate content and speaker information into distinct low-dimensional vector representations (embeddings). The target speech is then obtained by replacing the source speaker's embedding with the target speaker's embedding. Qian's architecture was improved for low-quality audio by experimenting with different speaker embeddings (d-vectors vs. x-vectors), by introducing a speaker classifier on the content embeddings in an adversarial training scheme, and by tuning the size of the content embedding so as to define an information bottleneck in the corresponding neural network. We also defined another adversarial architecture that compares the original content embeddings with embeddings obtained from the converted speech. The experimental results show that non-parallel VC on low-quality data is indeed possible. The resulting audio was not as good as with hi-fi inputs, but speaker verification results after spoofing with the resulting system clearly showed a shift in voice characteristics towards the target speakers.
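The evaluation above judges conversion by whether a speaker verification system is pulled toward the target speaker. The abstract does not give the scoring back-end; the sketch below assumes simple cosine scoring between speaker embeddings and reports the shift as the converted utterance's similarity to the target minus its similarity to the source. The 2-D embeddings are hypothetical stand-ins for d-vectors or x-vectors, and real x-vector systems often use PLDA scoring instead.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (nu * nv)


def verification_shift(converted: Sequence[float],
                       source: Sequence[float],
                       target: Sequence[float]) -> float:
    """Positive when the converted utterance scores closer to the target
    speaker than to the source speaker under cosine scoring."""
    return cosine(converted, target) - cosine(converted, source)


# Hypothetical embeddings for the source, the target, and a converted utterance.
source_emb = [1.0, 0.0]
target_emb = [0.0, 1.0]
converted_emb = [0.2, 0.9]
```

A positive shift matches the kind of evidence the abstract reports; a value near -1 would mean the converted speech still sounds entirely like the source speaker.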
