Global ETD Search

1	Automatic speech segmentation with limited data / by D.R. van Niekerk Van Niekerk, Daniel Rudolph January 2009 (has links) The rapid development of corpus-based speech systems such as concatenative synthesis systems for under-resourced languages requires an efﬁcient, consistent and accurate solution with regard to phonetic speech segmentation. Manual development of phonetically annotated corpora is a time consuming and expensive process which suffers from challenges regarding consistency and reproducibility, while automation of this process has only been satisfactorily demonstrated on large corpora of a select few languages by employing techniques requiring extensive and specialised resources. In this work we considered the problem of phonetic segmentation in the context of developing small prototypical speech synthesis corpora for new under-resourced languages. This was done through an empirical evaluation of existing segmentation techniques on typical speech corpora in three South African languages. In this process, the performance of these techniques were characterised under different data conditions and the efﬁcient application of these techniques were investigated in order to improve the accuracy of resulting phonetic alignments. We found that the application of baseline speaker-speciﬁc Hidden Markov Models results in relatively robust and accurate alignments even under extremely limited data conditions and demonstrated how such models can be developed and applied efﬁciently in this context. The result is segmentation of sufﬁcient quality for synthesis applications, with the quality of alignments comparable to manual segmentation efforts in this context. Finally, possibilities for further automated reﬁnement of phonetic alignments were investigated and an efﬁcient corpus development strategy was proposed with suggestions for further work in this direction. / Thesis (M.Ing. (Computer Engineering))--North-West University, Potchefstroom Campus, 2009. Phonetic speech segmentation Phonetic alignment Speech synthesis Text-to-speech Speech corpus development Resource scarce languages Hidden Markov models Dynamic time warping
2	Automatic speech segmentation with limited data / by D.R. van Niekerk Van Niekerk, Daniel Rudolph January 2009 (has links) The rapid development of corpus-based speech systems such as concatenative synthesis systems for under-resourced languages requires an efﬁcient, consistent and accurate solution with regard to phonetic speech segmentation. Manual development of phonetically annotated corpora is a time consuming and expensive process which suffers from challenges regarding consistency and reproducibility, while automation of this process has only been satisfactorily demonstrated on large corpora of a select few languages by employing techniques requiring extensive and specialised resources. In this work we considered the problem of phonetic segmentation in the context of developing small prototypical speech synthesis corpora for new under-resourced languages. This was done through an empirical evaluation of existing segmentation techniques on typical speech corpora in three South African languages. In this process, the performance of these techniques were characterised under different data conditions and the efﬁcient application of these techniques were investigated in order to improve the accuracy of resulting phonetic alignments. We found that the application of baseline speaker-speciﬁc Hidden Markov Models results in relatively robust and accurate alignments even under extremely limited data conditions and demonstrated how such models can be developed and applied efﬁciently in this context. The result is segmentation of sufﬁcient quality for synthesis applications, with the quality of alignments comparable to manual segmentation efforts in this context. Finally, possibilities for further automated reﬁnement of phonetic alignments were investigated and an efﬁcient corpus development strategy was proposed with suggestions for further work in this direction. / Thesis (M.Ing. (Computer Engineering))--North-West University, Potchefstroom Campus, 2009. Phonetic speech segmentation Phonetic alignment Speech synthesis Text-to-speech Speech corpus development Resource scarce languages Hidden Markov models Dynamic time warping
3	Singing-driven interfaces for sound synthesizers Janer Mestres, Jordi 14 March 2008 (has links) Els instruments musicals digitals es descomponen usualment en dues parts: la interfície d'usuari i el motor de síntesi. Tradicionalment la interfície d'usuari pren el nom de controlador musical. L'objectiu d'aquesta tesi és el disseny d'un interfície que permeti el control de la síntesi de sons instrumentals a partir de la veu cantada.Amb la present recerca, intentem relacionar la veu amb el so dels instruments musicals, tenint en compte tan la descripció del senyal de veu, com les corresponents estratègies de mapeig per un control adequat del sintetitzador.Proposem dos enfocaments diferents, d'una banda el control d'un sintetitzador de veu cantada, i d'altra banda el control de la síntesi de sons instrumentals. Per aquest últim, suggerim una representació del senyal de veu com a gests vocals, que inclou una sèrie d'algoritmes d'anàlisis de veu. A la vegada, per demostrar els resultats obtinguts, hem desenvolupat dos prototips a temps real. / Los instrumentos musicales digitales se pueden separar en dos componentes: el interfaz de usuario y el motor de sintesis. El interfaz de usuario se ha denominado tradicionalmente controlador musical. El objectivo de esta tesis es el diseño de un interfaz que permita el control de la sintesis de sonidos instrumentales a partir de la voz cantada.La presente investigación pretende relacionar las caracteristicas de la voz con el sonido de los instrumentos musicales, teniendo en cuenta la descripción de la señal de voz, como las correspondientes estrategias de mapeo para un control apropiado del sintetizador. Se proponen dos enfoques distintos, el control de un sintetizador de voz cantada, y el control de la sintesis de sonidos insturmentales. Para este último, se sugiere una representación de la señal de voz como gestos vocales, incluyendo varios algoritmos de analisis de voz. Los resultados obtenidos se demuestran con dos prototipos a tiempo real. / Digital musical instruments are usually decomposed in two main constituent parts: a user interface and a sound synthesis engine. The user interface is popularly referred as a musical controller, and its design is the primary objective of this dissertation. Under the title of singing-driven interfaces, we aim to design systems that allow controlling the synthesis of musical instruments sounds with the singing voice. This dissertation searches for the relationships between the voice and the sound of musical instruments by addressing both, the voice signal description, as well as the mapping strategies for a meaningful control of the synthesized sound. We propose two different approaches, one for controlling a singing voice synthesizer, and another for controlling the synthesis of instrumental sounds. For the latter, we suggest to represent voice signal as vocal gestures, contributing with several voice analysis methods.To demonstrate the obtained results, we developed two real-time prototypes. segmentation phonetic alignment formant tracking syllabling real-time mapping control vocal gestures sound synthesis user interface singing voice signal processing controlador musical alineamiento fonético segmentación analisi mapeo tiempo real estimación de formantes gestos vocales sintesis de sonido interfaz de usuario control procesado de señal voz cantada alineament fonètic controlador musical anàlisis segmentació estimació de formants temps real mapeig control gests vocals interfícies d'usuari síntesi sonora veu cantada processament del senyal musical controller 004 531/534 78

1

Page generated in 0.0686 seconds