131

Modelling fundamental frequency, and its relationship to syntax, semantics, and phonetics.

O'Shaughnessy, Douglas David January 1976 (has links)
Thesis. 1976. Ph.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / Microfiche copy available in Archives and Engineering. / Vita. / Bibliography: leaves 403-416. / Ph.D.
132

A characterization of American English intonation.

Maeda, Shinji January 1976 (has links)
Thesis. 1976. Ph.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / Microfiche copy available in Archives and Engineering. / Bibliography: leaves 322-332. / Ph.D.
133

Évaluation expérimentale d'un système statistique de synthèse de la parole, HTS, pour la langue française / Experimental evaluation of a statistical speech synthesis system, HTS, for French

Le Maguer, Sébastien 05 July 2013 (has links)
The work presented in this thesis concerns text-to-speech synthesis and, more precisely, parametric synthesis based on statistical models. We study the influence of the linguistic descriptors used to characterise a speech signal on the modelling performed by the HTS statistical synthesis system. To this end, two objective evaluation methodologies are presented. The first models the acoustic space generated by HTS with Gaussian mixture models (GMM); using a fixed set of natural reference speech signals, the GMMs, and hence the acoustic spaces generated by the different HTS configurations, can then be compared with one another. The second methodology computes distances between paired acoustic frames, so that the modelling performed by HTS can be evaluated more locally; it complements the other analyses, in particular by controlling which data sets are generated and evaluated. The results obtained with both methodologies, and confirmed by subjective evaluations, indicate that using a complex set of linguistic descriptors does not necessarily lead to better modelling and can even be counter-productive for the quality of the synthesised speech.
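The first of these evaluation protocols lends itself to a compact sketch. The Python code below is not from the thesis: it simply fits a Gaussian mixture to the acoustic frames generated by one HTS configuration and scores a fixed set of natural reference frames under it, so that configurations can be ranked by how closely their acoustic space matches natural speech. The feature dimension, number of mixture components, and random stand-in data are illustrative assumptions.

```python
# Minimal sketch of the GMM-based protocol described above (not the thesis' code).
# Frames are assumed to be (n_frames, n_features) arrays of acoustic features
# such as mel-cepstral coefficients; 25 dimensions and 32 components are
# arbitrary illustrative choices.
import numpy as np
from sklearn.mixture import GaussianMixture

def score_configuration(generated_frames: np.ndarray,
                        reference_frames: np.ndarray,
                        n_components: int = 32) -> float:
    """Average log-likelihood of natural reference frames under a GMM fitted
    to the frames generated by one HTS configuration."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0)
    gmm.fit(generated_frames)           # model this configuration's acoustic space
    return gmm.score(reference_frames)  # higher = closer to natural speech

# Hypothetical usage with synthetic stand-in data.
rng = np.random.default_rng(0)
reference = rng.normal(size=(5000, 25))                   # natural-speech frames
config_full = rng.normal(size=(5000, 25))                 # e.g. full descriptor set
config_reduced = rng.normal(scale=1.2, size=(5000, 25))   # e.g. reduced descriptor set
print(score_configuration(config_full, reference),
      score_configuration(config_reduced, reference))
```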
134

17 ways to say yes : exploring tone of voice in augmentative communication and designing new interactions with speech synthesis

Pullin, Graham January 2013 (has links)
For people without speech, voice output communication aids are an assistive technology, but can also be restrictive: whilst Text-To-Speech synthesis can say anything, it affords little choice of how this is spoken. An absence of nuanced tone of voice can inhibit social interaction. This research explores this profound but relatively overlooked issue in augmentative and alternative communication through the lens (with the sensibilities and skills) of interaction design. Tone of voice is such an elusive and intangible quality: difficult for even phoneticians to define, let alone AAC users and carers to discuss in the context of their everyday lives. Therefore the activities of design exploration and design practice have been employed to visualise tone of voice, in order to catalyse new conversations, through two original research projects: Six Speaking Chairs, curated with Andrew Cook, is a collection of interactive artefacts that illustrate alternative models of tone of voice developed by academics and practitioners as diverse as sociolinguists and playwrights; Speech Hedge, created with the assistance of Ryan McLeod, is a visualisation of how someone might interact with nuanced tone of voice using a conventional communication aid in combination with an interface on a smart phone. Audience responses to each project have illuminated the perspectives from which laypeople conceive of tone of voice, challenging the conventional emotional model that dominates speech technology in favour of something more complex and heterogeneous. In order to reconcile such complexity with simplicity of use, design principles have been distilled that could inspire future user interfaces but also inform further research. This research has been published and presented within different academic fields, including design research, interaction design and augmentative and alternative communication.
135

Time-Varying Modeling of Glottal Source and Vocal Tract and Sequential Bayesian Estimation of Model Parameters for Speech Synthesis

January 2018 (has links)
Speech is generated by articulators acting on a phonatory source. Identifying this phonatory source and recovering the articulatory geometry are individually challenging, ill-posed problems, known as speech separation and articulatory inversion, respectively. There is a trade-off between the decomposition and the recovered articulatory geometry, because multiple mappings are possible between an articulatory configuration and the speech produced. When measurements are obtained from a microphone alone, without any invasive instrumentation, an already difficult problem becomes harder still. A joint non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that assume a stationary speech model within each frame, the proposed approach performs a non-stationary speech decomposition. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state space formulation, and the unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2018
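As a rough, generic illustration of the sequential Monte Carlo idea mentioned in this abstract, the sketch below implements a plain bootstrap particle filter. The transition and likelihood functions are placeholders; the thesis' actual glottal-source and vocal-tract models are not reproduced here.

```python
# Generic bootstrap particle filter (illustrative only; not the thesis' model).
import numpy as np

def particle_filter(observations, n_particles, transition, likelihood, init):
    """Return per-step posterior-mean estimates of the hidden parameters."""
    rng = np.random.default_rng(0)
    particles = init(n_particles, rng)           # (N, d) initial parameter draws
    estimates = []
    for y in observations:
        particles = transition(particles, rng)   # propagate parameter trajectories
        w = likelihood(y, particles)             # weight by fit to the observed frame
        w = w / w.sum()
        estimates.append(w @ particles)          # posterior-mean estimate
        idx = rng.choice(len(particles), size=len(particles), p=w)
        particles = particles[idx]               # multinomial resampling
    return np.array(estimates)

# Hypothetical toy usage: track a slowly drifting scalar parameter from noisy data.
obs = np.sin(np.linspace(0, 3, 50)) + 0.1 * np.random.default_rng(1).normal(size=50)
est = particle_filter(
    obs, 500,
    transition=lambda p, rng: p + 0.05 * rng.normal(size=p.shape),
    likelihood=lambda y, p: np.exp(-0.5 * ((y - p[:, 0]) / 0.1) ** 2) + 1e-12,
    init=lambda n, rng: rng.normal(size=(n, 1)),
)
```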
136

Alternativa metoder för att kontrollera ett användargränsnitt i en browser för teknisk dokumentation / Alternative methods for controlling the user interface in a browser for technical documentation

Svensson, Cecilia January 2003 (has links)
When searching for better and more practical interfaces between users and their computers, additional or alternative modes of communication between the two parties would be of great use. This thesis examines the possibilities of using eye and head movements, as well as voice input, as such alternative modes of communication. One part of the project is devoted to finding possible interaction techniques for navigating a computer interface with movements of the eyes or the head. The result of this part is four different interface controls, adapted to this kind of navigation and combined in a demo application. Another part of the project is devoted to the development of an application with voice control as the primary input method. The application developed is a simplified version of ActiViewer, developed by AerotechTelub Information & Media AB.
137

Estimation of glottal source features from the spectral envelope of the acoustic speech signal

Torres, Juan Félix 17 May 2010 (has links)
Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects of glottal source information that are already contained within the spectral features commonly used in speech analysis, yielding an objective assessment regarding the expected advantages of explicitly using glottal information extracted from the speech signal via currently available IF methods, versus the alternative of relying on the glottal source information that is implicitly contained in spectral envelope representations.
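The Gaussian mixture regression step named in this abstract can be sketched compactly. The code below is a generic GMR implementation with toy data, offered only as an illustration of the technique; the feature dimensions, number of mixture components, and data are assumptions, not the thesis' actual setup.

```python
# Generic Gaussian mixture regression: fit a joint GMM over [X, Y], then
# predict E[y | x] as a responsibility-weighted sum of conditional means.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_gmr(X, Y, n_components=8):
    """Fit a joint GMM over stacked [X, Y]; return (gmm, input dimension)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=0)
    gmm.fit(np.hstack([X, Y]))
    return gmm, X.shape[1]

def gmr_predict(gmm, dx, x):
    """Conditional expectation E[y | x] under the joint GMM for one input vector."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    resp = np.array([w * multivariate_normal.pdf(x, m[:dx], C[:dx, :dx])
                     for w, m, C in zip(weights, means, covs)])
    resp = resp / resp.sum()
    y_hat = 0.0
    for r, m, C in zip(resp, means, covs):
        gain = C[dx:, :dx] @ np.linalg.inv(C[:dx, :dx])       # regression matrix
        y_hat = y_hat + r * (m[dx:] + gain @ (x - m[:dx]))    # per-component mean
    return y_hat

# Hypothetical usage: map 20-dim spectral envelope features to 3 glottal features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))                              # toy envelope features
Y = X[:, :3] * 0.5 + rng.normal(scale=0.1, size=(2000, 3))   # toy glottal targets
gmm, dx = fit_gmr(X, Y)
print(gmr_predict(gmm, dx, X[0]))
```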
138

Μοντελοποίηση και ψηφιακή επεξεργασία προσωδιακών φαινομένων της ελληνικής γλώσσας με εφαρμογή στην σύνθεση ομιλίας / Modeling and signal processing of Greek language prosodic events with application to speech synthesis

Ζέρβας, Παναγιώτης 04 February 2008 (has links)
The subject of this doctoral dissertation is the study and modeling of the intonation phenomena of the Greek language, with applications to speech synthesis from text. Within this work, spoken corpora with various levels of morphosyntactic and linguistic annotation were developed, together with tools for processing and studying the prosodic factors that affect the information conveyed through spoken language. To manage and process these resources, a text-to-speech platform based on the concatenation of speech units was implemented. For the study and the construction of machine-learning models, the GRToBI linguistic representation of intonation phenomena was used.
139

The punctuation and intonation of parentheticals

Bodenbender, Christel 17 May 2010 (has links)
From a historical perspective, punctuation marks are often assumed to only represent some of the phonetic structure of the spoken form of the text. It has been argued recently that punctuation today is a linguistic system that not only represents some of the phonetic sentence structure but also syntactic and semantic information. One case in point is the observation that the semantic difference in differently punctuated parenthetical phrases is not reflected in the intonation contour. This study provides the acoustic evidence for this observation. Furthermore, this study makes recommendations to achieve natural-sounding text-to-speech output for English parentheticals by incorporating the study's findings with respect to parenthetical intonation. The experiment conducted for this study involved three male and three female native speakers of Canadian English reading aloud a set of 20 sentences with parenthetical and non-parenthetical phrases. These sentences were analyzed with respect to acoustic characteristics due to differences in punctuation as well as due to differences between parenthetical and non-parenthetical phrases. A number of conclusions were drawn based on the results of the experiment: (1) a difference in punctuation, although entailing a semantic difference, is not reflected in the intonation pattern; (2) in contrast to the general understanding that parenthetical phrases are lower-leveled and narrower in pitch range than the surrounding sentence, this study shows that it is not the parenthetical phrase itself that is implemented differently from its non-parenthetical counterpart; rather, the phrase that precedes the parenthetical exhibits a lower baseline and with that a wider pitch range than the corresponding phrase in a non-parenthetical sentence; (3) sentences with two adjacent parenthetical phrases or one embedded in the other exhibit the same pattern for the parenthetical-preceding phrase as the sentences in (2) above and a narrowed pitch range for the parenthetical phrases that are not in the final position of the sequence of parentheticals; (4) no pausing pattern could be found; (5) the characteristics found for parenthetical phrases can be implemented in synthesized speech through the use of SABLE speech markup as part of the SABLE speech synthesis system. This is the first time that the connection between punctuation and intonation in parenthetical sentences has been investigated; it is also the first look at sentences with more than one parenthetical phrase. This study contributes to our understanding of the intonation of parenthetical phrases in English and their implementation in text-to-speech systems, by providing an analysis of their acoustic characteristics.
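Finding (2) above suggests a simple post-processing rule for synthetic speech. The sketch below is an editorial illustration, not part of the thesis: it lowers the baseline and widens the pitch range of the F0 contour of the phrase preceding a parenthetical. The 10% baseline drop and 1.3x range factor are assumed values for illustration, not measurements reported in the study.

```python
# Illustrative post-processing of a synthetic F0 contour for the phrase that
# precedes a parenthetical: lower the baseline, widen the pitch range.
import numpy as np

def adjust_pre_parenthetical_f0(f0_hz: np.ndarray,
                                baseline_drop: float = 0.10,
                                range_factor: float = 1.3) -> np.ndarray:
    """Rescale a voiced F0 segment (Hz) around a lowered baseline with a wider range."""
    voiced = f0_hz > 0                          # leave unvoiced frames (0 Hz) untouched
    baseline = f0_hz[voiced].min()
    new_baseline = baseline * (1.0 - baseline_drop)
    adjusted = f0_hz.copy()
    adjusted[voiced] = new_baseline + (f0_hz[voiced] - baseline) * range_factor
    return adjusted

# Hypothetical usage on a made-up contour for the phrase before the parenthetical.
segment = np.array([0.0, 180.0, 190.0, 210.0, 200.0, 185.0, 0.0])
print(adjust_pre_parenthetical_f0(segment))
```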
