161

Large-scale acoustic and prosodic investigations of french

Nemoto, Rena 16 November 2011 (has links) (PDF)
This thesis focuses on acoustic and prosodic (fundamental frequency (F0), duration, intensity) analyses of French from large-scale audio corpora portraying different speaking styles: prepared and spontaneous speech. We are interested in particularities of segmental phonetics and prosody that may characterize pronunciation. In French, many errors made by automatic speech recognition (ASR) systems arise from frequent homophone words, for which ASR systems depend on language model weights. Automatic classification (AC) was conducted to discriminate homophones using only acoustic and prosodic properties, depending on their part-of-speech function or their position within prosodic words. Results from AC of two homophone pairs, et/est (and/is) and à/a (to/has), revealed that the et/est pair was more discriminable. A selection of 15 prosodic and inter-phoneme attributes performed as well as the full set of 62 attributes. Corresponding perceptual tests were then conducted to verify whether humans also use acoustic-prosodic parameters for the discrimination. Results suggested that acoustic and prosodic information might help in making the correct choice in similar ambiguous syntactic structures. From the hypothesis that pronunciation variants were due to varying prosodic constraints, we examined overall prosodic properties of French at the lexical and phrase levels. The comparison between lexical and grammatical words revealed an F0 rise and lengthening at the end of the final syllable of lexical words, while these phenomena were not observed for grammatical words. Analyses also revealed that the mean profile of a noun phrase of length n could differ from that of a noun of length n, with a low F0 at the beginning of the noun phrase. The prosodic profiles can be helpful for locating word boundaries. The findings of this thesis will help to localize focus and named entities using discriminative classifiers, and to improve word boundary location in an ASR post-processing step.
162

Advanced natural language processing for improved prosody in text-to-speech synthesis / G. I. Schlünz

Schlünz, Georg Isaac January 2014 (has links)
Text-to-speech synthesis enables the speech-impeded user of an augmentative and alternative communication system to partake in any conversation on any topic, because it can produce dynamic content. Current synthetic voices do not sound very natural, however, lacking in the areas of emphasis and emotion. These qualities are furthermore important to convey meaning and intent beyond that which can be achieved by the vocabulary of words only. Put differently, speech synthesis requires a more comprehensive analysis of its text input beyond the word level to infer the meaning and intent that elicit emphasis and emotion. The synthesised speech then needs to imitate the effects that these textual factors have on the acoustics of human speech. This research addresses these challenges by commencing with a literature study on the state of the art in the fields of natural language processing, text-to-speech synthesis and speech prosody. It is noted that the higher linguistic levels of discourse, information structure and affect are necessary for the text analysis to shape the prosody appropriately for more natural synthesised speech. Discourse and information structure account for meaning, intent and emphasis, and affect formalises the modelling of emotion. The OCC model is shown to be a suitable point of departure for a new model of affect that can leverage the higher linguistic levels. The audiobook is presented as a text and speech resource for the modelling of discourse, information structure and affect because its narrative structure is prosodically richer than the random constitution of a traditional text-to-speech corpus. A set of audiobooks are selected and phonetically aligned for subsequent investigation. The new model of discourse, information structure and affect, called e-motif, is developed to take advantage of the audiobook text. 
It is a subjective model that does not specify any particular belief system in order to appraise its emotions, but defines only anonymous affect states. Its cognitive and social features rely heavily on the coreference resolution of the text, but this process is found not to be accurate enough to produce usable feature values. The research concludes with an experimental investigation of the influence of the e-motif features on human speech and synthesised speech. The aligned audiobook speech is inspected for prosodic correlates of the cognitive and social features, revealing that some activity occurs in the intonational domain. However, when the aligned audiobook speech is used in the training of a synthetic voice, the e-motif effects are overshadowed by those of structural features that come standard in the voice building framework. / PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014
163

The priority of temporal aspects in L2-Swedish prosody : Studies in perception and production

Thorén, Bosse January 2008 (has links)
Foreign accent can be everything from hardly detectable to rendering the second language speech unintelligible. It is assumed that certain aspects of a specific target language contribute more to making the foreign accented speech intelligible and listener friendly, than others. The present thesis examines a teaching strategy for Swedish pronunciation in second language education. The teaching strategy “Basic prosody” or BP, gives priority to temporal aspects of Swedish prosody, which means the temporal phonological contrasts word stress and quantity, as well as the durational realizations of these contrasts. BP does not prescribe any specific tonal realizations. This standpoint is based on the great regional variety in realization and distribution of Swedish word accents. The teaching strategy consists virtually of three directives: · Stress the proper word in the sentence. · Stress proper syllables in stressed words and make them longer. · Lengthen the proper segment – vowel or subsequent consonant – in the stressed syllable. These directives reflect the view that all phonological length is stress-induced, and that vowel length and consonant length are equally important as learning goals. BP is examined in the light of existing findings in the field of second language pronunciation and with respect to the phonetic correlates of Swedish stress and quantity. Five studies examine the relation between segment durations and the categorization made by native Swedish listeners. The results indicate that the postvocalic consonant duration contributes to quantity categorization as well as giving the proper duration to stressed syllables. Furthermore, native Swedish speakers are shown to apply the complementary /V: C/ - /VC:/ pattern also when speaking English and German, by lengthening postvocalic consonants. 
The correctness of the priority is not directly addressed but important aspects of BP are supported by earlier findings as well as the results from the present studies. / To order the book, send an e-mail to exp@ling.su.se
164

The effects of part–of–speech tagging on text–to–speech synthesis for resource–scarce languages / G.I. Schlünz

Schlünz, Georg Isaac January 2010 (has links)
In the world of human language technology, resource–scarce languages (RSLs) suffer from the problem of little available electronic data and linguistic expertise. The Lwazi project in South Africa is a large–scale endeavour to collect and apply such resources for all eleven of the official South African languages. One of the deliverables of the project is more natural text–to–speech (TTS) voices. Naturalness is primarily determined by prosody, and it is shown that many aspects of prosodic modelling are, in turn, dependent on part–of–speech (POS) information. Solving the POS problem is, therefore, a prudent first step towards meeting the goal of natural TTS voices. In a resource–scarce environment, obtaining and applying the POS information are not trivial. Firstly, an automatic tagger is required to tag the text to be synthesised with POS categories, but state–of–the–art POS taggers are data–driven and thus require large amounts of labelled training data. Secondly, the subsequent processes in TTS that are used to apply the POS information towards prosodic modelling are resource–intensive themselves: some require non–trivial linguistic knowledge; others require labelled data as well. The first problem raises the question of which available POS tagging algorithm will be the most accurate on little training data. This research sets out to answer the question by reviewing the most popular supervised data–driven algorithms. Since the literature to date consists mostly of isolated papers discussing one algorithm, the aim of the review is to consolidate the research into a single point of reference. A subsequent experimental investigation compares the tagging algorithms on small training data sets of English and Afrikaans, and it is shown that the hidden Markov model (HMM) tagger outperforms the rest when using both a comprehensive and a reduced POS tagset. 
Regarding the second problem, the question arises whether it is perhaps possible to circumvent the traditional approaches to prosodic modelling by learning the latter directly from the speech data using POS information. In other words, does the addition of POS features to the HTS context labels improve the naturalness of a TTS voice? Towards answering this question, HTS voices are trained from English and Afrikaans prosodically rich speech. The voices are compared with and without POS features incorporated into the HTS context labels, analytically and perceptually. For the analytical experiments, measures of prosody to quantify the comparisons are explored. It is then also noted whether the results of the perceptual experiments correlate with their analytical counterparts. It is found that, when a minimal feature set is used for the HTS context labels, the addition of POS tags does improve the naturalness of the voice. However, the same effect can be accomplished by including segmental counting and positional information instead of the POS tags. / Thesis (M.Sc. Engineering Sciences (Electrical and Electronic Engineering))--North-West University, Potchefstroom Campus, 2011.
165

Integrating Prosody into an Account of Discourse Structure

Gustafson-Capková, Sofia January 2005 (has links)
In this thesis a study of discourse segmenting is carried out, which investigates both segment boundaries and segment content. The results are related to discourse theory. We study the questions of how the prosody and the text structure influence subjects' annotations of discourse boundaries and discourse prominence. The hypothesis was that the annotations would be influenced by the discourse type. Two studies were carried out: 1) a study of boundary annotation and 2) a study of prominence annotation. All studies were made on four different discourse types: scripted and spontaneous monologue, and scripted and spontaneous dialogue. In addition, the annotations were carried out under two different conditions: 1) based on transcripts alone and 2) based on transcripts together with access to the speech signal. The results indicate that the boundary annotations were less dependent on the speech signal than the prominence annotations. It seems that subjects have segmented on the basis of the text structure, while prominence to a great extent was annotated on the basis of the prosody. In the case of boundary markings, the boundary context in terms of parts of speech differs across speaking styles, which is not the case for the prominences. A separate study of segment intentions was also made, and it was found that the interpretation of a specific intention, questions, seems to be arrived at primarily on the basis of the text structure. However, in some cases the prosody also affects the annotations. The picture that emerges indicates a distribution of labour between text structure and prosody, governed by the principle of economy. In cases where the boundaries were less well defined, as in e.g. spontaneous monologue, the pattern of the prominences was clearer. In cases where the boundaries were more clearly indicated, as in read-aloud text, the prominences were less clearly communicated. The findings were interpreted within Grosz and Sidner's (1986) discourse theory. 
It is suggested that differences in the segmenting strategy originating from the interaction of text structure and prosody can be expressed as differences in the contributions from the different components of discourse suggested in the framework of Grosz and Sidner (1986). / To order the book, send an e-mail to exp@ling.su.se
167

Aspects of intonation and prosody in Bininj Gun-wok: an autosegmental-metrical analysis

Bishop, Judith Bronwyn January 2003 (has links)
This dissertation presents a qualitative and quantitative analysis of aspects of the intonation and prosody of an Australian polysynthetic language, Bininj Gun-wok (BGW; also referred to as Mayali). The theoretical framework is autosegmental-metrical phonology, as adapted to the description of intonation by Pierrehumbert (1980), Bruce (1977) and others. The analysis focuses principally on two dialects, Kuninjku and Manyallaluk Mayali (MM), with some reference to the Kunwinjku, Kune, Gun-Djeihmi and Kundedjnjenghmi dialects.
168

Génération de la prosodie audio-visuelle pour les acteurs virtuels expressifs / Generation of audio-visual prosody for expressive virtual actors

Barbulescu, Adela 23 November 2015 (has links)
The work presented in this thesis addresses the problem of generating expressive audio-visual performances for virtual actors. A virtual actor is represented by a 3D talking head, and an audio-visual performance comprises facial expressions, head movements, gaze direction and the speech signal. While a substantial body of work has been dedicated to emotions, we explore here expressive verbal behaviors that signal mental states, i.e. "how speakers feel about what they say". We explore the characteristics of these so-called dramatic attitudes and the way they are encoded with speaker-specific prosodic signatures, i.e. mental state-specific patterns of trajectories of audio-visual prosodic parameters.
169

Aquisição de fonologia : a influiência do acento e o preenchimento de unidades prosódicas em dados de fala de duas crianças entre 1;0.4 e 2;1.10 de idade, em contato com o português brasileiro falado em Alagoas e Pernambuco. / Acquisition of phonology: the influence of accent and segment completion of prosodic units in speech data from two children between the ages of 1;0.4 and 2;1.10 who were exposed to Brazilian Portuguese spoken in the states of Alagoas and Pernambuco.

Payão, Luzia Miscow da Cruz 04 November 2010 (has links)
The study examines the influence of accent and segment completion of prosodic units in speech data from two children between the ages of 1;0.4 and 2;1.10 who were exposed to Brazilian Portuguese spoken in the states of Alagoas and Pernambuco. It was hypothesized that grammatical processing stems from two concurrent movements in opposing directions during phonological acquisition: a centripetal movement triggering segmentation of the prominent syllable and an opposing, centrifugal one aimed at segment completion of prosodic units. These analytical movements imply a hierarchical basis of relationships between the constituent structures, an assumption backed by autosegmental phonology (GOLDSMITH, 1995; CLEMENTS; HUME, 1995; MOTA, 1996) and prosodic phonology (NESPOR; VOGEL, 1986; SCARPA, 1997, 1999a; SANTOS; SCARPA, 2005). The methodology consisted of an observational and descriptive follow-up with parental consent. The children's spontaneous speech while playfully interacting with parents was digitally recorded over a 7-month period. Data showed that identifying word stress favors the handling of phonological material in the stressed syllable under centrifugal action, thus leading to segment completion of both post-tonic and pre-tonic syllables in accordance with the metrical foot of the target word. A tendency towards completion of the syllable structure and distinction of segment classes was seen in the stressed and post-tonic syllables, influenced by the prevalence of words having a trochaic stress pattern. The organizational hierarchy of the language was shown to guide and drive these movements of centripetal-centrifugal analysis, which occur at different phonological levels, prosodic and segmental.
170

Prozódická analýza urban music ve francouzštině a v češtině / Prosodic analysis of urban music in French and Czech

Chodaková, Polina January 2016 (has links)
Title: Prosodic Analysis of Urban Music in French and Czech. Author: Mgr. Polina Chodaková. Department: Institute of Romance Studies FF UK. Supervisors: doc. PhDr. Tomáš Duběda, Ph.D., prof. Philippe Martin, Dr ès Sci, Dr Ling. Key words: prosody, metrics, stress, intonation, declamation, rap, reggae. This thesis deals with rhythm, stress and intonation in rap and reggae music. It describes the formal features of declamations which combine chant, half-singing and singing, in the theoretical framework of contrastive prosody and verse theory. The thesis consists of seven chapters and is based on a textual corpus of 200 songs in French and Czech, assembled for this dissertation. The linguistic material of 59,000 syllables is a representative set of excerpts, transcribed in rhythmic grids with an auditory analysis. From the prosodic point of view, rap and reggae display an important degree of rhythmic reorganisation. In both languages, setting texts to music is performed according to an isochronous pattern, which is imposed on the lyrics with an isosyllabic rhythm and whose bound stress system is weak. This is shown through interactive constraints, which reflect universal tendencies in verbal art, that both genres exhibit a lot of freedom in the association of lyrics and the musical...
