About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Implementing a distributed approach for speech resource and system development / Nkadimeng Raymond Molapo

Molapo, Nkadimeng Raymond January 2014
The range of applications for high-quality automatic speech recognition (ASR) systems has grown dramatically with the advent of smartphones, on which speech recognition can greatly enhance the user experience. Currently, the languages with extensive ASR support on these devices are those for which thousands of hours of transcribed speech have already been collected. Developing a speech system for such a language is simpler because extensive resources already exist. For less prominent languages, however, the process is more difficult. Obstacles such as reliability and cost have hampered progress, and separate tools have been developed for each stage of the development process to overcome these difficulties. Building a system that combines these partial solutions involves customising existing tools and developing new ones to connect the stages of the end-to-end process. This work documents the integration of several tools to enable the end-to-end development of an automatic speech recognition system in a typical under-resourced language. Google App Engine is used as the core environment for data verification, storage and distribution, in conjunction with existing tools for gathering text data and recording speech data. We analyse the data acquired by each of the tools and develop an ASR system in Shona, an important under-resourced language of Southern Africa. Although unexpected logistical problems complicated the process, we were able to collect a usable Shona speech corpus and develop the first automatic speech recognition system in that language. / MIng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus, 2014
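The abstract does not say how the Shona recogniser was scored. Word error rate (WER), the word-level edit distance between a reference transcript and the recogniser's output divided by the number of reference words, is the conventional ASR metric. The sketch below is a minimal illustration assuming whitespace-tokenised transcripts; the function name and the placeholder sentences are hypothetical and not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Placeholder transcripts: one substituted word out of four reference words.
print(word_error_rate("the cat sat down", "the cap sat down"))  # 0.25
```

Because insertions count against the reference length, WER can exceed 1.0 when the recogniser outputs many spurious words.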
113

Auf dem Weg zu einem TEI-Austauschformat für ägyptisch-koptische Texte / Towards a TEI exchange format for Egyptian-Coptic texts

Gerhards, Simone; Schweitzer, Simon 20 April 2016
Several large Egyptological projects (TLA: http://aaew.bbaw.de/tla; Ramses: http://ramses.ulg.ac.be/; Rubensohn: http://elephantine.smb.museum/; Karnak: http://www.cfeetk.cnrs.fr/karnak/) are producing annotated corpora. A standardized exchange format based on TEI is urgently needed for data exchange, and these projects have joined forces to develop a common proposal. In our talk we present the current state of the discussion: What is the base text of the markup, the hieroglyphic annotation or the transliteration of the text? How should the different script formats be handled? Can the metadata in the header be standardized using shared thesauri? What is annotated inline, and what stand-off?
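The inline versus stand-off question can be made concrete with a small sketch. The element and attribute choices below (`<w>` with `@pos` for inline markup; a `<spanGrp>` of `<span>` elements pointing at token ids for stand-off markup) are loosely modelled on TEI P5 and are assumptions for illustration, not the exchange format the projects agreed on. The two transliterated tokens and their part-of-speech labels are illustrative only.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this namespace with the reserved "xml:" prefix.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def inline_annotation(tokens):
    """Inline: each <w> token carries its own annotation as an attribute."""
    seg = ET.Element("seg")
    for i, (form, pos) in enumerate(tokens, start=1):
        w = ET.SubElement(seg, "w", {XML_ID: f"w{i}", "pos": pos})
        w.text = form
    return seg


def standoff_annotation(tokens, layer="pos"):
    """Stand-off: the text keeps bare <w> tokens; annotations live in a
    separate <spanGrp> whose <span> elements point at the token ids."""
    seg = ET.Element("seg")
    span_grp = ET.Element("spanGrp", {"type": layer})
    for i, (form, value) in enumerate(tokens, start=1):
        w = ET.SubElement(seg, "w", {XML_ID: f"w{i}"})
        w.text = form
        span = ET.SubElement(span_grp, "span", {"target": f"#w{i}"})
        span.text = value
    return seg, span_grp


tokens = [("ḏd", "VERB"), ("mdw", "NOUN")]  # illustrative transliterated tokens
print(ET.tostring(inline_annotation(tokens), encoding="unicode"))
seg, pos_layer = standoff_annotation(tokens)
print(ET.tostring(seg, encoding="unicode"))
print(ET.tostring(pos_layer, encoding="unicode"))
```

Stand-off markup leaves the base text untouched, so further layers (for example a hieroglyphic encoding alongside the transliteration) can be added without re-nesting existing elements, at the cost of keeping id references consistent.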
114

SECOND LANGUAGE ACQUISITION OF SPATIAL METAPHORS IN ENGLISH AND CHINESE WRITINGS: INSIGHTS FROM NATIVE AND LEARNER LANGUAGE CORPORA

Jin, Lingxia January 2011
First outlined by Lakoff and Johnson (1980), Conceptual Metaphor Theory (CMT) has continued to thrive (e.g. Lakoff & Johnson, 1992; Lakoff, 1993, 1999, 2008) since it first challenged the traditional view of metaphor as a purely linguistic phenomenon, something extraordinary and poetic. CMT claims that metaphor is pervasive and essential in language and thought; indeed, the theory treats metaphor as the locus of abstract reasoning. Since its proposal, CMT has triggered extensive research. However, few empirical studies have examined metaphor in second language (L2) acquisition, and metaphor has not been fully recognized as an indispensable dimension of second language teaching and learning (Littlemore, 2009; Littlemore & Low, 2006b). Yet metaphors present a hurdle for L2 learners (Danesi, 1992); L2 learners misinterpret metaphors for cultural reasons (Littlemore, 2003); and teaching conceptual metaphor as a learning strategy facilitates language learning (Littlemore & Low, 2006a; Li, 2009). The current study therefore investigates metaphor in learner language in light of CMT, using a corpus-based approach. In particular, it examines how L2 learners of English and of Chinese express vertical spatial metaphors in their L2 English and L2 Chinese writing, respectively, and how their usage differs from both their target languages and their native languages. The findings reveal that L2 development is a dynamic process in which four key factors interact in learners' acquisition of conceptual metaphors: the frequency of the metaphor, L2 proficiency, topic familiarity, and linguistic factors.
In particular, the frequency of the metaphor in the target language has the greatest impact on learners' acquisition of conceptual metaphors, outweighing whether or not the metaphor is shared by the L1 and the L2. Second, L2 proficiency modulates the influence of the first language: learners with lower proficiency are more affected by it. Third, learners acquire the metaphors associated with familiar topics. Finally, L2 learners are constrained by the main semantic unit of the metaphorical expressions. Overall, the study demonstrates that figurative language development is a dynamic process: learners' metaphoric competence follows a developmental pattern, in particular a pendulum effect, and eventually emerges as L2 proficiency increases.
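Comparing how often a metaphor occurs in native and learner corpora of different sizes requires normalised counts. A common convention is occurrences per million tokens; the sketch below assumes text that `\w+` can tokenise (Chinese would first need a word segmenter), and the function name and example expressions are hypothetical rather than the study's actual procedure.

```python
import re


def normalized_frequencies(text, expressions, per=1_000_000):
    """Occurrences of each expression per `per` word tokens of `text`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {expr: 0.0 for expr in expressions}
    joined = " ".join(tokens)
    result = {}
    for expr in expressions:
        # Match the expression as a whole-word sequence in the normalised text.
        words = re.findall(r"\w+", expr.lower())
        pattern = r"\b" + " ".join(map(re.escape, words)) + r"\b"
        hits = len(re.findall(pattern, joined))
        result[expr] = hits / len(tokens) * per
    return result


# Hypothetical MORE IS UP expressions counted in a tiny placeholder text.
print(normalized_frequencies("Prices are high. High prices hurt.", ["high prices", "prices"], per=100))
```

Normalising by corpus size is what makes a learner corpus and a much larger native corpus comparable; raw counts would mostly reflect how big each corpus is.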
115

Étude de la paraphrase sous-phrastique en traitement automatique des langues / A study of sub-sentential paraphrases in Natural Language Processing

Bouamor, Houda 11 June 2012
Variability in language is a major source of difficulty in most natural language processing applications. It manifests itself in the fact that the same idea or event can be expressed with different words or groups of words that have the same meaning in their respective contexts. Automatically capturing semantic equivalences between text units is a complex task, but one that proves indispensable in many contexts. Acquiring lists of equivalences in advance provides useful resources, for example to improve the identification of an answer to a question, to allow alternative formulations when evaluating machine translation, or to help authors find more suitable formulations. In this thesis, we present a detailed study of the task of acquiring sub-sentential paraphrases from pairs of semantically related utterances. We show empirically that monolingual parallel corpora, although extremely rare, are the most suitable type of resource for this kind of study. Our experiments involve five acquisition techniques, representative of different approaches and types of knowledge, in English and French. To improve acquisition performance, we combine the paraphrases produced by these techniques through a validation step based on a binary maximum entropy classifier. An important result of our study is the identification of paraphrases that currently defeat the techniques studied; these are classified and quantified for English and French. We also examine the impact of the language, the corpus type, and the comparability of the utterance pairs on the sub-sentential paraphrase acquisition task.
We present an analysis of the performance of the different methods as a function of how difficult the paraphrase pairs are to align. We then give a descriptive and quantitative account of the characteristics of the paraphrases found in the different corpus types studied, and of those that defeat current automatic identification approaches. / Language variation, or the fact that messages can be conveyed in a great variety of ways by means of linguistic expressions, is one of the most challenging and certainly fascinating features of language for Natural Language Processing, with wide applications in language analysis and generation. The term paraphrase is now commonly used to refer to textual units of equivalent meaning, down to the level of sub-sentential fragments. Although one can envisage manually building high-coverage lists of synonyms, enumerating meaning equivalences at the level of phrases is too daunting a task for humans. Consequently, acquiring this type of knowledge by automatic means has attracted a lot of attention, and significant research efforts have been devoted to this objective. In this thesis we use parallel monolingual corpora for a detailed study of the task of sub-sentential paraphrase acquisition. We argue that the scarcity of this type of resource is compensated by the fact that it is the corpus type best suited to studies on paraphrasing. We propose an extensive exploration of this task, with experiments on two languages using five different acquisition techniques, selected for their complementarity, and their combinations, as well as four monolingual corpus types of varying comparability. We report, under all conditions, a significant improvement over all techniques when candidate paraphrases are validated using a maximum entropy classifier.
An important result of our study is the identification of difficult-to-acquire paraphrase pairs, which are classified and quantified in a bilingual typology.
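A two-class maximum entropy model is equivalent to logistic regression, so the validation step can be sketched as a logistic classifier over features of each candidate pair. The features used here (the fraction of the five techniques that proposed the pair, and a surface similarity score), the toy training data, and the function names are hypothetical assumptions; the thesis's actual feature set is not described in the abstract.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """P(valid paraphrase | features x) under the trained model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_maxent(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Batch gradient descent on the L2-regularised log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, target in zip(X, y):
            err = predict(w, b, x) - target  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def validate(candidates, w, b, threshold=0.5):
    """Keep the candidate pairs the classifier accepts."""
    return [pair for pair, feats in candidates if predict(w, b, feats) >= threshold]


# Hypothetical features: [share of techniques proposing the pair, similarity score].
X = [[1.0, 0.8], [0.8, 0.6], [0.6, 0.7], [0.2, 0.1], [0.2, 0.3], [0.0, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_maxent(X, y)
candidates = [(("buy", "purchase"), [1.0, 0.9]), (("buy", "the"), [0.0, 0.1])]
print(validate(candidates, w, b))
```

Combining the techniques through a learned classifier, rather than a fixed voting rule, lets the weight given to each signal be fitted to annotated examples.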
116

Usage des variables phonologiques dans un corpus d’interactions naturelles parents-enfant : impact du bain linguistique et dispositifs cognitifs d’apprentissage / Usage of phonological variables in a corpus of natural parent-child interactions: cognitive learning devices and the impact of language exposure

Liegeois, Loic 07 November 2014
This research examines the usage of two French variables traditionally described as phonological: liaison and schwa elision. These variables are studied in natural interactions between three children and their respective parents. More precisely, the aim of this thesis is to describe the particular features of child-directed speech (CDS) with respect to the usage of phonological variables, and to measure their impact on the emergence of these same variables in the children's own production. After presenting the theoretical framework and the methodology for collecting, structuring, and analysing the data, the research is organised into three parts. The first study, a descriptive corpus-based one, has two main objectives. First, it measures the variation to which young children are exposed at home. Second, it compares the results of previous studies on the acquisition of liaison, mainly obtained from experimental tasks, with data from dense corpora of parent-child interactions. In particular, this study revealed the influence of usage-related factors, such as frequency, on the use of phonological variables. The second study focuses on the characteristics of CDS. The results show in particular that the usage of phonological variables is modulated in CDS, mainly at an early stage, and that this modulation then diminishes over the course of the children's linguistic development. The final study relates the children's productions to those of their parents. The development of phonological variation appears to support the hypotheses of usage-based models: at an early stage, phonological variation is memorised within specific constructions that are particularly frequent and salient in CDS.
These constructions then become abstracted and enter into competition during development, both phenomena being particularly sensitive to usage factors, notably the frequency of use of linguistic types and forms. / This study deals with the usage of two French linguistic variables, liaison and elision, which are traditionally described as phonological variables. They are studied in natural interactions between three children and their parents. More precisely, the aim of this thesis is to describe the specificities of child-directed speech (CDS) concerning the usage of liaison and elision, and to measure their impact on the emergence of these phonological variables in the children's speech. After the presentation of the theoretical context of the study (Usage-Based Models and Construction Grammar) and the methodology used to collect, structure, and analyse the data, the research is divided into three analysis sections. The aim of the first, descriptive, corpus-based study is twofold. The first objective is to describe the variation to which children are exposed at home. The second objective is to compare the results of previous studies on liaison acquisition, obtained mainly from experimental tasks, with data extracted from dense corpora collected during natural interactions between the children and their parents. In particular, this study shows that usage factors, including the frequency of items, influence the production of phonological variables. The second study focuses on the specificities of CDS. The results show that the usage of phonological variables is modulated in CDS, essentially at an early stage of language acquisition; this modulation then attenuates during the child's development. The aim of the third study is to relate the parents' productions to the children's productions.
The results concerning the development of phonological variation are in step with the assumptions of usage-based models: at an early stage, the variation is memorized in specific constructions that are particularly salient and frequent in CDS. These constructions are then abstracted and enter into competition with each other during the course of language development. The children's productions show that these two phenomena are especially sensitive to usage factors, including type and token frequency.
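To illustrate the type/token distinction the abstract relies on, the sketch below takes a pivot word such as `les` and counts the slot after it: token frequency is every occurrence of the slot, and type frequency is the number of distinct words filling it. Treating any orthographic vowel as a liaison context is a rough assumption that ignores h muet and h aspiré, and the utterances and function name are made up for illustration; this is not the thesis's coding scheme.

```python
from collections import Counter

# Orthographic vowels used as a crude proxy for liaison contexts (assumption).
VOWELS = set("aeiouyàâäéèêëîïôöùûü")


def slot_frequencies(utterances, pivot, vowel_initial_only=True):
    """Return (token frequency, type frequency, filler counts) for the slot after `pivot`."""
    fillers = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        for left, right in zip(words, words[1:]):
            if left != pivot:
                continue
            if vowel_initial_only and right[0] not in VOWELS:
                continue  # consonant-initial word: no liaison context
            fillers[right] += 1
    return sum(fillers.values()), len(fillers), fillers


utterances = ["regarde les enfants", "les amis sont là", "où sont les enfants", "les chats dorment"]
tokens, types, fillers = slot_frequencies(utterances, "les")
print(tokens, types, dict(fillers))
```

A high token count with few types suggests an item-specific chunk such as *les enfants*; growth in type frequency is what usage-based accounts associate with the construction becoming more abstract.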
117

A corpus-assisted research on translator style: Eileen Chang as a self-translator

Chen, Feng De January 2018
University of Macau / Faculty of Arts and Humanities. / Department of English
118

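A corporação como instância sociopolítica antecipadora do Estado na Filosofia do Direito de Hegel / The corporation as a sociopolitical instance anticipating the State in Hegel's Philosophy of Right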

Ximenes, João de Araújo 26 August 2010
This dissertation addresses the concept of the Corporation in the Philosophy of Right, published by Hegel in 1820/21, with the aim of shedding light on this topic. The Corporation is considered a mediating institution within civil (bourgeois) society, whose importance stems mainly from its character as both a social and a political institution. This dual character motivated the following question: how do the Corporations, treated by Hegel in the Philosophy of Right as an instance of civil society, bring about the sociability that has the power to create the interdependence and integration of individuals? To answer this question, the dissertation is organised into three chapters: 1) The institution of freedom in the Philosophy of Right, which seeks to establish a connection between the Corporation and the work's central concept, freedom; 2) The mediation of the Corporations in civil society, which seeks to show the main elements that make up the Corporation as an institution; and finally 3) The Corporation between juridification and recognition, which seeks to establish a contemporary hermeneutic reading of the work and of this concept.
119

The effects of lexical input on L2 writing: a corpus-informed approach.

Huang, Zeping January 2010
Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. Includes bibliographical references (leaves 121-132). Abstracts in English and Chinese; appendix two in English and Chinese.
Contents:
Acknowledgements
Abstract
Abstract (Chinese)
Table of Contents
List of Tables
List of Figures and Graphs
Chapter 1. Introduction
  1.1. Motivation
    1.1.1. The importance of language use in L2 writing
    1.1.2. The possibilities of integrating corpora into L2 writing instruction
    1.1.3. The need for a corpus-informed approach
  1.2. Purpose of this study
  1.3. Research questions
  1.4. Overall research methods
  1.5. Significance of the study
  1.6. Organization of the thesis
Chapter 2. Literature review
  2.1. Research on corpora and L2 writing
    2.1.1. Studies on corpus use from teachers' perspective
    2.1.2. Studies on students' direct use of corpus
    2.1.3. Empirical studies on corpus and vocabulary learning
  2.2. Evaluations of the studies under review
    2.2.1. Evaluation of research on corpus-informed teaching materials development
    2.2.2. Evaluations of empirical research on students' direct use of corpus
  2.3. Call for further studies
Chapter 3. Methodology
  3.1. Participants
  3.2. Research setting
  3.3. Materials
    3.3.1. Corpora used
    3.3.2. Selecting the target words
    3.3.3. Sifting the concordance lines
    3.3.4. Formulating the queries
  3.4. Research design
    3.4.1. Pretest
    3.4.2. Immediate posttest
    3.4.3. Delayed posttest
  3.5. Procedures
  3.6. Instruments
    3.6.1. Questionnaires
    3.6.2. Learning journals
    3.6.3. Uptake sheets
  3.7. Data collection and analysis
    3.7.1. Holistic scoring
    3.7.2. Analysis of the use of target words
    3.7.3. Questionnaire responses
  3.8. Chapter summary
Chapter 4. Results
  4.1. Holistic scores
  4.2. Use of signaling nouns (SNs)
    4.2.1. Accuracy
    4.2.2. Complexity
    4.2.3. Retention of the target patterns
  4.3. Content schemata nouns
  4.4. Evaluation of the concordance exercises
    4.4.1. Effects on vocabulary learning
    4.4.2. Effect on L2 writing
    4.4.3. Difficulties in doing the concordance exercises
  4.5. Chapter summary
Chapter 5. Discussion
  5.1. Did the corpus-informed approach improve students' overall writing quality?
    5.1.1. Cut-off sentences
    5.1.2. Culture-loaded information in concordance lines
  5.2. Did the corpus-informed approach improve vocabulary use in students' writing?
    5.2.1. Interface of lexis and syntax
    5.2.2. Encouraging usage-based learning
    5.2.3. Raising learner awareness of collocation and colligation
    5.2.4. Retention of lexico-grammatical patterns
  5.3. Did students think that the corpus-informed approach helped their writing?
  5.4. Towards a tentative model of corpus-informed writing instruction
    5.4.1. Preparing materials
    5.4.2. During the exploration of a topic-specific corpus
    5.4.3. Follow-up activities after exploration of the topic-specific corpus
  5.5. Chapter summary
Chapter 6. Conclusion
  6.1. Summary of this study
    6.1.1. Enhancement of lexico-grammatical patterns
    6.1.2. Enhanced awareness of the importance of collocations
    6.1.3. Pivotal role of prior grammatical knowledge in corpus-informed learning
    6.1.4. Insignificant correlation between learning CSNs and ideas development
  6.2. Pedagogical implications
    6.2.1. Writing materials development
    6.2.2. Implementation of corpus-informed activities
  6.3. Limitations and suggestions
    6.3.1. A longer experimental time frame
    6.3.2. More lexical input
    6.3.3. More comparison groups
    6.3.4. Different proficiency levels
    6.3.5. Web-based concordances and more follow-up learning activities
    6.3.6. Case studies
  6.4. Closing remarks
Bibliography
Appendix One: Questionnaire One
Appendix Two: Questionnaire Two
Appendix Three: Learning Journal
Appendix Four: Pre-writing Vocabulary Study
Appendix Five: Pretest Writing Task
Appendix Six: Immediate Posttest Writing Task
Appendix Seven: Delayed Posttest Writing Task
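The thesis builds its exercises from concordance lines (sifting the lines and formulating the queries are listed under Materials). The concordancer it used is not named here, so the sketch below is only a generic keyword-in-context (KWIC) display, assuming a fixed character window and whole-word, case-insensitive matching; the function names and sample text are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return (left context, match, right context) for each whole-word hit."""
    text = " ".join(text.split())  # collapse line breaks and runs of spaces
    hits = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


def format_kwic(hits, width=30, sort_right=False):
    """Align hits on the keyword; optionally sort by the right-hand context."""
    if sort_right:
        hits = sorted(hits, key=lambda h: h[2].lower())
    return [f"{left:>{width}} [{kw}]{right}" for left, kw, right in hits]


sample = "The main problem is cost. Another problem of scale remains."
for line in format_kwic(kwic(sample, "problem", width=10), width=10):
    print(line)
```

Sorting by the right-hand context groups recurring patterns after a signalling noun (for example *problem of* versus *problem is*), which is the kind of lexico-grammatical pattern such exercises draw attention to.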
120

Evaluation of two word alignment systems

Wang, Xiaoyang January 2004
This project evaluates two systems that generate word alignments on English-Swedish data: the Giza++ system, which can generate a variety of statistical translation models, and the I*Trix system developed at IDA/NLPLab, which generates word pairs with frequencies.

This paper addresses the file formats of the two systems, how to run them, and the differences between them. The evaluation considers a variety of parameters, such as corpus size, corpus characteristics, and the effect of linguistic knowledge. The paper closes with the conclusions of the evaluation. In general, Giza++ performs better on large corpora, while I*Trix is better suited to small corpora; I*Trix performs better especially on corpora with a high statistical ratio or special resources.
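Giza++ trains the IBM alignment models (Models 1 to 5, plus an HMM model). As a rough illustration of the statistical approach only, the sketch below runs expectation maximisation for IBM Model 1 on a hypothetical three-sentence English-Swedish toy corpus. It omits the NULL word and the higher models, and it does not reproduce Giza++'s implementation or file formats; the function names are made up for this example.

```python
from collections import defaultdict


def train_ibm1(bitext, iterations=10):
    """EM estimation of t(f | e) for IBM Model 1 (no NULL word, for brevity).

    bitext: list of (english_tokens, swedish_tokens) sentence pairs.
    """
    f_vocab = {f for _, fs in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in bitext:
            for f in fs:
                # E-step: share f's alignment probability among all e in the pair.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into conditional probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_alignment(t, es, fs):
    """Most probable Model 1 alignment: link each f to the e maximising t(f | e)."""
    return [(f, max(es, key=lambda e: t[(f, e)])) for f in fs]


bitext = [
    (["big", "houses"], ["stora", "hus"]),
    (["big", "books"], ["stora", "böcker"]),
    (["old", "books"], ["gamla", "böcker"]),
]
t = train_ibm1(bitext)
print(best_alignment(t, ["big", "houses"], ["stora", "hus"]))
```

Even on this tiny corpus, co-occurrence across sentence pairs lets EM disambiguate *stora* (big) from *hus* (houses); with very little data such statistics are weak, which is consistent with the abstract's observation that Giza++ does better on large corpora.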
