Global ETD Search

121	Evaluation of two word alignment systems Wang, Xiaoyang January 2004 This project evaluates two different systems that generate word alignments on English-Swedish data. The systems used are the Giza++ system, which can generate a variety of statistical translation models, and the ITrix system, developed at IDA/NLPLab, which generates word pairs with frequencies. The file formats of these two systems, the way of running them, and the differences between the two systems are addressed in this paper. The evaluation considers a variety of parameters such as corpus size, characteristics of the corpus, and the effect of linguistic knowledge. At the end of this paper, the conclusions of the evaluation of the two systems are presented. In general, Giza++ performs better on large corpora, while ITrix is better suited to small corpora. ITrix performs especially well on corpora with a high statistical ratio or special resources. Word alignment Giza++ ITrix Parallel corpora Statistical ratio Evaluation I*Eval Gold standard Computer science
122	Inhibition of farnesoic acid methyltransferase by sinefungin Ferenz, Hans-Jürgen, Peter, Martin G., Berg, Dieter January 1983 Sinefungin inhibited the S-adenosylmethionine-dependent farnesoic acid methyltransferase in a cell-free system containing a homogenate of corpora allata from female locusts, Locusta migratoria. The enzyme catalyzed the penultimate step of juvenile hormone biosynthesis in the insects. Culturing corpora allata in the presence of sinefungin greatly suppressed juvenile hormone production. The following in vivo effects were visible after injection of the inhibitor: increase in mortality and reduction of total haemolymph protein titer and ovary fresh weight, as well as length of terminal oocytes. Attempts to reverse these effects by topical application of the juvenile hormone analog ZR-515 (methoprene) were only partly successful. Therefore, the in vivo effects may be due to a general inhibition of methyltransferase enzymes in the insect. Sinefungin appeared to be of potential interest as the first representative of a new class of insect growth regulators. Juvenile hormone analogue Orthoptera Juvenile hormone Biosynthesis Enzyme Corpora allata In vitro Biological activity Enzyme inhibitor Chemistry and allied sciences
123	A corpus-based analysis of tense usage in Cantonese-English bilingual children Chan, Chin-ying, Alice, 陳展瑩. January 2010 Linguistics / Master / Master of Arts Corpora (Linguistics) Bilingualism - China - Hong Kong.
124	A corpus-linguistic approach to foreign/second language learning: an experimental study of a new pedagogic model for integrating linguistic knowledge with corpus technology Jones, Warwick Alfred. January 2011 Linguistics / Master / Master of Arts Corpora (Linguistics) Computational linguistics.
125	Studies in Corpora and Idioms : Getting the cat out of the bag Minugh, David January 2014 “Idiomatic” expressions, usually called “idioms”, such as a dime a dozen, a busman’s holiday, or to have bats in your belfry are a curious part of any language: they usually have a fixed lexical (why a busman?) and structural composition (only dime and dozen in direct conjunction mean ‘common, ordinary’), can be semantically obscure (why bats?), yet are widely recognized in the speech community, in spite of being so rare that only large corpora can provide us with access to sufficient empirical data on their use. In this compilation thesis, four published studies focusing on idioms in corpora are presented. Study 1 details the creation of and data in the author’s medium-sized corpus from 1999, the 3.7 million word Coll corpus of online university student newspapers, with comparisons to data from standard corpora of the time. Study 2 examines the extent to which recognized idioms are to be found in the Coll corpus and how they can be varied. Study 3 draws upon the British National Corpus and a series of British and American newspaper corpora to see how idioms may be “anchored” in their contexts, primarily by the device of premodification via an adjective appropriate to the context, not to the idiom. Study 4 examines idiom-usage patterns in the Time Magazine corpus, focusing on possible aspects of diachronic change over the near-century Time represents. The introductory compilation chapter places and discusses these studies in their contexts of contemporary idiom and corpus research; building on these studies, it provides two specific examples of potential ways forward in idiom research: an examination of the idioms used in a specific subgenre of newspapers (editorials), and a detailed suggestion for teachers about how to examine multiple facets of a specific modern idiom (the glass ceiling) in the classroom. Finally, a summing-up includes suggestions for further research, particularly at the level of the patterning of individual idioms, rather than treating them as a homogeneous phenomenon. Coll corpus corpora corpus creation idioms idiom variation idiom-breaking online newspapers student newspapers college newspapers English language
126	Mesurer et améliorer la qualité des corpus comparables / Measuring and Improving Comparable Corpus Quality Li, Bo 26 June 2012 Bilingual corpora are an essential resource used to cross the language barrier in multilingual Natural Language Processing (NLP) tasks. Most current work makes use of parallel corpora, which are mainly available for major languages and restricted domains. Comparable corpora, text collections composed of documents covering overlapping information, are however less expensive to obtain in high volume. Previous work has shown that using comparable corpora is beneficial for several NLP tasks. Alongside those studies, this thesis aims to improve the quality of comparable corpora so as to improve the performance of applications exploiting them. The idea is advantageous since it can work with any existing method that makes use of comparable corpora. We first discuss the notion of comparability, inspired by experience with the use of bilingual corpora. The notion motivates several implementations of the comparability measure under a probabilistic framework, as well as a methodology to evaluate the ability of comparability measures to capture gold-standard comparability levels. The comparability measures are also examined in terms of robustness to dictionary changes. The experiments show that a symmetric measure relying on vocabulary overlap can correlate very well with gold-standard comparability levels and is robust to dictionary changes. Based on this comparability measure, two methods, namely the greedy approach and the clustering approach, are then developed to improve the quality of any given comparable corpus. The general idea of these two methods is to choose the high-quality subpart of the original corpus and to enrich the low-quality subpart with external resources. The experiments show that both methods improve the quality of the given comparable corpus, in terms of comparability scores, with the clustering approach being more efficient than the greedy approach. The enhanced comparable corpus further results in better bilingual lexicons extracted with the standard extraction algorithm. Lastly, we investigate the task of Cross-Language Information Retrieval (CLIR) and the application of comparable corpora in CLIR. We develop novel CLIR models extending the recently proposed information-based models in monolingual IR. The information-based CLIR model is shown to give the best performance overall. Bilingual lexicons extracted from comparable corpora are then combined with the existing bilingual dictionary and used in CLIR experiments, which results in significant improvement of the CLIR system. Comparable corpora Comparability Bilingual lexicons Cross-language information retrieval
127	Alinhamento léxico utilizando técnicas híbridas discriminativas e de pós-processamento / Text alignment Schreiner, Paulo January 2010 Lexical alignment is an essential task for modern empirical machine translation techniques. The unsupervised generative approach is being replaced by a supervised, discriminative one that considerably facilitates the inclusion of linguistic knowledge from several sources. Given this context, the present work describes a series of discriminative lexical aligners that incorporate post-processing heuristics with the goal of improving the quality of the alignments of multiword expressions, which are one of the major challenges in natural language processing today. The evaluation is conducted using a gold standard obtained by annotating a parallel corpus of movie subtitles. The proposed aligners show an alignment quality that is superior both to our baseline and to a state-of-the-art generative aligner (Giza++), for the general case as well as for the expressions that are the focus of this work. Computational linguistics Natural language processing Lexical alignment Machine learning Parallel corpora Multiword expressions UFRGS
130	Étude de la dimension intersubjective de la communication et de la construction du sens dans les discussions à visée philosophique en contexte scolaire / A study of communication’s intersubjective dimension and of the construction of meaning in philosophical discussions within an educational context Auriel, Aline 10 December 2016 This thesis investigates meaning construction during philosophical discussions recorded at French primary and middle schools. More precisely, it studies the intersubjective dimension of communication and the construction of meaning in these practices. At the same time, it examines the ways in which the interaction is co-constructed. We believe that recognising the interlocutor’s importance in the utterance act is essential in order to understand the true nature of the communication. Employing interactionist and socio-constructivist approaches, we consider the construction of meaning as a dynamic process and as a joint action between the speaker and interlocutor(s). The benefits of conducting philosophical discussions of this type with children have previously been recognised in the literature. The analysis of interactional corpora shows that philosophical discussion is a favourable environment for the collective construction of speech, meaning and conceptualization. This doctoral study considers the mechanisms of this collective construction conducted by children and the role of the facilitator within this process. In particular, the thesis examines repetition/reformulation and the left-dislocation of the subject, a linguistic phenomenon that forms part of the attention orientation process and guides collective interpretation and understanding. Data analysis allows the thesis to contribute to defining the pragmatic functions of the studied phenomena and the different communicative purposes associated with their use by children and by the facilitator during philosophical discussions. Spoken corpora Philosophical discussions Intersubjectivity Focus Repetition/reformulation Collective construction
