Global ETD Search

1	Flexible representation for genetic programming: lessons from natural language processing Nguyen, Xuan Hoai, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW January 2004 This thesis principally addresses some problems in genetic programming (GP) and grammar-guided genetic programming (GGGP) arising from the lack of operators able to make small and bounded changes in both genotype and phenotype space. It proposes a new and flexible representation for genetic programming, using a state-of-the-art formalism from natural language processing, Tree Adjoining Grammars (TAGs). It demonstrates that the new TAG-based representation possesses two important properties: non-fixed arity and locality. The former facilitates the design of new operators, including some which are bio-inspired, and others able to make small and bounded changes. The latter ensures that bounded changes in genotype space are reflected in bounded changes in phenotype space. With these two properties, the thesis shows how some well-known difficulties in standard GP and GGGP tree-based representations can be solved in the new representation. These difficulties have previously been attributed to the tree-based nature of the representations; since the TAG representation is also tree-based, it has enabled a more precise delineation of the causes of the difficulties. Building on the new representation, a new grammar-guided GP system known as TAG3P has been developed, and shown to be competitive with other GP and GGGP systems. A new schema theorem, explaining the behaviour of TAG3P on syntactically constrained domains, is derived. Finally, the thesis proposes a new method for understanding performance differences between GP representations requiring different ways to bound the search space, eliminating the effects of the bounds through multi-objective approaches.
Genetic programming grammar-guided genotype space natural language processing phenotype space tree adjoining grammars (TAGs)
2	Leveraging MWEs in practical TAG parsing: towards the best of the two worlds / Optimisation d'analyse syntaxique basée sur les grammaires d'arbres adjoints grâce à la modélisation d'expressions polylexicales et à l'algorithme A* Waszczuk, Jakub 26 June 2017 Dans ce mémoire, nous nous penchons sur les expressions polylexicales (EP) et leurs relations avec l’analyse syntaxique, la tâche qui consiste à déterminer les relations syntaxiques entre les mots dans une phrase donnée. Le défi que posent les EP dans ce contexte, par rapport aux expressions linguistiques régulières, provient de leurs propriétés parfois inattendues qui les rendent difficiles à gérer dans le traitement automatique des langues. Dans nos travaux, nous montrons qu’il est pourtant possible de profiter de cette caractéristique des EP afin d’améliorer les résultats d’analyse syntaxique. Notamment, avec les grammaires d’arbres adjoints (TAGs), qui fournissent un cadre naturel et puissant pour la modélisation des EP, ainsi qu’avec des stratégies de recherche basées sur l’algorithme A*, il est possible d’obtenir des gains importants au niveau de la vitesse sans pour autant détériorer la qualité de l’analyse syntaxique. Cela contraste avec des méthodes purement statistiques qui, malgré leur efficacité, ne fournissent pas de solutions satisfaisantes en ce qui concerne les EP. Nous proposons un analyseur syntaxique novateur qui combine les grammaires TAG avec la technique A*, axé sur la prédiction des EP, dont les fonctionnalités permettent des applications à grande échelle, facilement extensible au contexte probabiliste. / In this thesis, we focus on multiword expressions (MWEs) and their relationships with syntactic parsing. The latter task consists in retrieving the syntactic relations holding between the words in a given sentence. 
The challenge of MWEs in this respect is that, in contrast to regular linguistic expressions, they exhibit various irregular properties which make them harder to deal with in natural language processing. In our work, we show that the challenge of the MWE-related irregularities can be turned into an advantage in practical symbolic parsing. Namely, with tree adjoining grammars (TAGs), which provide first-class support for MWEs, and A* search strategies, considerable speed-up gains can be achieved by promoting MWE-based analyses with virtually no loss in syntactic parsing accuracy. This is in contrast to purely statistical state-of-the-art parsers, which, despite their efficiency, provide no satisfactory support for MWEs. We contribute a TAG-A*-MWE-aware parsing architecture with facilities (grammar compression and feature structures) enabling real-world applications, easily extensible to a probabilistic framework. Expressions polylexicales Analyse syntaxique Grammaires d’arbres adjoints Algorithme A* Multiwords expression Syntactic parsing Tree adjoining grammars Algorithm A*
3	電腦輔助句子重組試題編製 / Computer assisted test item generation for sentence reconstruction 黃志斌, Huang, Chih Bin Unknown Date 本論文提供了一個句子重組試題編製的環境，協助教師編製句子重組試題，同時學生也能夠在此編製環境中練習句子重組試題。句子重組試題即是要求學生把試題給的一組詞彙組合成特定詞序的正確語句之題型，該試題類型可以檢驗學生對於句型和文法的知識。然而試題所給的詞彙集合往往除了可以組合成教師想要學生回答的正確語句之外，也可以組合成其它的合法語句。為了能辨識學生的回答，把這些合法語句以人工方式逐一建置為答案卻對出題教師造成了負擔。我們建構了一個電腦輔助句子重組試題編製的環境來減輕出題教師的負擔。為了讓電腦可以恰當地判斷學生的回答，我們的編製環境限制了試題詞彙集的相對位置，藉此約束學生只能排出教師預設的特定答案。同時在出題教師建置試題答案時，我們的編製環境也試圖提供所有可能的合法詞序之語句，供出題教師參考。但本論文的研究經驗顯示要自動協助出題教師預示所有可能的合法詞序之語句卻是一件艱難的工作，而且這一研究問題與語法學有密切關係。本論文以基礎詞組為主軸，透過合併詞組和史丹佛剖析器的操作建構出英文句子重組試題編製環境，供教師編輯與學生練習。同時，我們在論文中也提報了中文句子重組試題編製環境的初步探討。 / This thesis presents a computer-assisted environment for authoring test items for sentence reconstruction. Not only can the teacher author test items in this environment, but the student can also practice them there. A test item for sentence reconstruction asks the student to arrange a set of shuffled words in the correct order, and this type of test can examine the student's knowledge of sentence patterns and grammar. However, the given words can often be arranged not only into the correct sentence that the teacher intends but also into other legal sentences. Manually enumerating all legal and acceptable answers in order to judge the student's answer places a heavy burden on the teacher. We construct a computer-assisted environment for authoring test items for sentence reconstruction to lighten this burden. So that a computer can judge the student's answer properly, the relative positions of the words are restricted, which constrains the student to the specific answers preset by the teacher. When the teacher provides the correct answers, we also try to find and return all sentences that may be legal, for the teacher's consideration. 
However, our experience shows that it is difficult to find all of the legal sentences for a given set of words, and this problem is closely related to research in syntax. This thesis takes basic word groups as its main axis and builds an authoring environment for English sentence reconstruction test items by merging word groups and using the Stanford Parser; it also reports an initial study of an authoring environment for Chinese sentence reconstruction test items. 電腦輔助語文教學文法學習輔助句子重組練習連結樹句法 computer assisted language learning grammar learning assistance scrambled sentence reconstruction tree-adjoining grammars
4	Génération automatique de phrases pour l'apprentissage des langues / Natural language generation for language learning Perez, Laura Haide 19 April 2013 Dans ces travaux, nous explorons comment les techniques de Génération Automatique de Langue Naturelle (GLN) peuvent être utilisées pour aborder la tâche de génération (semi-)automatique de matériel et d'activités dans le contexte de l'apprentissage de langues assisté par ordinateur. En particulier, nous montrons comment un Réalisateur de Surface (RS) basé sur une grammaire peut être exploité pour la création automatique d'exercices de grammaire. Notre réalisateur de surface utilise une grammaire réversible étendue, à savoir SemTAG, qui est une Grammaire d'Arbres Adjoints à Structure de Traits (FB-TAG) couplée avec une sémantique compositionnelle basée sur l'unification. Plus précisément, la grammaire FB-TAG intègre une représentation plate et sous-spécifiée des formules de Logique de Premier Ordre (FOL). Dans la première partie de la thèse, nous étudions la tâche de réalisation de surface à partir de formules sémantiques plates et nous proposons un algorithme de réalisation de surface basé sur la grammaire FB-TAG optimisé, qui supporte la génération de phrases longues étant donné une grammaire et un lexique à large couverture. L'approche suivie pour l'optimisation de la réalisation de surface basée sur FB-TAG à partir de sémantiques plates repose sur le fait qu'une grammaire FB-TAG peut être traduite en une Grammaire d'Arbres Réguliers à Structure de Traits (FB-RTG) décrivant ses arbres de dérivation. Le langage d'arbres de dérivation de la grammaire TAG constitue un langage plus simple que le langage d'arbres dérivés, c'est pourquoi des approches de génération basées sur les arbres de dérivation ont déjà été proposées. 
Notre approche se distingue des précédentes par le fait que notre encodage FB-RTG prend en compte les structures de traits présentes dans la grammaire FB-TAG originelle, ayant de ce fait des conséquences importantes par rapport à la sur-génération et la préservation de l'interface syntaxe-sémantique. L'algorithme de génération d'arbres de dérivation que nous proposons est un algorithme de type Earley intégrant un ensemble de techniques d'optimisation bien connues: tabulation, partage-compression (sharing-packing) et indexation basée sur la sémantique. Dans la seconde partie de la thèse, nous explorons comment notre réalisateur de surface basé sur SemTAG peut être utilisé pour la génération (semi-)automatique d'exercices de grammaire. Habituellement, les enseignants éditent manuellement les exercices et leurs solutions et les classent au regard de leur degré de difficulté ou du niveau attendu de l'apprenant. Un courant de recherche dans le Traitement Automatique des Langues (TAL) pour l'apprentissage des langues assisté par ordinateur traite de la génération (semi-)automatique d'exercices. Principalement, ces travaux s'appuient sur des textes extraits du Web, utilisent des techniques d'apprentissage automatique et des techniques d'analyse de textes (par exemple, analyse de phrases, POS tagging, etc.). Ces approches confrontent l'apprenant à des phrases qui ont des syntaxes potentiellement complexes et du vocabulaire varié. En revanche, l'approche que nous proposons dans cette thèse aborde la génération (semi-)automatique d'exercices du type rencontré dans les manuels pour l'apprentissage des langues. Il s'agit, en d'autres termes, d'exercices dont la syntaxe et le vocabulaire sont faits sur mesure pour des objectifs pédagogiques et des sujets donnés. 
Les approches de génération basées sur des grammaires associent les phrases du langage naturel avec une représentation linguistique fine de leurs propriétés morpho-syntaxiques et de leur sémantique, grâce à quoi il est possible de définir un langage de contraintes syntaxiques et morpho-syntaxiques permettant la sélection de phrases souches en accord avec un objectif pédagogique donné. Cette représentation permet en outre d'opérer un post-traitement des phrases sélectionnées pour construire des exercices de grammaire. / In this work, we explore how Natural Language Generation (NLG) techniques can be used to address the task of (semi-)automatically generating language learning material and activities in Computer-Assisted Language Learning (CALL). In particular, we show how a grammar-based Surface Realiser (SR) can be usefully exploited for the automatic creation of grammar exercises. Our surface realiser uses a wide-coverage reversible grammar, namely SemTAG, which is a Feature-Based Tree Adjoining Grammar (FB-TAG) equipped with a unification-based compositional semantics. More precisely, the FB-TAG grammar integrates a flat and underspecified representation of First Order Logic (FOL) formulae. In the first part of the thesis, we study the task of surface realisation from flat semantic formulae and we propose an optimised FB-TAG-based realisation algorithm that supports the generation of longer sentences given a large-scale grammar and lexicon. The approach followed to optimise TAG-based surface realisation from flat semantics draws on the fact that an FB-TAG can be translated into a Feature-Based Regular Tree Grammar (FB-RTG) describing its derivation trees. The derivation tree language of TAG constitutes a simpler language than the derived tree language, and thus generation approaches based on derivation trees have already been proposed. 
Our approach departs from previous ones in that our FB-RTG encoding accounts for the feature structures present in the original FB-TAG, which has important consequences regarding over-generation and preservation of the syntax-semantics interface. The concrete derivation tree generation algorithm that we propose is an Earley-style algorithm integrating a set of well-known optimisation techniques: tabulation, sharing-packing, and semantic-based indexing. In the second part of the thesis, we explore how our SemTAG-based surface realiser can be put to work for the (semi-)automatic generation of grammar exercises. Usually, teachers manually edit exercises and their solutions, and classify them according to the degree of difficulty or expected learner level. A strand of research in Natural Language Processing (NLP) for CALL addresses the (semi-)automatic generation of exercises. Mostly, this work draws on texts extracted from the Web and uses machine learning and text analysis techniques (e.g. parsing, POS tagging, etc.). These approaches expose the learner to sentences that have a potentially complex syntax and diverse vocabulary. In contrast, the approach we propose in this thesis addresses the (semi-)automatic generation of grammar exercises of the type found in grammar textbooks. In other words, it deals with the generation of exercises whose syntax and vocabulary are tailored to specific pedagogical goals and topics. Because the grammar-based generation approach associates natural language sentences with a rich linguistic description, it permits defining a syntactic and morpho-syntactic constraints specification language for the selection of stem sentences in compliance with a given pedagogical goal. Further, it allows for the post-processing of the generated stem sentences to build grammar exercise items. We show how Fill-in-the-blank, Shuffle and Reformulation grammar exercises can be automatically produced. 
The approach has been integrated in the Interactive French Learning Game (I-FLEG), a serious game for learning French, and has been evaluated both on the basis of interactions with online players and in collaboration with a language teacher. Réalisateur de Surface (RS) Surface Realisation (SR) Surface Realisation Optimisation Natural Language Generation (NLG) 402.85
