Global ETD Search

1	Apprentissage de grammaires catégorielles : transducteurs d’arbres et clustering pour induction de grammaires catégorielles / Learning categorial grammars Sandillon Rezer, Noémie Fleur 09 December 2013 (has links) De nos jours, il n’est pas rare d’utiliser des logiciels capables d’avoir une conversation, d’interagir avec nous (systèmes questions/réponses pour les SAV, gestion d’interface ou simplement Intelligence Artificielle - IA - de discussion). Ceux-ci doivent comprendre le contexte ou réagir par mot-clefs, mais générer ensuite des réponses cohérentes, aussi bien au niveau du sens de la phrase (sémantique) que de la forme (syntaxe). Si les premières IA se contentaient de phrases toutes faites et réagissaient en fonction de mots-clefs, le processus s’est complexifié avec le temps. Pour améliorer celui-ci, il faut comprendre et étudier la construction des phrases. Nous nous focalisons sur la syntaxe et sa modélisation avec des grammaires catégorielles. L’idée est de pouvoir aussi bien générer des squelettes de phrases syntaxiquement correctes que vérifier l’appartenance d’une phrase à un langage, ici le français (il manque l’aspect sémantique). On note que les grammaires AB peuvent, à l’exception de certains phénomènes comme la quantification et l’extraction, servir de base pour la sémantique en extrayant des λ-termes. Nous couvrons aussi bien l’aspect d’extraction de grammaire à partir de corpus arborés que l’analyse de phrases. Pour ce faire, nous présentons deux méthodes d’extraction et une méthode d’analyse de phrases permettant de tester nos grammaires. La première méthode consiste en la création d’un transducteur d’arbres généralisé, qui transforme les arbres syntaxiques en arbres de dérivation d’une grammaire AB. Appliqué sur les corpus français que nous avons à notre disposition, il permet d’avoir une grammaire assez complète de la langue française, ainsi qu’un vaste lexique. Le transducteur, même s’il s’éloigne peu de la définition usuelle d’un transducteur descendant, a pour particularité d’offrir une nouvelle méthode d’écriture des règles de transduction, permettant une définition compacte de celles-ci. Nous transformons actuellement 92,5% des corpus en arbres de dérivation. Pour notre seconde méthode, nous utilisons un algorithme d’unification en guidant celui-ci avec une étape préliminaire de clustering, qui rassemble les mots en fonction de leur contexte dans la phrase. La comparaison avec les arbres extraits du transducteur donne des résultats encourageants avec 91,3% de similarité. Enfin, nous mettons en place une version probabiliste de l’algorithme CYK pour tester l’efficacité de nos grammaires en analyse de phrases. La couverture obtenue est entre 84,6% et 92,6%, en fonction de l’ensemble de phrases pris en entrée. Les probabilités, appliquées aussi bien sur le type des mots lorsque ceux-ci en ont plusieurs que sur les règles, permettent de sélectionner uniquement le “meilleur” arbre de dérivation.Tous nos logiciels sont disponibles au téléchargement sous licence GNU GPL. / Nowadays, we have become familiar with software interacting with us using natural language (for example in question-answering systems for after-sale services, human-computer interaction or simple discussion bots). These tools have to either react by keyword extraction or, more ambitiously, try to understand the sentence in its context. Though the simplest of these programs only have a set of pre-programmed sentences to react to recognized keywords (these systems include Eliza but also more modern systems like Siri), more sophisticated systems make an effort to understand the structure and the meaning of sentences (these include systems like Watson), allowing them to generate consistent answers, both with respect to the meaning of the sentence (semantics) and with respect to its form (syntax). In this thesis, we focus on syntax and on how to model syntax using categorial grammars. Our goal is to generate syntactically accurate sentences (without the semantic aspect) and to verify that a given sentence belongs to a language - the French language. We note that AB grammars, with the exception of some phenomena like quantification or extraction, are also a good basis for semantic purposes. We cover both grammar extraction from treebanks and parsing using the extracted grammars. On this purpose, we present two extraction methods and test the resulting grammars using standard parsing algorithms. The first method focuses on creating a generalized tree transducer, which transforms syntactic trees into derivation trees corresponding to an AB grammar. Applied on the various French treebanks, the transducer’s output gives us a wide-coverage lexicon and a grammar suitable for parsing. The transducer, even if it differs only slightly from the usual definition of a top-down transducer, offers several new, compact ways to express transduction rules. We currently transduce 92.5% of all sen- tences in the treebanks into derivation trees.For our second method, we use a unification algorithm, guiding it with a preliminary clustering step, which gathers the words according to their context in the sentence. The comparision between the transduced trees and this method gives the promising result of 91.3% of similarity.Finally, we have tested our grammars on sentence analysis with a probabilistic CYK algorithm and a formula assignment step done with a supertagger. The obtained coverage lies between 84.6% and 92.6%, depending on the input corpus. The probabilities, estimated for the type of words and for the rules, enable us to select only the “best” derivation tree. All our software is available for download under GNU GPL licence. Grammaires AB Inférence grammaticale Clustering Transducteur d’arbres Apprentissage automatique AB grammar Grammatical inference Clustering Tree transducer Automatic learning
2	Multioperator Weighted Monadic Datalog Stüber, Torsten 06 May 2011 (has links) (PDF) In this thesis we will introduce multioperator weighted monadic datalog (mwmd), a formal model for specifying tree series, tree transformations, and tree languages. This model combines aspects of multioperator weighted tree automata (wmta), weighted monadic datalog (wmd), and monadic datalog tree transducers (mdtt). In order to develop a rich theory we will define multiple versions of semantics for mwmd and compare their expressiveness. We will study normal forms and decidability results of mwmd and show (by employing particular semantic domains) that the theory of mwmd subsumes the theory of both wmd and mdtt. We conclude this thesis by showing that mwmd even contain wmta as a syntactic subclass and present results concerning this subclass. Automatentheorie Baumautomaten Baumübersetzer Logik Monadic Datalog Automata Theory Tree Automata Tree Transducer Logic Monadic Datalog ddc:004 rvk:ST 136
3	Une théorie mécanisée des arbres réguliers en théorie des types dépendants / A mechanized theory of regular trees in dependent type theory Spadotti, Régis 19 May 2016 (has links) Nous proposons deux caractérisations des arbres réguliers. La première est sémantique et s'appuie sur les types co-inductifs. La seconde est syntaxique et repose sur une représentation des arbres réguliers par des termes cycliques. Nous prouvons que ces deux caractérisations sont isomorphes.Ensuite, nous étudions le problème de la définition de morphisme d'arbres préservant la propriété de régularité. Nous montrons en utilisant le formalisme des transducteurs d'arbres, l'existence d'un critère syntaxique garantissant la préservation de cette propriété. Enfin, nous considérons des applications de la théorie des arbres réguliers comme la définition de l'opérateur de composition parallèle d'une algèbre de processus ou encore, les problèmes de décidabilité sur les arbres réguliers via une mécanisation d'un vérificateur de modèles pour un mu-calcul coalgébrique. Tous les résultats ont été mécanisés et prouvés corrects dans l'assistant de preuve Coq. / We propose two characterizations of regular trees. The first one is semantic and is based on coinductive types. The second one is syntactic and represents regular trees by means of cyclic terms. We prove that both of these characterizations are isomorphic. Then, we study the problem of defining tree morphisms preserving the regularity property. We show, by using the formalism of tree transducers, the existence of syntactic criterion ensuring that this property is preserved. Finally, we consider applications of the theory of regular trees such as the definition of the parallel composition operator of a process algebra or, the decidability problems on regular trees through a mechanization of a model-checker for a coalgebraic mu-calculus. All the results were mechanized and proved correct in the Coq proof assistant. Assistant de preuve Arbre régulier Terme cyclique Co-induction Transducteur d'arbres Proof assistant Regular tree Cyclic term Coinduction Tree transducer
4	Multioperator Weighted Monadic Datalog Stüber, Torsten 10 February 2011 (has links) In this thesis we will introduce multioperator weighted monadic datalog (mwmd), a formal model for specifying tree series, tree transformations, and tree languages. This model combines aspects of multioperator weighted tree automata (wmta), weighted monadic datalog (wmd), and monadic datalog tree transducers (mdtt). In order to develop a rich theory we will define multiple versions of semantics for mwmd and compare their expressiveness. We will study normal forms and decidability results of mwmd and show (by employing particular semantic domains) that the theory of mwmd subsumes the theory of both wmd and mdtt. We conclude this thesis by showing that mwmd even contain wmta as a syntactic subclass and present results concerning this subclass. info:eu-repo/classification/ddc/004 ddc:004
5	Probabilistic tree transducers for grammatical error correction Buys, Jan Moolman 12 1900 (has links) Thesis (MSc)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: We investigate the application of weighted tree transducers to correcting grammatical errors in natural language. Weighted finite-state transducers (FST) have been used successfully in a wide range of natural language processing (NLP) tasks, even though the expressiveness of the linguistic transformations they perform is limited. Recently, there has been an increase in the use of weighted tree transducers and related formalisms that can express syntax-based natural language transformations in a probabilistic setting. The NLP task that we investigate is the automatic correction of grammar errors made by English language learners. In contrast to spelling correction, which can be performed with a very high accuracy, the performance of grammar correction systems is still low for most error types. Commercial grammar correction systems mostly use rule-based methods. The most common approach in recent grammatical error correction research is to use statistical classifiers that make local decisions about the occurrence of specific error types. The approach that we investigate is related to a number of other approaches inspired by statistical machine translation (SMT) or based on language modelling. Corpora of language learner writing annotated with error corrections are used as training data. Our baseline model is a noisy-channel FST model consisting of an n-gram language model and a FST error model, which performs word insertion, deletion and replacement operations. The tree transducer model we use to perform error correction is a weighted top-down tree-to-string transducer, formulated to perform transformations between parse trees of correct sentences and incorrect sentences. Using an algorithm developed for syntax-based SMT, transducer rules are extracted from training data of which the correct version of sentences have been parsed. Rule weights are also estimated from the training data. Hypothesis sentences generated by the tree transducer are reranked using an n-gram language model. We perform experiments to evaluate the performance of different configurations of the proposed models. In our implementation an existing tree transducer toolkit is used. To make decoding time feasible sentences are split into clauses and heuristic pruning is performed during decoding. We consider different modelling choices in the construction of transducer rules. The evaluation of our models is based on precision and recall. Experiments are performed to correct various error types on two learner corpora. The results show that our system is competitive with existing approaches on several error types. / AFRIKAANSE OPSOMMING: Ons ondersoek die toepassing van geweegde boomoutomate om grammatikafoute in natuurlike taal outomaties reg te stel. Geweegde eindigetoestand outomate word suksesvol gebruik in ’n wye omvang van take in natuurlike taalverwerking, alhoewel die uitdrukkingskrag van die taalkundige transformasies wat hulle uitvoer beperk is. Daar is die afgelope tyd ’n toename in die gebruik van geweegde boomoutomate en verwante formalismes wat sintaktiese transformasies in natuurlike taal in ’n probabilistiese raamwerk voorstel. Die natuurlike taalverwerkingstoepassing wat ons ondersoek is die outomatiese regstelling van taalfoute wat gemaak word deur Engelse taalleerders. Terwyl speltoetsing in Engels met ’n baie hoë akkuraatheid gedoen kan word, is die prestasie van taalregstellingstelsels nog relatief swak vir meeste fouttipes. Kommersiële taalregstellingstelsels maak oorwegend gebruik van reël-gebaseerde metodes. Die algemeenste benadering in onlangse navorsing oor grammatikale foutkorreksie is om statistiese klassifiseerders wat plaaslike besluite oor die voorkoms van spesifieke fouttipes maak te gebruik. Die benadering wat ons ondersoek is verwant aan ’n aantal ander benaderings wat geïnspireer is deur statistiese masjienvertaling of op taalmodellering gebaseer is. Korpora van taalleerderskryfwerk wat met foutregstellings geannoteer is, word as afrigdata gebruik. Ons kontrolestelsel is ’n geraaskanaal eindigetoestand outomaatmodel wat bestaan uit ’n n-gram taalmodel en ’n foutmodel wat invoegings-, verwyderings- en vervangingsoperasies op woordvlak uitvoer. Die boomoutomaatmodel wat ons gebruik vir grammatikale foutkorreksie is ’n geweegde bo-na-onder boom-na-string omsetteroutomaat geformuleer om transformasies tussen sintaksbome van korrekte sinne en foutiewe sinne te maak. ’n Algoritme wat ontwikkel is vir sintaksgebaseerde statistiese masjienvertaling word gebruik om reëls te onttrek uit die afrigdata, waarvan sintaksontleding op die korrekte weergawe van die sinne gedoen is. Reëlgewigte word ook vanaf die afrigdata beraam. Hipotese-sinne gegenereer deur die boomoutomaat word herrangskik met behulp van ’n n-gram taalmodel. Ons voer eksperimente uit om die doeltreffendheid van verskillende opstellings van die voorgestelde modelle te evalueer. In ons implementering word ’n bestaande boomoutomaat sagtewarepakket gebruik. Om die dekoderingstyd te verminder word sinne in frases verdeel en die soekruimte heuristies besnoei. Ons oorweeg verskeie modelleringskeuses in die samestelling van outomaatreëls. Die evaluering van ons modelle word gebaseer op presisie en herroepvermoë. Eksperimente word uitgevoer om verskeie fouttipes reg te maak op twee leerderkorpora. Die resultate wys dat ons model kompeterend is met bestaande benaderings op verskeie fouttipes. Grammar correction -- Data processing Natural language processing Weighted tree transducer Text processing (Computer science) Dissertations -- Mathematical sciences Theses -- Mathematical sciences Dissertations -- Computer science Theses -- Computer science Computational linguistics English language -- Grammar
6	Categorical semantics and composition of tree transducers / Kategorielle Semantik und Komposition von Baumübersetzern Jürgensen, Claus 28 December 2004 (has links) (PDF) In this thesis we see two new approaches to compose tree transducers and more general to fuse functional programs. The first abroach is based on initial algebras. We prove a new variant of the acid rain theorem for mutually recursive functions where the build function is substituted by a concrete functor. Moreover, we give a symmetric form (i.e. consumer and producer have the same syntactic form) of our new acid rain theorem where fusion is composition in a category and thus in particular associative. Applying this to compose top-down tree transducers yields the same result (on a syntactic level) as the classical top-down tree transducer composition. The second approach is based on free monads and monad transformers. In the same way as monoids are used in the theory of character string automata, we use monads in the theory of tree transducers. We generalize the notion of a tree transducer defining the monadic transducer, and we prove an according fusion theorem. Moreover, we prove that homomorphic monadic transducers are semantically equivalent. The latter makes it possible to compose syntactic classes of tree transducers (or particular functional programs) by simply composing endofunctors. Baumübersetzer Deforestation Fusion Haskel Monade Monadentransformer Programmtransformation funktionale Sprache Haskell deforestation functional program fusion monad monad transformer program transformation tree transducer ddc:004 rvk:ST 140 Formale Semantik Funktionale Programmiersprache Kategorientheorie Programmtransformation
7	Categorical semantics and composition of tree transducers Jürgensen, Claus 30 January 2004 (has links) In this thesis we see two new approaches to compose tree transducers and more general to fuse functional programs. The first abroach is based on initial algebras. We prove a new variant of the acid rain theorem for mutually recursive functions where the build function is substituted by a concrete functor. Moreover, we give a symmetric form (i.e. consumer and producer have the same syntactic form) of our new acid rain theorem where fusion is composition in a category and thus in particular associative. Applying this to compose top-down tree transducers yields the same result (on a syntactic level) as the classical top-down tree transducer composition. The second approach is based on free monads and monad transformers. In the same way as monoids are used in the theory of character string automata, we use monads in the theory of tree transducers. We generalize the notion of a tree transducer defining the monadic transducer, and we prove an according fusion theorem. Moreover, we prove that homomorphic monadic transducers are semantically equivalent. The latter makes it possible to compose syntactic classes of tree transducers (or particular functional programs) by simply composing endofunctors. info:eu-repo/classification/ddc/004 ddc:004

1

Page generated in 0.0518 seconds