Spelling suggestions: "subject:"free transducer"" "subject:"tree transducer""
1 |
Apprentissage de grammaires catégorielles : transducteurs d’arbres et clustering pour induction de grammaires catégorielles / Learning categorial grammarsSandillon Rezer, Noémie Fleur 09 December 2013 (has links)
De nos jours, il n’est pas rare d’utiliser des logiciels capables d’avoir une conversation, d’interagir avec nous (systèmes questions/réponses pour les SAV, gestion d’interface ou simplement Intelligence Artificielle - IA - de discussion). Ceux-ci doivent comprendre le contexte ou réagir par mot-clefs, mais générer ensuite des réponses cohérentes, aussi bien au niveau du sens de la phrase (sémantique) que de la forme (syntaxe). Si les premières IA se contentaient de phrases toutes faites et réagissaient en fonction de mots-clefs, le processus s’est complexifié avec le temps. Pour améliorer celui-ci, il faut comprendre et étudier la construction des phrases. Nous nous focalisons sur la syntaxe et sa modélisation avec des grammaires catégorielles. L’idée est de pouvoir aussi bien générer des squelettes de phrases syntaxiquement correctes que vérifier l’appartenance d’une phrase à un langage, ici le français (il manque l’aspect sémantique). On note que les grammaires AB peuvent, à l’exception de certains phénomènes comme la quantification et l’extraction, servir de base pour la sémantique en extrayant des λ-termes. Nous couvrons aussi bien l’aspect d’extraction de grammaire à partir de corpus arborés que l’analyse de phrases. Pour ce faire, nous présentons deux méthodes d’extraction et une méthode d’analyse de phrases permettant de tester nos grammaires. La première méthode consiste en la création d’un transducteur d’arbres généralisé, qui transforme les arbres syntaxiques en arbres de dérivation d’une grammaire AB. Appliqué sur les corpus français que nous avons à notre disposition, il permet d’avoir une grammaire assez complète de la langue française, ainsi qu’un vaste lexique. Le transducteur, même s’il s’éloigne peu de la définition usuelle d’un transducteur descendant, a pour particularité d’offrir une nouvelle méthode d’écriture des règles de transduction, permettant une définition compacte de celles-ci. Nous transformons actuellement 92,5% des corpus en arbres de dérivation. Pour notre seconde méthode, nous utilisons un algorithme d’unification en guidant celui-ci avec une étape préliminaire de clustering, qui rassemble les mots en fonction de leur contexte dans la phrase. La comparaison avec les arbres extraits du transducteur donne des résultats encourageants avec 91,3% de similarité. Enfin, nous mettons en place une version probabiliste de l’algorithme CYK pour tester l’efficacité de nos grammaires en analyse de phrases. La couverture obtenue est entre 84,6% et 92,6%, en fonction de l’ensemble de phrases pris en entrée. Les probabilités, appliquées aussi bien sur le type des mots lorsque ceux-ci en ont plusieurs que sur les règles, permettent de sélectionner uniquement le “meilleur” arbre de dérivation.Tous nos logiciels sont disponibles au téléchargement sous licence GNU GPL. / Nowadays, we have become familiar with software interacting with us using natural language (for example in question-answering systems for after-sale services, human-computer interaction or simple discussion bots). These tools have to either react by keyword extraction or, more ambitiously, try to understand the sentence in its context. Though the simplest of these programs only have a set of pre-programmed sentences to react to recognized keywords (these systems include Eliza but also more modern systems like Siri), more sophisticated systems make an effort to understand the structure and the meaning of sentences (these include systems like Watson), allowing them to generate consistent answers, both with respect to the meaning of the sentence (semantics) and with respect to its form (syntax). In this thesis, we focus on syntax and on how to model syntax using categorial grammars. Our goal is to generate syntactically accurate sentences (without the semantic aspect) and to verify that a given sentence belongs to a language - the French language. We note that AB grammars, with the exception of some phenomena like quantification or extraction, are also a good basis for semantic purposes. We cover both grammar extraction from treebanks and parsing using the extracted grammars. On this purpose, we present two extraction methods and test the resulting grammars using standard parsing algorithms. The first method focuses on creating a generalized tree transducer, which transforms syntactic trees into derivation trees corresponding to an AB grammar. Applied on the various French treebanks, the transducer’s output gives us a wide-coverage lexicon and a grammar suitable for parsing. The transducer, even if it differs only slightly from the usual definition of a top-down transducer, offers several new, compact ways to express transduction rules. We currently transduce 92.5% of all sen- tences in the treebanks into derivation trees.For our second method, we use a unification algorithm, guiding it with a preliminary clustering step, which gathers the words according to their context in the sentence. The comparision between the transduced trees and this method gives the promising result of 91.3% of similarity.Finally, we have tested our grammars on sentence analysis with a probabilistic CYK algorithm and a formula assignment step done with a supertagger. The obtained coverage lies between 84.6% and 92.6%, depending on the input corpus. The probabilities, estimated for the type of words and for the rules, enable us to select only the “best” derivation tree. All our software is available for download under GNU GPL licence.
|
2 |
Multioperator Weighted Monadic DatalogStüber, Torsten 06 May 2011 (has links) (PDF)
In this thesis we will introduce multioperator weighted monadic datalog (mwmd), a formal model for specifying tree series, tree transformations, and tree languages. This model combines aspects of multioperator weighted tree automata (wmta), weighted monadic datalog (wmd), and monadic datalog tree transducers (mdtt). In order to develop a rich theory we will define multiple versions of semantics for mwmd and compare their expressiveness. We will study normal forms and decidability results of mwmd and show (by employing particular semantic domains) that the theory of mwmd subsumes the theory of both wmd and mdtt. We conclude this thesis by showing that mwmd even contain wmta as a syntactic subclass and present results concerning this subclass.
|
3 |
Une théorie mécanisée des arbres réguliers en théorie des types dépendants / A mechanized theory of regular trees in dependent type theorySpadotti, Régis 19 May 2016 (has links)
Nous proposons deux caractérisations des arbres réguliers. La première est sémantique et s'appuie sur les types co-inductifs. La seconde est syntaxique et repose sur une représentation des arbres réguliers par des termes cycliques. Nous prouvons que ces deux caractérisations sont isomorphes.Ensuite, nous étudions le problème de la définition de morphisme d'arbres préservant la propriété de régularité. Nous montrons en utilisant le formalisme des transducteurs d'arbres, l'existence d'un critère syntaxique garantissant la préservation de cette propriété. Enfin, nous considérons des applications de la théorie des arbres réguliers comme la définition de l'opérateur de composition parallèle d'une algèbre de processus ou encore, les problèmes de décidabilité sur les arbres réguliers via une mécanisation d'un vérificateur de modèles pour un mu-calcul coalgébrique. Tous les résultats ont été mécanisés et prouvés corrects dans l'assistant de preuve Coq. / We propose two characterizations of regular trees. The first one is semantic and is based on coinductive types. The second one is syntactic and represents regular trees by means of cyclic terms. We prove that both of these characterizations are isomorphic. Then, we study the problem of defining tree morphisms preserving the regularity property. We show, by using the formalism of tree transducers, the existence of syntactic criterion ensuring that this property is preserved. Finally, we consider applications of the theory of regular trees such as the definition of the parallel composition operator of a process algebra or, the decidability problems on regular trees through a mechanization of a model-checker for a coalgebraic mu-calculus. All the results were mechanized and proved correct in the Coq proof assistant.
|
4 |
Multioperator Weighted Monadic DatalogStüber, Torsten 10 February 2011 (has links)
In this thesis we will introduce multioperator weighted monadic datalog (mwmd), a formal model for specifying tree series, tree transformations, and tree languages. This model combines aspects of multioperator weighted tree automata (wmta), weighted monadic datalog (wmd), and monadic datalog tree transducers (mdtt). In order to develop a rich theory we will define multiple versions of semantics for mwmd and compare their expressiveness. We will study normal forms and decidability results of mwmd and show (by employing particular semantic domains) that the theory of mwmd subsumes the theory of both wmd and mdtt. We conclude this thesis by showing that mwmd even contain wmta as a syntactic subclass and present results concerning this subclass.
|
5 |
Probabilistic tree transducers for grammatical error correctionBuys, Jan Moolman 12 1900 (has links)
Thesis (MSc)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: We investigate the application of weighted tree transducers to correcting grammatical
errors in natural language. Weighted finite-state transducers (FST) have been
used successfully in a wide range of natural language processing (NLP) tasks, even
though the expressiveness of the linguistic transformations they perform is limited.
Recently, there has been an increase in the use of weighted tree transducers and
related formalisms that can express syntax-based natural language transformations
in a probabilistic setting.
The NLP task that we investigate is the automatic correction of grammar errors
made by English language learners. In contrast to spelling correction, which can
be performed with a very high accuracy, the performance of grammar correction
systems is still low for most error types. Commercial grammar correction systems
mostly use rule-based methods. The most common approach in recent grammatical
error correction research is to use statistical classifiers that make local decisions about
the occurrence of specific error types. The approach that we investigate is related to
a number of other approaches inspired by statistical machine translation (SMT) or
based on language modelling. Corpora of language learner writing annotated with
error corrections are used as training data.
Our baseline model is a noisy-channel FST model consisting of an n-gram language
model and a FST error model, which performs word insertion, deletion and
replacement operations. The tree transducer model we use to perform error correction
is a weighted top-down tree-to-string transducer, formulated to perform transformations
between parse trees of correct sentences and incorrect sentences. Using
an algorithm developed for syntax-based SMT, transducer rules are extracted from
training data of which the correct version of sentences have been parsed. Rule weights
are also estimated from the training data. Hypothesis sentences generated by the
tree transducer are reranked using an n-gram language model.
We perform experiments to evaluate the performance of different configurations
of the proposed models. In our implementation an existing tree transducer toolkit is
used. To make decoding time feasible sentences are split into clauses and heuristic
pruning is performed during decoding. We consider different modelling choices in the
construction of transducer rules. The evaluation of our models is based on precision
and recall. Experiments are performed to correct various error types on two learner
corpora. The results show that our system is competitive with existing approaches
on several error types. / AFRIKAANSE OPSOMMING: Ons ondersoek die toepassing van geweegde boomoutomate om grammatikafoute in
natuurlike taal outomaties reg te stel. Geweegde eindigetoestand outomate word
suksesvol gebruik in ’n wye omvang van take in natuurlike taalverwerking, alhoewel
die uitdrukkingskrag van die taalkundige transformasies wat hulle uitvoer beperk
is. Daar is die afgelope tyd ’n toename in die gebruik van geweegde boomoutomate
en verwante formalismes wat sintaktiese transformasies in natuurlike taal in ’n
probabilistiese raamwerk voorstel.
Die natuurlike taalverwerkingstoepassing wat ons ondersoek is die outomatiese
regstelling van taalfoute wat gemaak word deur Engelse taalleerders. Terwyl speltoetsing
in Engels met ’n baie hoë akkuraatheid gedoen kan word, is die prestasie van
taalregstellingstelsels nog relatief swak vir meeste fouttipes. Kommersiële taalregstellingstelsels
maak oorwegend gebruik van reël-gebaseerde metodes. Die algemeenste
benadering in onlangse navorsing oor grammatikale foutkorreksie is om statistiese
klassifiseerders wat plaaslike besluite oor die voorkoms van spesifieke fouttipes maak
te gebruik. Die benadering wat ons ondersoek is verwant aan ’n aantal ander benaderings
wat geïnspireer is deur statistiese masjienvertaling of op taalmodellering
gebaseer is. Korpora van taalleerderskryfwerk wat met foutregstellings geannoteer
is, word as afrigdata gebruik.
Ons kontrolestelsel is ’n geraaskanaal eindigetoestand outomaatmodel wat bestaan
uit ’n n-gram taalmodel en ’n foutmodel wat invoegings-, verwyderings- en vervangingsoperasies
op woordvlak uitvoer. Die boomoutomaatmodel wat ons gebruik
vir grammatikale foutkorreksie is ’n geweegde bo-na-onder boom-na-string omsetteroutomaat
geformuleer om transformasies tussen sintaksbome van korrekte sinne
en foutiewe sinne te maak. ’n Algoritme wat ontwikkel is vir sintaksgebaseerde
statistiese masjienvertaling word gebruik om reëls te onttrek uit die afrigdata, waarvan
sintaksontleding op die korrekte weergawe van die sinne gedoen is. Reëlgewigte
word ook vanaf die afrigdata beraam. Hipotese-sinne gegenereer deur die boomoutomaat
word herrangskik met behulp van ’n n-gram taalmodel.
Ons voer eksperimente uit om die doeltreffendheid van verskillende opstellings
van die voorgestelde modelle te evalueer. In ons implementering word ’n bestaande
boomoutomaat sagtewarepakket gebruik. Om die dekoderingstyd te verminder word
sinne in frases verdeel en die soekruimte heuristies besnoei. Ons oorweeg verskeie
modelleringskeuses in die samestelling van outomaatreëls. Die evaluering van ons
modelle word gebaseer op presisie en herroepvermoë. Eksperimente word uitgevoer
om verskeie fouttipes reg te maak op twee leerderkorpora. Die resultate wys dat ons
model kompeterend is met bestaande benaderings op verskeie fouttipes.
|
6 |
Categorical semantics and composition of tree transducers / Kategorielle Semantik und Komposition von BaumübersetzernJürgensen, Claus 28 December 2004 (has links) (PDF)
In this thesis we see two new approaches to compose tree transducers and more general to fuse functional programs. The first abroach is based on initial algebras. We prove a new variant of the acid rain theorem for mutually recursive functions where the build function is substituted by a concrete functor. Moreover, we give a symmetric form (i.e. consumer and producer have the same syntactic form) of our new acid rain theorem where fusion is composition in a category and thus in particular associative. Applying this to compose top-down tree transducers yields the same result (on a syntactic level) as the classical top-down tree transducer composition. The second approach is based on free monads and monad transformers. In the same way as monoids are used in the theory of character string automata, we use monads in the theory of tree transducers. We generalize the notion of a tree transducer defining the monadic transducer, and we prove an according fusion theorem. Moreover, we prove that homomorphic monadic transducers are semantically equivalent. The latter makes it possible to compose syntactic classes of tree transducers (or particular functional programs) by simply composing endofunctors.
|
7 |
Categorical semantics and composition of tree transducersJürgensen, Claus 30 January 2004 (has links)
In this thesis we see two new approaches to compose tree transducers and more general to fuse functional programs. The first abroach is based on initial algebras. We prove a new variant of the acid rain theorem for mutually recursive functions where the build function is substituted by a concrete functor. Moreover, we give a symmetric form (i.e. consumer and producer have the same syntactic form) of our new acid rain theorem where fusion is composition in a category and thus in particular associative. Applying this to compose top-down tree transducers yields the same result (on a syntactic level) as the classical top-down tree transducer composition. The second approach is based on free monads and monad transformers. In the same way as monoids are used in the theory of character string automata, we use monads in the theory of tree transducers. We generalize the notion of a tree transducer defining the monadic transducer, and we prove an according fusion theorem. Moreover, we prove that homomorphic monadic transducers are semantically equivalent. The latter makes it possible to compose syntactic classes of tree transducers (or particular functional programs) by simply composing endofunctors.
|
Page generated in 0.0768 seconds