Global ETD Search

1	Transformation and Combination in Data-Driven Dependency Parcing Nilsson, Jens January 2009 (has links) This thesis deals with automatic syntactic analysis of natural languagetext, also known as parsing. The parsing approach is data-driven, whichmeans that parsers are constructed by means of machine learning, lookingat training data in the form of annotated natural language sentences. The syntactic framework used in the thesis is dependency-based. Robustness is one of the characteristics of the data-driven approaches investigated here.The overall aim of this thesis is to maintain robustness while increasing accuracy.The content of the thesis falls naturally into two tracks, a transformation track and a combination track. The rst type of transformation investigatedis called pseudo-projective, because it enables strictly projective dependency parsers to recover non-projective dependency relations. Informally,a non-projective dependency tree contains crossing binary directed relations, when drawn above the sentence. Experimental results show that pseudo-projective transformations can improve accuracy significantly for a range of languages. The second type of transformation aims to facilitate the processing of specific linguistic constructions such as coordination and verb groups. Experimental results again show a positive effect on parsing accuracy for several languages, often greater than for the pseudo-projective transformations. However, the improvement of the transformations dependson the internal structure of the base parser, which is not the case for thepseudo-projective transformations. The combination track compares various approaches for combining data driven dependency parsers, again as a means of improving accuracy. As different parsers have different strengths and weaknesses, making parsers collaborate in order to nd one single syntactic analysis may result in higher accuracy than any of the syntactic analyzers can produce by itself. The experimental results show that accuracy improves across languages, giventhat appropriate parsers are combined. The thesis ends with an attempt to combine the two tracks, showing that combining parsers with different tree transformations also increases accuracy. Moreover, this experiment indicates that high diversity among a small set of parsers is much more important than a large number of parsers with low diversity. Natural Language Parsing Dependency Parsing Tree Transformation Computer science Datavetenskap
2	Transition-Based Natural Language Parsing with Dependency and Constituency Representations Hall, Johan January 2008 (has links) Denna doktorsavhandling undersöker olika aspekter av automatisk syntaktisk analys av texter på naturligt språk. En parser eller syntaktisk analysator, som vi definierar den i denna avhandling, har till uppgift att skapa en syntaktisk analys för varje mening i en text på naturligt språk. Vår metod är datadriven, vilket innebär att den bygger på maskininlärning från uppmärkta datamängder av naturligt språk, s.k. korpusar. Vår metod är också dependensbaserad, vilket innebär att parsning är en process som bygger en dependensgraf för varje mening, bestående av binära relationer mellan ord. Dessutom introducerar avhandlingen en ny metod för att koda frasstrukturer, en annan syntaktisk representationsform, som dependensgrafer vilka kan avkodas utan att information i frasstrukturen går förlorad. Denna metod möjliggör att en dependensbaserad parser kan användas för att syntaktiskt analysera frasstrukturer. Avhandlingen är baserad på fem artiklar, varav tre artiklar utforskar olika aspekter av maskininlärning för datadriven dependensparsning och två artiklar undersöker metoden för dependensbaserad frasstrukturparsning. Den första artikeln presenterar vår första storskaliga empiriska studie av parsning av naturligt språk (i detta fall svenska) med dependensrepresentationer. En transitionsbaserad deterministisk parsningsalgoritm skapar en dependensgraf för varje mening genom att härleda en sekvens av transitioner, och minnesbaserad inlärning (MBL) används för att förutsäga transitionssekvensen. Den andra artikeln undersöker ytterligare hur maskininlärning kan användas för att vägleda en transitionsbaserad dependensparser. Den empiriska studien jämför två metoder för maskininlärning med fem särdragsmodeller för tre språk (kinesiska, engelska och svenska), och studien visar att supportvektormaskiner (SVM) med lexikaliserade särdragsmodeller är bättre lämpade än MBL för att vägleda en transitionsbaserad dependensparser. Den tredje artikeln sammanfattar vår erfarenhet av att optimera MaltParser, vår implementation av transitionsbaserad dependensparsning, för ett stort antal språk. MaltParser har använts för att analysera över tjugo olika språk och var bland de främsta systemen i CoNLLs utvärdering 2006 och 2007. Den fjärde artikeln är vår första undersökning av dependensbaserad frastrukturparsning med konkurrenskraftiga resultat för parsning av tyska. Den femte och sista artikeln introducerar en förbättrad algoritm för att transformera frasstrukturer till dependensgrafer och tillbaka, vilket gör det möjligt att parsa kontinuerliga och diskontinuerliga frasstrukturer utökade med grammatiska funktioner. / Hall, Johan, 2008. Transition-Based Natural Language Parsing with Dependency and Constituency Representations, Acta Wexionensia No 152/2008. ISSN: 1404-4307, ISBN: 978-91-7636-625-7. Written in English. This thesis investigates different aspects of transition-based syntactic parsing of natural language text, where we view syntactic parsing as the process of mapping sentences in unrestricted text to their syntactic representations. Our parsing approach is data-driven, which means that it relies on machine learning from annotated linguistic corpora. Our parsing approach is also dependency-based, which means that the parsing process builds a dependency graph for each sentence consisting of lexical nodes linked by binary relations called dependencies. However, the output of the parsing process is not restricted to dependency-based representations, and the thesis presents a new method for encoding phrase structure representations as dependency representations that enable an inverse transformation without loss of information. The thesis is based on five papers, where three papers explore different ways of using machine learning to guide a transition-based dependency parser and two papers investigate the method for dependency-based phrase structure parsing. The first paper presents our first large-scale empirical study of parsing a natural language (in this case Swedish) with labeled dependency representations using a transition-based deterministic parsing algorithm, where the dependency graph for each sentence is constructed by a sequence of transitions and memory-based learning (MBL) is used to predict the transition sequence. The second paper further investigates how machine learning can be used for guiding a transition-based dependency parser. The empirical study compares two machine learning methods with five feature models for three languages (Chinese, English and Swedish), and the study shows that support vector machines (SVM) with lexicalized feature models are better suited than MBL for guiding a transition-based dependency parser. The third paper summarizes our experience of optimizing and tuning MaltParser, our implementation of transition-based parsing, for a wide range of languages. MaltParser has been applied to over twenty languages and was one of the top-performing systems in the CoNLL shared tasks of 2006 and 2007. The fourth paper is our first investigation of dependency-based phrase structure parsing with competitive results for parsing German. The fifth paper presents an improved encoding method for transforming phrase structure representations into dependency graphs and back. With this method it is possible to parse continuous and discontinuous phrase structure extended with grammatical functions. Natural Language Parsing Syntactic Parsing Dependency Structure Phrase Structure Machine Learning Computer science Datavetenskap
3	Jeux de typage et analyse de lambda-grammaires non-contextuelles Bourreau, Pierre 29 June 2012 (has links) Les grammaires catégorielles abstraites (ou λ-grammaires) sont un formalisme basé sur le λ-calcul simplement typé. Elles peuvent être vues comme des grammaires générant de tels termes, et ont été introduites aﬁn de modéliser l’interface entre la syntaxe et la sémantique du langage naturel, réunissant deux idées fondamentales : la distinction entre tectogrammaire (c.a.d. structure profonde d’un énoncé) et phénogrammaire (c.a.d représentation de la surface d’un énoncé) de la langue, ex- primé par Curry ; et une modélisation algébrique du principe de compositionnalité aﬁn de rendre compte de la sémantique des phrases, due à Montague. Un des avantages principaux de ce formalisme est que l’analyse d’une grammaires catégorielle abstraite permet de résoudre aussi bien le problème de l’analyse de texte, que celui de la génération de texte. Des algorithmes d’analyse efﬁcaces ont été découverts pour les grammaires catégorielles abstraites de termes linéaires et quasi-linéaires, alors que le problème de l’analyse est non-élémentaire dans sa forme la plus générale. Nous proposons d’étudier des classes de termes pour lesquels l’analyse grammaticale reste solvable en temps polynomial. Ces résultats s’appuient principalement sur deux théorèmes de typage : le théorème de cohérence, spéciﬁant qu’un λ-terme donné est l’unique habitant d’un certain typage ; et le théorème d’expansion du sujet, spéciﬁant que deux termes β-équivalents habitent les même typages. Aﬁn de mener cette étude à bien, nous utiliserons une représentation abstraite des notions de λ-termes et de typages, sous forme de jeux. En particulier, nous nous appuierons grandement sur cette notion aﬁn de démontrer le théorème de cohérence pour de nouvelles familles de λ-termes et de typages. Grâce à ces résultats, nous montrerons qu’il est possible de construire de manière directe, un reconnaisseur dans le langage Datalog, pour des grammaires catégorielles abstraites de -termes quasi-afﬁnes. / Abstract categorial grammars (or, equivalently, lambda-grammars) is formalism based on the simply-typed lambda-calculus. These grammars can be described as grammars of such terms and were introduced in order to bring a model of the syntax-semantics interface in natural language, based on two main ideas: the distinction between the tectogrammatical (i.e. the deep structure of an utterance) and phenogrammatical (i.e. the interpretation of this structure) levels in natural languages, which was expressed by Curry; and an algebraic modeling of the principle of compositionality in order to give account of the semantics of a sentence. an idea formalized by Montague. One of the main advantages of abstract categorial grammars is that both the problems of natural language parsing and generation can be tackled under the same problem: parsing abstract categorial grammars. Efficient algorithms were discovered for abstract categorial grammars of linear and almost linear lambda-terms, while it is known the recognition problem is decidable but non-elementary in general. This work focuses on the study of classes of terms for which parsing can still be solved in polynomial time. The results we give are mainly based on two theorems: the coherence theorem which specifies that a given lambda-term in the desired class must be the unique inhabitant of one of its typing; and the subject expansion theorem, which states that two beta-equivalent terms of the desired class must inhabit the same typings. In order to lead the study, we use an alternative representation of both simply-typed lambda-terms and their typings as games. In particular, we will use this representation in order to prove the coherence theorems for new classes of lambda-terms. Thanks to these results, we will show it is possible to build in a direct way, recognizers for grammars of almost affine lambda-terms as Datalog programs. Lambda-calcul Linguistique formelle Analyse et génération de langage Datalog Sémantique des jeux Lambda calculus Formal linguistic Natural language parsing and generation Datalog Game semantics
4	The application of constraint rules to data-driven parsing Jaf, Sardar January 2015 (has links) The process of determining the structural relationships between words in both natural and machine languages is known as parsing. Parsers are used as core components in a number of Natural Language Processing (NLP) applications such as online tutoring applications, dialogue-based systems and textual entailment systems. They have been used widely in the development of machine languages. In order to understand the way parsers work, we will investigate and describe a number of widely used parsing algorithms. These algorithms have been utilised in a range of different contexts such as dependency frameworks and phrase structure frameworks. We will investigate and describe some of the fundamental aspects of each of these frameworks, which can function in various ways including grammar-driven approaches and data-driven approaches. Grammar-driven approaches use a set of grammatical rules for determining the syntactic structures of sentences during parsing. Data-driven approaches use a set of parsed data to generate a parse model which is used for guiding the parser during the processing of new sentences. A number of state-of-the-art parsers have been developed that use such frameworks and approaches. We will briefly highlight some of these in this thesis. There are three specific important features that it is important to integrate into the development of parsers. These are efficiency, accuracy, and robustness. Efficiency is concerned with the use of as little time and computing resources as possible when processing natural language text. Accuracy involves maximising the correctness of the analyses that a parser produces. Robustness is a measure of a parser’s ability to cope with grammatically complex sentences and produce analyses of a large proportion of a set of sentences. In this thesis, we present a parser that can efficiently, accurately, and robustly parse a set of natural language sentences. Additionally, the implementation of the parser presented here allows for some trading-off between different levels of parsing performance. For example, some NLP applications may emphasise efficiency/robustness over accuracy while some other NLP systems may require a greater focus on accuracy. In dialogue-based systems, it may be preferable to produce a correct grammatical analysis of a question, rather than incorrectly analysing the grammatical structure of a question or quickly producing a grammatically incorrect answer for a question. Alternatively, it may be desirable that document translation systems translate a document into a different language quickly but less accurately, rather than slowly but highly accurately, because users may be able to correct grammatically incorrect sentences manually if necessary. The parser presented here is based on data-driven approaches but we will allow for the application of constraint rules to it in order to improve its performance. 006.3

1

Page generated in 0.0753 seconds