Global ETD Search

1	Learning for semantic parsing using statistical syntactic parsing techniques Ge, Ruifang 15 October 2014 (has links) Natural language understanding is a sub-field of natural language processing, which builds automated systems to understand natural language. It is such an ambitious task that it sometimes is referred to as an AI-complete problem, implying that its difficulty is equivalent to solving the central artificial intelligence problem -- making computers as intelligent as people. Despite its complexity, natural language understanding continues to be a fundamental problem in natural language processing in terms of its theoretical and empirical importance. In recent years, startling progress has been made at different levels of natural language processing tasks, which provides great opportunity for deeper natural language understanding. In this thesis, we focus on the task of semantic parsing, which maps a natural language sentence into a complete, formal meaning representation in a meaning representation language. We present two novel state-of-the-art learned syntax-based semantic parsers using statistical syntactic parsing techniques, motivated by the following two reasons. First, the syntax-based semantic parsing is theoretically well-founded in computational semantics. Second, adopting a syntax-based approach allows us to directly leverage the enormous progress made in statistical syntactic parsing. The first semantic parser, Scissor, adopts an integrated syntactic-semantic parsing approach, in which a statistical syntactic parser is augmented with semantic parameters to produce a semantically-augmented parse tree (SAPT). This integrated approach allows both syntactic and semantic information to be available during parsing time to obtain an accurate combined syntactic-semantic analysis. The performance of Scissor is further improved by using discriminative reranking for incorporating non-local features. The second semantic parser, SynSem, exploits an existing syntactic parser to produce disambiguated parse trees that drive the compositional semantic interpretation. This pipeline approach allows semantic parsing to conveniently leverage the most recent progress in statistical syntactic parsing. We report experimental results on two real applications: an interpreter for coaching instructions in robotic soccer and a natural-language database interface, showing that the improvement of Scissor and SynSem over other systems is mainly on long sentences, where the knowledge of syntax given in the form of annotated SAPTs or syntactic parses from an existing parser helps semantic composition. SynSem also significantly improves results with limited training data, and is shown to be robust to syntactic errors. / text Semantic parsing Statistical syntactic parsing
2	Transition-Based Natural Language Parsing with Dependency and Constituency Representations Hall, Johan January 2008 (has links) Denna doktorsavhandling undersöker olika aspekter av automatisk syntaktisk analys av texter på naturligt språk. En parser eller syntaktisk analysator, som vi definierar den i denna avhandling, har till uppgift att skapa en syntaktisk analys för varje mening i en text på naturligt språk. Vår metod är datadriven, vilket innebär att den bygger på maskininlärning från uppmärkta datamängder av naturligt språk, s.k. korpusar. Vår metod är också dependensbaserad, vilket innebär att parsning är en process som bygger en dependensgraf för varje mening, bestående av binära relationer mellan ord. Dessutom introducerar avhandlingen en ny metod för att koda frasstrukturer, en annan syntaktisk representationsform, som dependensgrafer vilka kan avkodas utan att information i frasstrukturen går förlorad. Denna metod möjliggör att en dependensbaserad parser kan användas för att syntaktiskt analysera frasstrukturer. Avhandlingen är baserad på fem artiklar, varav tre artiklar utforskar olika aspekter av maskininlärning för datadriven dependensparsning och två artiklar undersöker metoden för dependensbaserad frasstrukturparsning. Den första artikeln presenterar vår första storskaliga empiriska studie av parsning av naturligt språk (i detta fall svenska) med dependensrepresentationer. En transitionsbaserad deterministisk parsningsalgoritm skapar en dependensgraf för varje mening genom att härleda en sekvens av transitioner, och minnesbaserad inlärning (MBL) används för att förutsäga transitionssekvensen. Den andra artikeln undersöker ytterligare hur maskininlärning kan användas för att vägleda en transitionsbaserad dependensparser. Den empiriska studien jämför två metoder för maskininlärning med fem särdragsmodeller för tre språk (kinesiska, engelska och svenska), och studien visar att supportvektormaskiner (SVM) med lexikaliserade särdragsmodeller är bättre lämpade än MBL för att vägleda en transitionsbaserad dependensparser. Den tredje artikeln sammanfattar vår erfarenhet av att optimera MaltParser, vår implementation av transitionsbaserad dependensparsning, för ett stort antal språk. MaltParser har använts för att analysera över tjugo olika språk och var bland de främsta systemen i CoNLLs utvärdering 2006 och 2007. Den fjärde artikeln är vår första undersökning av dependensbaserad frastrukturparsning med konkurrenskraftiga resultat för parsning av tyska. Den femte och sista artikeln introducerar en förbättrad algoritm för att transformera frasstrukturer till dependensgrafer och tillbaka, vilket gör det möjligt att parsa kontinuerliga och diskontinuerliga frasstrukturer utökade med grammatiska funktioner. / Hall, Johan, 2008. Transition-Based Natural Language Parsing with Dependency and Constituency Representations, Acta Wexionensia No 152/2008. ISSN: 1404-4307, ISBN: 978-91-7636-625-7. Written in English. This thesis investigates different aspects of transition-based syntactic parsing of natural language text, where we view syntactic parsing as the process of mapping sentences in unrestricted text to their syntactic representations. Our parsing approach is data-driven, which means that it relies on machine learning from annotated linguistic corpora. Our parsing approach is also dependency-based, which means that the parsing process builds a dependency graph for each sentence consisting of lexical nodes linked by binary relations called dependencies. However, the output of the parsing process is not restricted to dependency-based representations, and the thesis presents a new method for encoding phrase structure representations as dependency representations that enable an inverse transformation without loss of information. The thesis is based on five papers, where three papers explore different ways of using machine learning to guide a transition-based dependency parser and two papers investigate the method for dependency-based phrase structure parsing. The first paper presents our first large-scale empirical study of parsing a natural language (in this case Swedish) with labeled dependency representations using a transition-based deterministic parsing algorithm, where the dependency graph for each sentence is constructed by a sequence of transitions and memory-based learning (MBL) is used to predict the transition sequence. The second paper further investigates how machine learning can be used for guiding a transition-based dependency parser. The empirical study compares two machine learning methods with five feature models for three languages (Chinese, English and Swedish), and the study shows that support vector machines (SVM) with lexicalized feature models are better suited than MBL for guiding a transition-based dependency parser. The third paper summarizes our experience of optimizing and tuning MaltParser, our implementation of transition-based parsing, for a wide range of languages. MaltParser has been applied to over twenty languages and was one of the top-performing systems in the CoNLL shared tasks of 2006 and 2007. The fourth paper is our first investigation of dependency-based phrase structure parsing with competitive results for parsing German. The fifth paper presents an improved encoding method for transforming phrase structure representations into dependency graphs and back. With this method it is possible to parse continuous and discontinuous phrase structure extended with grammatical functions. Natural Language Parsing Syntactic Parsing Dependency Structure Phrase Structure Machine Learning Computer science Datavetenskap
3	Rich Linguistic Structure from Large-Scale Web Data Yamangil, Elif 18 October 2013 (has links) The past two decades have shown an unexpected effectiveness of Web-scale data in natural language processing. Even the simplest models, when paired with unprecedented amounts of unstructured and unlabeled Web data, have been shown to outperform sophisticated ones. It has been argued that the effectiveness of Web-scale data has undermined the necessity of sophisticated modeling or laborious data set curation. In this thesis, we argue for and illustrate an alternative view, that Web-scale data not only serves to improve the performance of simple models, but also can allow the use of qualitatively more sophisticated models that would not be deployable otherwise, leading to even further performance gains. / Engineering and Applied Sciences Computer science Linguistics Statistics bayesian statistics grammars sentence compression statistical inference syntactic parsing wikipedia
4	Leveraging MWEs in practical TAG parsing : towards the best of the two worlds / Optimisation d'analyse syntaxique basée sur les grammaires d'arbres adjoints grâce à la modélisation d'expression polylexicales et à l'algorithme A Waszczuk, Jakub 26 June 2017 (has links) Dans ce mémoire, nous nous penchons sur les expressions polylexicales (EP) et leurs relations avec l’analyse syntaxique, la tâche qui consiste à déterminer les relations syntaxiques entre les mots dans une phrase donnée. Le défi que posent les EP dans ce contexte, par rapport aux expressions linguistiques régulières, provient de leurs propriétés parfois inattendues qui les rendent difficiles à gérer dans te traitement automatique des langues. Dans nos travaux, nous montrons qu’il est pourtant possible de profiter de ce cette caractéristique des EP afin d’améliorer les résultats d’analyse syntaxique. Notamment, avec les grammaires d’arbres adjoints (TAGs), qui fournissent un cadre naturel et puissant pour la modélisation des EP, ainsi qu’avec des stratégies de recherche basées sur l’algorithme A* , il est possible d’obtenir des gains importants au niveau de la vitesse sans pour autant détériorer la qualité de l’analyse syntaxique. Cela contraste avec des méthodes purement statistiques qui, malgré l’efficacité, ne fournissent pas de solutions satisfaisantes en ce qui concerne les EP. Nous proposons un analyseur syntaxique novateur qui combine les grammaires TAG avec La technique A, axé sur la prédiction des EP, dont les fonctionnalités permettent des applications à grande échelle, facilement extensible au contexte probabiliste. / In this thesis, we focus on multiword expressions (MWEs) and their relationships with syntactic parsing. The latter task consists in retrieving the syntactic relations holding between the words in a given sentence. The challenge of MWEs in this respect is that, in contrast to regular linguistic expressions, they exhibit various irregular properties which make them harder to deal with in natural language processing. In our work, we show that the challenge of the MWE-related irregularities can be turned into an advantage in practical symbolic parsing. Namely, with tree adjoining grammars (TAGs), which provide first-cLass support for MWEs, and A search strategies, considerable speed-up gains can be achieved by promoting MWE-based analyses with virtually no loss in syntactic parsing accuracy. This is in contrast to purely statistical state-of-the-art parsers, which, despite efficiency, provide no satisfactory support for MWEs. We contribute a TAG-A* -MWE-aware parsing architecture with facilities (grammar compression and feature structures) enabling real-world applications, easily extensible to a probabilistic framework. Expressions polytexicales Analyse syntaxique Grammaires d’arbres adjoints Algorithme A* Multiwords expression Syntactic parsing Tree adjoining grammars Algorithm A*
5	Towards deep content extraction from specialized discourse : the case of verbal relations in patent claims Ferraro, Gabriela 20 July 2012 (has links) This thesis addresses the problem of the development of Natural Language Processing techniques for the extraction and generalization of compositional and functional relations from specialized written texts and, in particular, from patent claims. One of the most demanding tasks tackled in the thesis is, according to the state of the art, the semantic generalization of linguistic denominations of relations between object components and processes described in the texts. These denominations are usually verbal expressions or nominalizations that are too concrete to be used as standard labels in knowledge representation forms -as, for example, “A leads to B”, and “C provokes D”, where “leads to” and “provokes” both express, in abstract terms, a cause, such that in both cases “A CAUSE B” and “C CAUSE D” would be more appropriate. A semantic generalization of the relations allows us to achieve a higher degree of abstraction of the relationships between objects and processes described in the claims and reduces their number to a limited set that is oriented towards relations as commonly used in the generic field of knowledge representation. / Esta tesis se centra en el del desarrollo de tecnologías del Procesamiento del Lenguage Natural para la extracción y generalización de relaciones encontradas en textos especializados; concretamente en las reivindicaciones de patentes. Una de las tareas más demandadas de nuestro trabajo, desde el punto vista del estado de la cuestión, es la generalización de las denominaciones lingüísticas de las relaciones. Estas denominaciones, usualmente verbos, son demasiado concretas para ser usadas como etiquetas de relaciones en el contexto de la representación del conocimiento; por ejemplo, “A lleva a B”, “B es el resultado de A” están mejor representadas por “A causa B”. La generalización de relaciones permite reducir el n\'umero de relaciones a un conjunto limitado, orientado al tipo de relaciones utilizadas en el campo de la representación del conocimiento. Artificial intelligence Natural language processing Relation extraction Linguistic dependency Deep syntactic parsing Surface syntactic parsing Valency structure Relation generalization Relation clustering Relation cluster labeling Text simplification Patent processing Patent claim processing Meaning-Text Theory Machine learning 81
6	Rysy z eye-trackeru v syntaktickém parsingu / Eye-tracking features in syntactic parsing Agrawal, Abhishek January 2020 (has links) In this thesis, we explore the potential benefits of leveraging eye-tracking information for dependency parsing on the English part of the Dundee corpus. To achieve this, we cast dependency parsing as a sequence labelling task and then augment the neural model for sequence labelling with eye-tracking features. We also augment a graph-based parser with eye-tracking features and parse the Dundee Corpus to corroborate our findings from the sequence labelling parser. We then experiment with a variety of parser setups ranging from parsing with all features to a delexicalized parser. Our experiments show that for a parser with all features, although the improvements are positive for the LAS score they are not significant whereas our delexicalized parser significantly outperforms the baseline we established. We also analyze the contribution of various eye-tracking features towards the different parser setups and find that eye-tracking features contain information which is complementary in nature, thus implying that augmenting the parser with various gaze features grouped together provides better performance than any individual gaze feature. 1
7	Discontinuous constituency parsing of morphologically rich languages / Analyse syntaxique automatique en constituants discontinus des langues à morphologie riche Coavoux, Maximin 11 December 2017 (has links) L’analyse syntaxique consiste à prédire la représentation syntaxique de phrases en langue naturelle sous la forme d’arbres syntaxiques. Cette tâche pose des problèmes particuliers pour les langues non-configurationnelles ou qui ont une morphologie flexionnelle plus riche que celle de l’anglais. En particulier, ces langues manifestent une dispersion lexicale problématique, des variations d’ordre des mots plus fréquentes et nécessitent de prendre en compte la structure interne des mots-formes pour permettre une analyse syntaxique de qualité satisfaisante. Dans cette thèse, nous nous plaçons dans le cadre de l’analyse syntaxique robuste en constituants par transitions. Dans un premier temps, nous étudions comment intégrer l’analyse morphologique à l’analyse syntaxique, à l’aide d’une architecture de réseaux de neurones basée sur l’apprentissage multitâches. Dans un second temps, nous proposons un système de transitions qui permet de prédire des structures générées par des grammaires légèrement sensibles au contexte telles que les LCFRS. Enfin, nous étudions la question de la lexicalisation de l’analyse syntaxique. Les analyseurs syntaxiques en constituants lexicalisés font l’hypothèse que les constituants s’organisent autour d’une tête lexicale et que la modélisation des relations bilexicales est cruciale pour désambiguïser. Nous proposons un système de transition non lexicalisé pour l’analyse en constituants discontinus et un modèle de scorage basé sur les frontières de constituants et montrons que ce système, plus simple que des systèmes lexicalisés, obtient de meilleurs résultats que ces derniers. / Syntactic parsing consists in assigning syntactic trees to sentences in natural language. Syntactic parsing of non-configurational languages, or languages with a rich inflectional morphology, raises specific problems. These languages suffer more from lexical data sparsity and exhibit word order variation phenomena more frequently. For these languages, exploiting information about the internal structure of word forms is crucial for accurate parsing. This dissertation investigates transition-based methods for robust discontinuous constituency parsing. First of all, we propose a multitask learning neural architecture that performs joint parsing and morphological analysis. Then, we introduce a new transition system that is able to predict discontinuous constituency trees, i.e.\ syntactic structures that can be seen as derivations of mildly context-sensitive grammars, such as LCFRS. Finally, we investigate the question of lexicalization in syntactic parsing. Some syntactic parsers are based on the hypothesis that constituent are organized around a lexical head and that modelling bilexical dependencies is essential to solve ambiguities. We introduce an unlexicalized transition system for discontinuous constituency parsing and a scoring model based on constituent boundaries. The resulting parser is simpler than lexicalized parser and achieves better results in both discontinuous and projective constituency parsing. Analyse syntaxique automatique Arbres en constituants discontinus Systèmes de transitions Apprentissage multitâche Syntactic parsing Discontinuous constituency trees Transition systems Multitask learning
8	Adjetivos adverbializados: anÃlise lÃxico-funcional e implementaÃÃo computacional Daniel de FranÃa Brasil Soares 00 September 2018 (has links) FundaÃÃo de Amparo Ã Pesquisa do Estado do CearÃ / In this work, we propose a computational linguistic analysis of the so-called adverbialized adjectives (hereinafter AdjAdvs). On the one hand, we start by questioning whether AdjAdvs belong to category A(djective) or ADV(erb) and, on the other, which approach is computationally more efficient. We base our point of view according to the Lexical-Functional Grammar (LFG) (KAPLAN and BRESNAN, 1982) and implement in the XLE system (Xerox Linguistic Environment) a fragment of Brazilian Portuguese grammar (henceforth PB) capable of analyzing adjectives in adverbial use. Our implementation is based on the adaptation of a fragment of French grammar constructed by Schwarze and Alencar (2016) and deepened in FrGramm by Alencar (2017). This fragment of grammar adapted to PB serves as the basis for the implementation of two versions for a comparative analysis: G-A and G-ADV. In the first version, AdjAdvs are analyzed as adjectival category, while in the second they are analyzed as adverbial category. The implementation of G-A and G-ADV is evaluated by applying a parser to a set of 168 grammatical sentences and 286 ungrammatical sentences. After testing grammatical and ungrammatical sentence sets, G-A and G-ADV grammars processing results in XLE and the statistical analysis based on the double factor variance test, we concluded that there was no significant difference in treatment of syntax between G-A and G-ADV versions built to parse AdjAdvs. This result reinforces Radford (1988) argument that adjectives and adverbs belong to a single category. / Neste trabalho, propomos uma anÃlise linguÃstico-computacional dos chamados adjetivos adverbializados (doravante AdjAdvs). Partimos, por um lado, do questionamento se AdjAdvs pertencem Ã categoria A(djetivo) ou ADV(Ãrbio) e, por outro, que abordagem Ã computacionalmente mais eficiente. Fundamentamos nosso ponto de vista de acordo com a GramÃtica LÃxico-Funcional (LFG, em inglÃs Lexical-Functional Grammar) (cf. KAPLAN e BRESNAN, 1982) e implementamos no sistema XLE (do inglÃs Xerox Linguistic Environment) um fragmento de gramÃtica do portuguÃs brasileiro (doravante PB) capaz de analisar adjetivos em uso adverbial. Nossa implementaÃÃo parte da adaptaÃÃo de uma minigramÃtica do francÃs construÃda por Schwarze e Alencar (2016) e aprofundada na FrGramm por Alencar (2017). Esse fragmento de gramÃtica adaptado ao PB serve de base para a construÃÃo de duas versÃes para uma anÃlise comparativa: G-A e G-ADV. Na primeira versÃo, AdjAdvs sÃo analisados como categoria adjetival, enquanto na segunda sÃo analisados como categoria adverbial. A implementaÃÃo de G-A e G-ADV Ã avaliada pela aplicaÃÃo de um analisador sintÃtico automÃtico (parser) a 168 sentenÃas gramaticais e 286 sentenÃas agramaticais. ApÃs os testes nos conjuntos de sentenÃas gramaticais e agramaticais, os resultados de processamento das gramÃticas G-A e G-ADV no software XLE e a anÃlise estatÃstica com base no teste de variÃncia de fator duplo, chegamos Ã conclusÃo de que nÃo hÃ diferenÃa significativa no tratamento sintÃtico entre as versÃes G-A e G-ADV construÃdas para analisar AdjAdvs. Esse resultado reforÃa o argumento de Radford (1988) de que adjetivos e advÃrbios pertencem a uma Ãnica categoria. LinguÃstica Computacional AnÃlise sintÃtica automÃtica profunda GramÃticaLÃxico-Funcional LFG-XLE Adjetivos adverbializados. Computational Linguistics Deep syntactic parsing Lexical-Functional Grammar LFG-XLE Adverbialized adjectives LINGUISTICA
9	The application of constraint rules to data-driven parsing Jaf, Sardar January 2015 (has links) The process of determining the structural relationships between words in both natural and machine languages is known as parsing. Parsers are used as core components in a number of Natural Language Processing (NLP) applications such as online tutoring applications, dialogue-based systems and textual entailment systems. They have been used widely in the development of machine languages. In order to understand the way parsers work, we will investigate and describe a number of widely used parsing algorithms. These algorithms have been utilised in a range of different contexts such as dependency frameworks and phrase structure frameworks. We will investigate and describe some of the fundamental aspects of each of these frameworks, which can function in various ways including grammar-driven approaches and data-driven approaches. Grammar-driven approaches use a set of grammatical rules for determining the syntactic structures of sentences during parsing. Data-driven approaches use a set of parsed data to generate a parse model which is used for guiding the parser during the processing of new sentences. A number of state-of-the-art parsers have been developed that use such frameworks and approaches. We will briefly highlight some of these in this thesis. There are three specific important features that it is important to integrate into the development of parsers. These are efficiency, accuracy, and robustness. Efficiency is concerned with the use of as little time and computing resources as possible when processing natural language text. Accuracy involves maximising the correctness of the analyses that a parser produces. Robustness is a measure of a parser’s ability to cope with grammatically complex sentences and produce analyses of a large proportion of a set of sentences. In this thesis, we present a parser that can efficiently, accurately, and robustly parse a set of natural language sentences. Additionally, the implementation of the parser presented here allows for some trading-off between different levels of parsing performance. For example, some NLP applications may emphasise efficiency/robustness over accuracy while some other NLP systems may require a greater focus on accuracy. In dialogue-based systems, it may be preferable to produce a correct grammatical analysis of a question, rather than incorrectly analysing the grammatical structure of a question or quickly producing a grammatically incorrect answer for a question. Alternatively, it may be desirable that document translation systems translate a document into a different language quickly but less accurately, rather than slowly but highly accurately, because users may be able to correct grammatically incorrect sentences manually if necessary. The parser presented here is based on data-driven approaches but we will allow for the application of constraint rules to it in order to improve its performance. 006.3
10	Induction, Training, and Parsing Strategies beyond Context-free Grammars Gebhardt, Kilian 03 July 2020 (has links) This thesis considers the problem of assigning a sentence its syntactic structure, which may be discontinuous. It proposes a class of models based on probabilistic grammars that are obtained by the automatic refinement of a given grammar. Different strategies for parsing with a refined grammar are developed. The induction, refinement, and application of two types of grammars (linear context-free rewriting systems and hybrid grammars) are evaluated empirically on two German and one Dutch corpus. info:eu-repo/classification/ddc/004 ddc:004

Search results