About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Systémy syntaktických analyzátorů / Parser Systems

Hrstka, Jan January 2019
This thesis provides a summary of the state of knowledge about grammar systems and proposes modifications of parallel-oriented grammar systems that make them usable in sequential parsing. The concept of a grammar system is extended to the level of entire parsers, which are grouped into a parsing system, and the properties of these systems are examined. The aim of the thesis is to introduce approaches to syntactic analysis based on grammar systems. The thesis builds on context-free methods of syntactic analysis, extending them and connecting them together, with particular attention devoted to increasing the generative capacity of LL and LR parsing. Within this thesis, structures built from context-free components were created that are capable of generating context-sensitive languages, and a simple recipe for implementing these structures is provided. We introduce a generic concept of parsing that enlarges the generative power of conventional parsing methods. Using the presented techniques it is possible to extend many widely used languages with context-sensitive elements, especially elements that contradict the pumping lemma for context-free languages.
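The abstract describes parsers cooperating in a "parsing system" whose combined power exceeds context-free. As a rough, hypothetical illustration of that idea only (not the thesis's actual construction), the sketch below shows two context-free-style recognizers that cooperate by passing a count between them, which lets the pair recognize the context-sensitive language {a^n b^n c^n}:

```python
# Sketch: two cooperating context-free-style recognizers whose
# communication yields the context-sensitive language {a^n b^n c^n}.
# This illustrates the general idea only; the thesis's actual
# parsing-system construction is more elaborate.

def parse_ab(s):
    """First component: recognize a prefix a^n b^n and report n."""
    n = 0
    i = 0
    while i < len(s) and s[i] == 'a':
        n += 1
        i += 1
    for _ in range(n):
        if i < len(s) and s[i] == 'b':
            i += 1
        else:
            return None
    return n, s[i:]

def parse_c(rest, n):
    """Second component: check the remainder is exactly c^n."""
    return rest == 'c' * n

def parsing_system(s):
    """The components cooperate: the count n found by the first
    component parameterizes the second, which no single context-free
    grammar can express."""
    result = parse_ab(s)
    if result is None:
        return False
    n, rest = result
    return parse_c(rest, n)

assert parsing_system('aaabbbccc')
assert not parsing_system('aabbbcc')
```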
112

Segmentation sémantique d'images fortement structurées et faiblement structurées / Semantic Segmentation of Highly Structured and Weakly Structured Images

Gadde, Raghu Deep 30 June 2017
The aim of this thesis is to develop techniques for segmenting strongly structured scenes (e.g. building images) and weakly structured scenes (e.g. natural images). Building images can naturally be expressed in terms of grammars, and inference is performed using grammars to obtain the optimal segmentation. However, it is difficult and time-consuming to write such grammars. To alleviate this problem, a novel method is developed to automatically learn grammars from a given training set of image and ground-truth segmentation pairs. Experiments suggest that such learned grammars enable better and faster inference. Next, the effect of using grammars for strongly structured scenes is explored. To this end, a very simple technique based on Auto-Context is used to segment building images. Surprisingly, even without using any domain-specific knowledge, we observed significant improvements in performance on several benchmark datasets. Lastly, a novel technique based on convolutional neural networks (CNNs) is developed to segment images without any high-level structure. Image-adaptive filtering is performed within a CNN architecture to facilitate long-range connections. Experiments on several large-scale benchmarks again show significant improvements in performance.
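The CNN-internal image-adaptive filtering mentioned above is, in spirit, a bilateral-style filter whose weights depend on similarity in a guidance feature space. The numpy sketch below is a naive O(N^2) stand-in for that idea on invented toy data; the thesis presumably uses an efficient in-network formulation rather than this brute-force loop:

```python
import numpy as np

def adaptive_filter(features, guidance, sigma=0.5):
    """Naive image-adaptive (bilateral-style) filtering: each pixel's
    output is a weighted average of all pixels, with weights derived
    from similarity in a guidance feature space. Similar pixels thus
    influence each other even when spatially far apart."""
    n, _ = features.shape                        # n pixels
    out = np.zeros_like(features)
    for i in range(n):
        d = guidance - guidance[i]               # (n, g) differences
        w = np.exp(-np.sum(d * d, axis=1) / (2 * sigma ** 2))
        w /= w.sum()
        out[i] = w @ features                    # weighted average
    return out

# Toy usage: 6 "pixels" whose guidance features split them in two.
feats = np.array([[1.], [2.], [1.], [5.], [6.], [5.]])
guide = np.array([[0.], [0.], [0.], [10.], [10.], [10.]])
print(adaptive_filter(feats, guide))  # smooths within each group
```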
113

Pattern matching in compilers

Bílka, Ondřej January 2012
Title: Pattern matching in compilers. Author: Ondřej Bílka. Department: Department of Applied Mathematics. Supervisor: Jan Hubička, Department of Applied Mathematics. Abstract: In this thesis we develop tools for effective and flexible pattern matching. We introduce a new pattern matching system called amethyst. Amethyst is not only a parser generator for programming languages but can also serve as an alternative to regular-expression matching tools. Our framework also produces dynamic parsers, intended for use in IDEs (accurate syntax highlighting and error detection on the fly). Amethyst offers pattern matching over general data structures, which makes it a useful tool for implementing compiler optimizations such as constant folding, instruction scheduling, and dataflow analysis in general. The parsers produced are essentially top-down parsers; linear time complexity is obtained by introducing the novel notions of structured grammars and regularized regular expressions. Amethyst uses techniques known from compiler optimization to produce efficient parsers. Keywords: packrat parsing, dynamic parsing, structured grammars, functional programming
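Amethyst's own code is not shown here, but the linear-time behavior the abstract mentions is characteristic of packrat parsing: memoizing a top-down parser on each (rule, position) pair so no position is reparsed. A minimal, generic sketch of that technique (not amethyst's API), for a balanced-parentheses grammar with the single rule S -> '(' S ')' S | '':

```python
from functools import lru_cache

def make_parser(text):
    """Packrat-style recognizer sketch. With one rule, memoizing on
    the position alone stands in for (rule, position) memoization;
    each position is parsed at most once, giving linear time."""
    @lru_cache(maxsize=None)
    def S(pos):
        """Return the position after a match of S starting at pos."""
        if pos < len(text) and text[pos] == '(':
            mid = S(pos + 1)
            if mid < len(text) and text[mid] == ')':
                return S(mid + 1)
        return pos  # the empty alternative always matches

    return lambda: S(0) == len(text)

assert make_parser('(()())')()
assert not make_parser('(()')()
```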
114

[en] AN EXPERIMENTAL STUDY OF THE EARLY PROCESSING AT THE PHONETIC INTERFACE AND THE EARLY PARSING IN LANGUAGE ACQUISITION: THE ROLE OF FUNCTIONAL ELEMENTS / [pt] UM ESTUDO EXPERIMENTAL DO PROCESSAMENTO NA INTERFACE FÔNICA E DA ANÁLISE SINTÁTICA INICIAL: O PAPEL DE ELEMENTOS FUNCIONAIS NA AQUISIÇÃO DA LINGUAGEM

TATIANA BAGETTI 07 February 2018
[en] This study focuses on the passage from speech perception to the morphophonological representation of functional elements, verbal affixes in particular, and on the early parsing of linguistic utterances in language acquisition. A psycholinguistic perspective on language acquisition is adopted together with a minimalist conception of language (Corrêa, 2006). The working hypothesis is that closed-class elements are distinctively perceived by children, initially at a phonetic/phonological level and subsequently at a morphophonological one. Their representation as functional elements at a later stage contributes to the parsing of linguistic utterances. An analysis of a set of tales for children demonstrated that determiners and verbal affixes occur at the edges of phonological phrases and that phonetic properties such as stress may contribute to their early perception by children. Three experiments were conducted, the first two in the Head-turn Paradigm and the third in the Intermodal Preferential Looking Paradigm. Experiment 1 aimed at assessing 9-15-month-old infants' sensitivity to phonetic distinctions in the linguistic stimulus that affect the syllabic pattern of the language (Brazilian Portuguese), independently of the morphological context in which they occur (verbal affixes and nominal roots). Experiment 2 aimed at verifying whether these morphological contexts affect infants' perception of phonetic alterations that do not affect the phonological pattern of the language; perception of such distinctions in the verbal affixes, but not in the nominal roots, was taken to indicate sensitivity to the morphophonological patterns of these closed-class elements. The third experiment aimed at verifying the extent to which children around the age of 21 months rely on functional information in the parsing of linguistic utterances, thereby ascribing different categorical features to homophonous words (nouns and verbs). The results of Experiment 1 suggest that 9-15-month-old infants do perceive phonetic alterations that affect the syllabic pattern of the language, regardless of the morphological context in which they occur. The results of Experiment 2 suggest that infants are sensitive, by the end of their first year of life (9-12 months), to phonetic alterations that do not affect the syllabic pattern of the language. The results of Experiment 3 suggest that children take into account different syntactic projections of the determiner in ascribing homophonous words to different classes (noun and verb). These results also indicate that verbs are analyzed as such regardless of the type of morphological affix they present (marked or unmarked for tense), though tense-marked forms seem to add processing costs in the accomplishment of the task. These results are compatible with the hypotheses that guided the present thesis and enable a theory of language acquisition to reconstruct the passage from the phonetic perception of the linguistic stimulus to the morphophonological representation of closed-class elements (verbal affixes), and from this level of representation to children's reliance on functional elements in the parsing of linguistic utterances.
115

Towards less supervision in dependency parsing

Mirroshandel, Seyedabolghasem 10 December 2015
Probabilistic parsing is one of the most attractive research areas in natural language processing. Current successful probabilistic parsers require large treebanks, which are difficult, time-consuming, and expensive to produce. We therefore focused our attention on less-supervised approaches and suggest two categories of solution: active learning and semi-supervised algorithms. Active learning strategies allow one to select the most informative samples for annotation. Most existing active learning strategies for parsing rely on selecting uncertain sentences for annotation. We show in our research, on four different languages (French, English, Persian, and Arabic), that selecting full sentences is not an optimal solution, and we propose a way to select only subparts of sentences. As our experiments show, some parts of a sentence contain no useful information for training a parser, and focusing on the uncertain subparts of sentences is a more effective solution in active learning.
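As a hypothetical sketch of the subsentence-selection idea (the thesis's actual selection criteria are not spelled out in the abstract), the snippet below takes made-up per-token confidence scores from a parser and returns only the contiguous uncertain spans, so that annotators label token ranges rather than whole sentences:

```python
# Sketch of uncertainty-based subsentence selection for active
# learning. The per-token probabilities are invented; a real system
# would take them from the parser's k-best output or arc marginals.

def uncertain_spans(token_probs, threshold=0.7, min_len=2):
    """Return (start, end) index pairs of contiguous low-confidence
    token runs at least min_len long."""
    spans, start = [], None
    for i, p in enumerate(token_probs + [1.0]):  # sentinel flushes tail
        if p < threshold and start is None:
            start = i
        elif p >= threshold and start is not None:
            if i - start >= min_len:
                spans.append((start, i))
            start = None
    return spans

probs = [0.95, 0.4, 0.3, 0.9, 0.99, 0.5, 0.45, 0.6, 0.98]
print(uncertain_spans(probs))  # [(1, 3), (5, 8)] -> only these token
                               # ranges are sent for annotation
```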
116

[en] CORRESPONDENCE BETWEEN PEGS AND CLASSES OF CONTEXT-FREE GRAMMARS / [pt] CORRESPONDÊNCIA ENTRE PEGS E CLASSES DE GRAMÁTICAS LIVRES DE CONTEXTO

SERGIO QUEIROZ DE MEDEIROS 31 January 2011
[en] Parsing Expression Grammars (PEGs) are a formalism for describing languages whose distinguishing feature is an ordered choice operator. The class of languages described by PEGs properly contains all deterministic context-free languages. In this thesis we discuss the correspondence between PEGs and two other formalisms used to describe languages: regular expressions and Context-Free Grammars (CFGs). We present a new formalization of regular expressions using natural semantics and show a transformation that converts a regular expression into a PEG describing the same language; this transformation can easily be adapted to accommodate several extensions used by regular-expression libraries (e.g., lazy repetition and independent subpatterns). We also present a new formalization of CFGs using natural semantics and show the correspondence between right-linear CFGs and equivalent PEGs. Moreover, we show that LL(1) grammars with a minor restriction define the same language when interpreted as a CFG and when interpreted as a PEG. Finally, we show how to transform strong-LL(k) CFGs into equivalent PEGs.
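The ordered-choice operator that distinguishes PEGs from CFGs can be made concrete in a few lines. The toy interpreter below (my own sketch, not the thesis's formalization) shows why the PEG expression "a" / "ab" never matches the string "ab" in full, even though the CFG alternation A -> a | ab derives both strings:

```python
# Minimal interpreter for two PEG operators: literals and ordered
# choice. Once e1 succeeds, e2 is never tried -- unlike a CFG's
# symmetric alternation, which may use either alternative.

def lit(s):
    """Match the literal s at pos; return the new position or None."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def choice(e1, e2):
    """Ordered choice e1 / e2: e2 is attempted only if e1 fails."""
    def parse(text, pos):
        r = e1(text, pos)
        return r if r is not None else e2(text, pos)
    return parse

def matches(expr, text):
    """PEG-style whole-input match: the expression must consume all."""
    return expr(text, 0) == len(text)

A = choice(lit("a"), lit("ab"))
print(matches(A, "a"))    # True
print(matches(A, "ab"))   # False: lit("a") succeeds, consumes one
                          # character, and the leftover "b" is never
                          # reconsidered by backtracking into lit("ab")
```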
117

Gramatické systémy aplikované v syntaktické analýze / Grammar Systems Applied to Parsing

Martiško, Jakub January 2015
This paper deals with different variants of grammar systems, which combine the simplicity of context-free grammars with the generative power of more complex grammars. Two main variants are described: PC (parallel communicating) grammar systems and CD (cooperating distributed) grammar systems. A new type of grammar system, a modification of CD grammar systems, is also described, and a new parsing method based on it is proposed. The resulting parser consists of several smaller parsers that work in both top-down and bottom-up fashion.
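To make the cooperation protocol concrete, the sketch below simulates a CD grammar system in t-mode (each activated component rewrites until none of its rules applies, then hands over control). The three components are a toy, textbook-style construction assumed for illustration, not the paper's new system; together their purely context-free rules derive the context-sensitive language {a^n b^n c^n}:

```python
# Toy CD grammar system simulator, t-mode. Nonterminals are single
# uppercase letters; X and Y play the role of primed A and B. Control
# in a real CD system is nondeterministic; here a fixed activation
# schedule is chosen that yields a^n b^n c^n.

def t_mode(form, rules):
    """Exhaustively apply a component's context-free rules (first
    applicable rule, leftmost occurrence) until none applies."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in form:
                form = form.replace(lhs, rhs, 1)
                changed = True
                break
    return form

P1 = [("S", "AB"), ("X", "A"), ("Y", "B")]   # start / unprime
P2 = [("A", "aXb"), ("B", "cY")]             # grow one a, b and one c
P3 = [("A", "ab"), ("B", "c")]               # terminate the derivation

def derive(n):
    """Activate P1, then (P2, P1) n-1 times, then P3."""
    form = "S"
    for comp in [P1] + [P2, P1] * (n - 1) + [P3]:
        form = t_mode(form, comp)
    return form

print(derive(1))  # abc
print(derive(3))  # aaabbbccc
```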
118

Unsupervised Natural Language Processing for Knowledge Extraction from Domain-specific Textual Resources

Hänig, Christian 17 April 2013
This thesis aims to develop a Relation Extraction algorithm that extracts knowledge from automotive data. While most approaches to Relation Extraction are evaluated only on newspaper data dealing with general relations from the business world, their applicability to other data sets is not well studied. Part I of this thesis deals with the theoretical foundations of Information Extraction algorithms. Text mining cannot be seen as the simple application of data mining methods to textual data; instead, sophisticated methods have to be employed to accurately extract knowledge from text, which can then be mined using statistical methods from the field of data mining. Information Extraction itself can be divided into two subtasks: Entity Detection and Relation Extraction. The detection of entities is very domain-dependent due to terminology, abbreviations, and general language use within the given domain, so this task has to be solved for each domain using thesauri or another type of lexicon. Supervised approaches to Named Entity Recognition will not achieve reasonable results unless they have been trained for the given type of data. Relation Extraction can basically be approached by pattern-based and kernel-based algorithms; the latter achieve state-of-the-art results on newspaper data and point out the importance of linguistic features. In order to analyze relations contained in textual data, syntactic features like part-of-speech tags and syntactic parses are essential. Chapter 4 presents machine learning approaches and linguistic foundations essential for the syntactic annotation of textual data and for Relation Extraction. Chapter 6 analyzes the performance of state-of-the-art algorithms for POS tagging, syntactic parsing, and Relation Extraction on automotive data. The findings are that supervised methods trained on newspaper corpora do not achieve accurate results when applied to automotive data, for various reasons. Besides low-quality text, the nature of automotive relations poses the main challenge: automotive relation types of interest (e.g., component - symptom) are rather arbitrary compared to well-studied relation types like is-a or is-head-of. To achieve acceptable results, algorithms have to be trained directly on this kind of data. As the manual annotation of data for each language and data type is too costly and inflexible, unsupervised methods are the ones to rely on. Part II deals with the development of dedicated algorithms for all three essential tasks. Unsupervised POS tagging (Chapter 7) is a well-studied task for which accurate taggers exist, but none of them disambiguates high-frequency words; only out-of-lexicon words are disambiguated. Most high-frequency words bear syntactic information, so it is very important to differentiate between their different functions; domain languages in particular contain ambiguous, high-frequency words bearing semantic information (e.g., pump). To improve POS tagging, an algorithm for disambiguation is developed and used to enhance an existing state-of-the-art tagger. This approach is based on context clustering, which is used to detect a word type's different syntactic functions. Evaluation shows that tagging accuracy is raised significantly. An approach to unsupervised syntactic parsing (Chapter 8) is developed to satisfy the requirements of Relation Extraction.
These requirements include high-precision results on nominal and prepositional phrases, as they contain the entities relevant for Relation Extraction. Furthermore, accurate shallow parsing is more desirable than deep binary parsing, as it facilitates Relation Extraction more than deep parsing does. Endocentric and exocentric constructions can be distinguished, which improves proper phrase labeling. unsuParse detects phrase candidates based on preferred positions of word types within phrases; iterating the detection of simple phrases successively induces deeper structures. The proposed algorithm fulfills all the required criteria and achieves competitive results on standard evaluation setups. Syntactic Relation Extraction (Chapter 9) is an approach that exploits syntactic statistics and text characteristics to extract relations between previously annotated entities. It is based on entity distributions in a corpus and thus provides a way to extend text mining processes to new data in an unsupervised manner. Evaluation on two languages and two text types from the automotive domain shows that it achieves accurate results on repair-order data. Results are less accurate on internet data, but the tasks of sentiment analysis and extraction of the opinion target can be mastered; incorporating internet data is thus possible and important, as it provides useful insight into the customer's thoughts. To conclude, this thesis presents a complete unsupervised workflow for Relation Extraction (except for the highly domain-dependent Entity Detection task), improving the performance of each of the involved subtasks compared to state-of-the-art approaches. Furthermore, this work applies Natural Language Processing methods and Relation Extraction approaches to real-world data, unveiling challenges that do not occur in high-quality newspaper corpora.
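The context-clustering step described above, which splits an ambiguous high-frequency word such as "pump" into its syntactic functions, can be sketched roughly as follows. The corpus is invented toy data, and the real system's context features and clustering algorithm may well differ:

```python
# Sketch of context clustering for disambiguating one high-frequency
# word type: represent each occurrence of "pump" by a bag of the other
# words in its context, then cluster the occurrences. Each induced
# cluster acts as a separate pseudo-tag for the tagger to use.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

contexts = [
    "the pump is broken",      # noun-like uses (invented examples)
    "the pump is loud",
    "the pump was replaced",
    "we pump water daily",     # verb-like uses
    "we pump water slowly",
    "we pump coolant daily",
]
windows = [" ".join(w for w in c.split() if w != "pump") for c in contexts]
X = CountVectorizer().fit_transform(windows)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # occurrences grouped into two induced pseudo-tags,
               # which should separate the noun and verb contexts here
```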
119

Accurately extracting information from a finite set of different report categories and formats / Precis extraktion av information från ett begränsat antal rapporter med olika struktur och format på datan

Holmbäck, Jonatan January 2023
POC Sports (hereafter simply POC) is a company that manufactures gear and accessories for winter sports as well as cycling. Its mission is to "Protect lives and reduce the consequences of accidents for athletes and anyone inspired to be one". To do so, a lot of care needs to be put into making the equipment as protective as possible while still maintaining the desired functionality. To aid in this, POC's vendor companies run standardized tests to evaluate the products, and the results of these tests are compiled into reports for POC. The problem is that the different companies use different styles and formats to convey this information, which can be classified into different categories. This project therefore aimed to provide a tool that POC can use to identify a report's category and then accurately extract the relevant data from it. An accuracy score was used as the metric for evaluating how well the tool extracts the relevant data. The development and evaluation of the tool were performed in two evaluation rounds. Additional metrics were used to evaluate a number of existing tools, including whether they were open source, how easy they are to set up, their pricing, and how much of the task they could cover. A proof-of-concept tool was realized and demonstrated an accuracy of 97%, which was considered adequate compared to the minimum required accuracy of 95%. However, due to the available time and resources, the sample size was limited, and thus this accuracy may not apply to the entire population with a confidence level higher than 75%. The results of evaluating the iterative improvements to the tool suggest that, by addressing issues as they are found, it is possible to achieve an acceptable score for a large fraction of the general population. Additionally, it would be beneficial to keep a catalog of the recurring solutions that have been made for different problems, so they can be reused for similar problems, allowing for better extensibility and generalizability. To build on the work performed in this thesis, the next steps might be to look into similar problems for other formats and to examine how different PDF generators may affect the ability to extract and process data present in PDF reports.
120

Tree Transformations in Inductive Dependency Parsing

Nilsson, Jens January 2007
This licentiate thesis deals with automatic syntactic analysis, or parsing, of natural languages. A parser constructs the syntactic analysis, which it learns by looking at correctly analyzed sentences, known as training data. The general topic concerns manipulations of the training data in order to improve the parsing accuracy.

Several studies using constituency-based theories for natural languages in such automatic and data-driven syntactic parsing have shown that training data, annotated according to a linguistic theory, often needs to be adapted in various ways in order to achieve an adequate, automatic analysis. A linguistically sound constituent structure is not necessarily well-suited for learning and parsing using existing data-driven methods. Modifications to the constituency-based trees in the training data, and corresponding modifications to the parser output, have successfully been applied to increase the parser accuracy. The topic of this thesis is to investigate whether similar modifications in the form of tree transformations to training data, annotated with dependency-based structures, can improve accuracy for data-driven dependency parsers. To this end, two types of tree transformations are in focus.

The first concerns non-projectivity. The full potential of dependency parsing can only be realized if non-projective constructions are allowed, which pose a problem for projective dependency parsers. On the other hand, non-projective parsers tend, among other things, to be slower. In order to maintain the benefits of projective parsing, a tree transformation technique to recover non-projectivity while using a projective parser is presented here.

The second type of transformation concerns linguistic phenomena that are possible but hard for a parser to learn, given a certain choice of dependency analysis. This study has concentrated on two such phenomena, coordination and verb groups, for which tree transformations are applied in order to improve parsing accuracy, in case the original structure does not coincide with a structure that is easy to learn.

Empirical evaluations are performed using treebank data from various languages, and using more than one dependency parser. The results show that the benefit of these tree transformations used in preprocessing and postprocessing is to a large extent language, treebank and parser independent.
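The first transformation above, recovering non-projectivity while using a projective parser, is commonly implemented as a pseudo-projective "lift": a dependent whose arc is non-projective is re-attached to its head's head, and its label is marked so that postprocessing can undo the move. A toy sketch under that assumption (not the thesis's exact encoding scheme, and simplified to a single pass):

```python
# Pseudo-projective lift sketch. Tokens are 1-indexed; 0 is the
# artificial root. Real implementations iterate until the tree is
# fully projective and use richer label-encoding schemes.

def is_projective(arc, heads):
    """An arc head->dep is projective if every token strictly between
    the two endpoints is (transitively) dominated by the arc's head."""
    head, dep = arc
    lo, hi = sorted((head, dep))
    for tok in range(lo + 1, hi):
        cur = tok
        while cur != 0 and cur != head:
            cur = heads[cur]        # climb toward the root
        if cur != head:
            return False
    return True

def lift_nonprojective(heads, labels):
    """Re-attach each non-projective dependent to its grandparent and
    mark its label with the original head's label for later recovery."""
    heads, labels = dict(heads), dict(labels)
    for dep, head in list(heads.items()):
        if head != 0 and not is_projective((head, dep), heads):
            heads[dep] = heads[head]                        # lift
            labels[dep] = labels[dep] + "|" + labels[head]  # mark
    return heads, labels

heads = {1: 3, 2: 0, 3: 2, 4: 2}   # dep -> head; arc 3->1 crosses root
labels = {1: "nmod", 2: "root", 3: "obj", 4: "punct"}
new_heads, new_labels = lift_nonprojective(heads, labels)
print(new_heads)   # {1: 2, 2: 0, 3: 2, 4: 2} -- now projective
print(new_labels)  # {1: 'nmod|obj', ...} -- the mark guides the inverse
```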
