  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
51

Regulated Grammar Systems

Tomko, Martin January 2018 (has links)
The thesis provides an overview of the foundations of formal language theory, regulated grammars, and the parsing of LL(1) languages. An algorithm for parsing programmed grammars, inspired by the LL(1) parser, is proposed and analyzed. The class of languages accepted by this algorithm is a strict superclass of the LL(1) languages and contains some languages that are not context-free. This class, however, appears to be incomparable with the class of context-free languages.
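The LL(1) analysis the thesis builds on is table-driven: a parse table maps (nonterminal, lookahead) pairs to productions, and a stack replays the predicted right-hand sides. A minimal sketch in Python — the toy grammar and table below are illustrative, not the thesis's algorithm for programmed grammars:

```python
# Minimal table-driven LL(1) recognizer for the toy grammar
#   S -> a S b | c          (language: a^n c b^n)
# Grammar, terminals, and parse table are illustrative only.

PARSE_TABLE = {
    ("S", "a"): ["a", "S", "b"],  # expand S -> a S b on lookahead 'a'
    ("S", "c"): ["c"],            # expand S -> c   on lookahead 'c'
}

def ll1_accepts(tokens):
    stack = ["S"]                          # start symbol on the stack
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in ("a", "b", "c"):         # terminal: must match input
            if look != top:
                return False
            pos += 1
        else:                              # nonterminal: consult the table
            rule = PARSE_TABLE.get((top, look))
            if rule is None:
                return False
            stack.extend(reversed(rule))   # push RHS, leftmost symbol on top
    return pos == len(tokens)
```

A programmed-grammar parser additionally has to track which rules each applied rule permits next, which is where the thesis's extension lies.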
52

Attention Mechanisms for Transition-based Dependency Parsing

Gontrum, Johannes January 2019 (has links)
Transition-based dependency parsing is known to compute the syntactic structure of a sentence efficiently, but it is less accurate at predicting long-distance relations between tokens, as it lacks global information about the sentence. Our main contribution is the integration of attention mechanisms to replace the static token selection with a dynamic approach that takes the complete sequence into account. Though our experiments confirm that our approach fundamentally works, our models do not outperform the baseline parser. We further present a line of follow-up experiments to investigate these results. Our main conclusion is that the BiLSTM of the traditional parser is already powerful enough to encode the required global information into each token, eliminating the need for an attention-driven approach. Our secondary results indicate that the attention models require a neural network with a higher capacity to potentially extract more latent information from the word embeddings and the LSTM than the traditional parser. We further show that positional encodings are not useful for our attention models, though BERT-style positional embeddings slightly improve the results. Finally, we experiment with replacing the LSTM with a Transformer encoder to test the impact of self-attention. The results are disappointing, though we believe this direction deserves further research. For our work, we implement a UUParser-inspired dependency parser from scratch in PyTorch and extend it with, among other things, full GPU support and mini-batch processing. We publish the code under a permissive open-source license at https://github.com/jgontrum/parseridge.
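The attention mechanism the abstract refers to can be sketched in a few lines: score every token encoding against a query, normalise the scores with softmax, and return the weighted mixture. The vectors and dimensions below are illustrative; the thesis's models operate on BiLSTM states inside a neural parser.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # subtract max for stability
    z = sum(exps)
    return [e / z for e in exps]

def attend(query, keys, values):
    """Dot-product attention: score each token, then mix the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(dim)]
    return context, weights
```

With keys and values both set to the token encodings of a sentence, a query attends over the whole sequence instead of picking a single fixed position.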
53

Interpret jazyka LaTeX založený na kombinaci více metod syntaktické analýzy / A LaTeX Interpreter Based on a Combination of Several Parsing Methods

Lebeda, Petr January 2008 (has links)
The diploma thesis discusses the potential of interpreting the typesetting language LaTeX and describes the structure of this language, its functions, and their syntax. It also analyses the possibilities of interpreting LaTeX into HTML (HyperText Markup Language) in order to create typographically accurate publications that can be viewed in a common web browser. A solution concept and an outline of possible problems follow.
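A full LaTeX interpreter has to parse the language properly, but the LaTeX-to-HTML mapping idea can be hinted at with a deliberately naive sketch — the command set and tag mapping below are illustrative assumptions, not the thesis's implementation:

```python
import re

# Map a handful of LaTeX text-formatting commands to HTML tags.
# A real interpreter (as the thesis describes) must parse LaTeX
# properly; this regex sketch only handles non-nested arguments.
COMMAND_TO_TAG = {"textbf": "b", "textit": "i", "emph": "em"}

def latex_to_html(text):
    for cmd, tag in COMMAND_TO_TAG.items():
        text = re.sub(r"\\%s\{([^{}]*)\}" % cmd,
                      r"<%s>\1</%s>" % (tag, tag), text)
    return text
```

Nested commands, environments, and math are exactly the cases where a combination of real parsing methods, as in the thesis, becomes necessary.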
54

On repairing sentences : an experimental and computational analysis of recovery from unexpected syntactic disambiguation in sentence parsing

Green, Matthew James January 2013 (has links)
This thesis contends that the human parser has a repair mechanism. It is further contended that the human parser uses this mechanism to alter previously built structure in the case of unexpected disambiguation of temporary syntactic ambiguity. This position stands in opposition to the claim that unexpected disambiguation of temporary syntactic ambiguity is accomplished by the usual first pass parsing routines, a claim that arises from the relatively extraordinary capabilities of computational parsers, capabilities which have recently been extended by hypothesis to be available to the human sentence processing mechanism. The thesis argues that, while these capabilities have been demonstrated in computational parsers, the human parser is best explained in the terms of a repair based framework, and that this argument is demonstrated by examining eye movement behaviour in reading. In support of the thesis, evidence is provided from a set of eye tracking studies of reading. It is argued that these studies show that eye movement behaviours at disambiguation include purposeful visual search for linguistically relevant material, and that the form and structure of these searches vary reliably according to the nature of the repairs that the sentences necessitate.
55

Scalable semi-supervised grammar induction using cross-linguistically parameterized syntactic prototypes

Boonkwan, Prachya January 2014 (has links)
This thesis is about the task of unsupervised parser induction: automatically learning grammars and parsing models from raw text. We endeavor to induce such parsers by observing sequences of terminal symbols. We focus on overcoming the problem of frequent collocation, a major source of error in grammar induction. For example, since a verb and a determiner tend to co-occur in a verb phrase, the probability of attaching the determiner to the verb is sometimes higher than that of attaching the core noun to the verb, resulting in the erroneous attachment *((Verb Det) Noun) instead of (Verb (Det Noun)). Although frequent collocation is at the heart of grammar induction, it can also seriously distort the grammar distribution. Natural language grammars follow a Zipfian (power-law) distribution, where the frequency of any grammar rule is inversely proportional to its rank in the frequency table. We believe that covering the most frequent grammar rules in grammar induction will have a strong impact on accuracy. We propose an efficient approach to grammar induction guided by cross-linguistic language parameters. Our language parameters consist of 33 parameters of frequent basic word orders, which can easily be elicited from grammar compendiums or short interviews with naïve language informants. These parameters are designed to capture frequent word orders in the Zipfian distribution of natural language grammars, while the rest of the grammar, including exceptions, can be automatically induced from unlabeled data. The language parameters shrink the search space of the grammar induction problem by exploiting both word order information and predefined attachment directions. The contribution of this thesis is three-fold. (1) We show that the language parameters are adequately generalizable cross-linguistically, as our grammar induction experiments are carried out on 14 languages on top of a simple unsupervised grammar induction system.
(2) Our specification of language parameters improves the accuracy of unsupervised parsing even when the parser is exposed to much less frequent linguistic phenomena in longer sentences, where the accuracy decrease stays within 10%. (3) We investigate the prevalent factors of errors in grammar induction, which provide room for accuracy improvement. The proposed language parameters efficiently cope with the most frequent grammar rules in natural languages. With only 10 man-hours for preparing syntactic prototypes, our approach improves the accuracy of directed dependency recovery over the state-of-the-art completely unsupervised parser of Gillenwater et al. (2010) in: (1) Chinese by 30.32% (2) Swedish by 28.96% (3) Portuguese by 37.64% (4) Dutch by 15.17% (5) German by 14.21% (6) Spanish by 13.53% (7) Japanese by 13.13% (8) English by 12.41% (9) Czech by 9.16% (10) Slovene by 7.24% (11) Turkish by 6.72% and (12) Bulgarian by 5.96%. It is noted that although the directed dependency accuracies of some languages are below 60%, their TEDEVAL scores are still satisfactory (approximately 80%). This suggests that our parsed trees are, in fact, closely related to the gold-standard trees despite the discrepancy of annotation schemes. We perform an error analysis of over- and under-generation. We found three prevalent problems that cause errors in the experiments: (1) PP attachment (2) discrepancies of dependency annotation schemes and (3) rich morphology. The methods presented in this thesis were originally presented in Boonkwan and Steedman (2011). The thesis presents a great deal more detail in the design of cross-linguistic language parameters, the algorithm of lexicon inventory construction, experiment results, and error analysis.
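The Zipfian coverage argument is easy to quantify: if rule frequency is proportional to 1/rank, a handful of top-ranked rules carries a disproportionate share of the probability mass. A small sketch (the rule counts here are illustrative, not figures from the thesis):

```python
def zipf_coverage(top_k, n_rules):
    """Fraction of probability mass covered by the top_k most frequent
    rules when rule frequency is proportional to 1/rank (Zipf's law)."""
    weights = [1.0 / rank for rank in range((1), n_rules + 1)]
    total = sum(weights)
    return sum(weights[:top_k]) / total
```

Under this model, 33 rules out of a hypothetical 1000 already cover more than half the mass — roughly the intuition behind hand-specifying 33 word-order parameters and inducing the long tail from data.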
56

Harmonic analysis of music using combinatory categorial grammar

Granroth-Wilding, Mark Thomas January 2013 (has links)
Various patterns of the organization of Western tonal music exhibit hierarchical structure, among them the harmonic progressions underlying melodies and the metre underlying rhythmic patterns. Recognizing these structures is an important part of unconscious human cognitive processing of music. Since the prosody and syntax of natural languages are commonly analysed with similar hierarchical structures, it is reasonable to expect that the techniques used to identify these structures automatically in natural language might also be applied to the automatic interpretation of music. In natural language processing (NLP), analysing the syntactic structure of a sentence is a prerequisite to semantic interpretation. The analysis is made difficult by the high degree of ambiguity in even moderately long sentences. In music, a similar sort of structural analysis, with a similar degree of ambiguity, is fundamental to tasks such as key identification and score transcription. These and other tasks depend on harmonic and rhythmic analyses. There is a long history of applying linguistic analysis techniques to musical analysis. In recent years, statistical modelling, in particular in the form of probabilistic models, has become ubiquitous in NLP for large-scale practical analysis of language. The focus of the present work is the application of statistical parsing to automatic harmonic analysis of music. This thesis demonstrates that statistical parsing techniques, adapted from NLP with little modification, can be successfully applied to recovering the harmonic structure underlying music. It shows first how a type of formal grammar based on one used for linguistic syntactic processing, Combinatory Categorial Grammar (CCG), can be used to analyse the hierarchical structure of chord sequences. I introduce a formal language similar to first-order predicate logic to express the hierarchical tonal harmonic relationships between chords.
The syntactic grammar formalism then serves as a mechanism to map an unstructured chord sequence onto its structured analysis. In NLP, the high degree of ambiguity of the analysis means that a parser must consider a huge number of possible structures. Chart parsing provides an efficient mechanism to explore them. Statistical models allow the parser to use information about structures seen before in a training corpus to eliminate improbable interpretations early on in the process and to rank the final analyses by plausibility. To apply the same techniques to harmonic analysis of chord sequences, a corpus of tonal jazz chord sequences annotated by hand with harmonic analyses is constructed. Two statistical parsing techniques are adapted to the present task and evaluated on their success at recovering the annotated structures. The experiments show that parsing using a statistical model of syntactic derivations is more successful than a Markovian baseline model at recovering harmonic structure. In addition, the practical technique of statistical supertagging serves to speed up parsing without any loss in accuracy. This approach to recovering harmonic structure can be extended to the analysis of performance data symbolically represented as notes. Experiments using some simple proof-of-concept extensions of the above parsing models demonstrate one probabilistic approach to this. The results reported provide a baseline for future work on the task of harmonic analysis of performances.
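The chart-parsing machinery mentioned above can be illustrated with a minimal CKY recognizer over a toy grammar in Chomsky normal form — a drastic simplification of the CCG parsing the thesis actually uses, with an invented grammar and lexicon:

```python
def cky_recognize(tokens, lexicon, binary_rules, start="S"):
    """Minimal CKY recognizer for a grammar in Chomsky normal form.
    lexicon: (nonterminal, word) pairs; binary_rules: (parent, left, right)."""
    n = len(tokens)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):                 # fill width-1 cells
        chart[i][i + 1] = {nt for nt, word in lexicon if word == tok}
    for span in range(2, n + 1):                     # grow wider spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                # try every split point
                for parent, left, right in binary_rules:
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j].add(parent)
    return start in chart[0][n]
```

The chart shares sub-analyses between competing parses, which is what keeps the exploration of an ambiguous chord (or word) sequence tractable; the statistical model then ranks the packed analyses.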
57

Spectral Probabilistic Modeling and Applications to Natural Language Processing

Parikh, Ankur 01 August 2015 (has links)
Probabilistic modeling with latent variables is a powerful paradigm that has led to key advances in many applications such as natural language processing, text mining, and computational biology. Unfortunately, while introducing latent variables substantially increases representation power, learning and modeling can become considerably more complicated. Most existing solutions largely ignore non-identifiability issues in modeling and formulate learning as a nonconvex optimization problem, where convergence to the optimal solution is not guaranteed due to local minima. In this thesis, we propose to tackle these problems through the lens of linear/multi-linear algebra. Viewing latent variable models from this perspective allows us to approach key problems such as structure learning and parameter learning using tools such as matrix/tensor decompositions, inversion, and additive metrics. These new tools enable us to develop novel solutions to learning in latent variable models with theoretical and practical advantages. For example, our spectral parameter learning methods for latent trees and junction trees are provably consistent, local-optima-free, and 1-2 orders of magnitude faster than EM for large sample sizes. In addition, we focus on applications in Natural Language Processing, using our insights to not only devise new algorithms, but also to propose new models. Our method for unsupervised parsing is the first algorithm that both has theoretical guarantees and is practical, performing favorably compared to the CCM method of Klein and Manning. We also developed power low rank ensembles, a framework for language modeling that generalizes existing n-gram techniques to non-integer n. It consistently outperforms state-of-the-art Kneser-Ney baselines and can train on billion-word datasets in a few hours.
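The spectral idea in miniature: dominant eigenstructure of a moment-style matrix can be recovered by simple, globally convergent iteration, with no nonconvex optimization. A pure-Python power-iteration sketch — the matrix below is illustrative, not a model from the thesis:

```python
def power_iteration(matrix, steps=50):
    """Dominant eigenvalue/eigenvector of a small square matrix by
    repeated multiplication; converges when the top eigenvalue is
    strictly dominant in magnitude."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(steps):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w)          # renormalise to avoid overflow
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate for the current v.
    num = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    den = sum(x * x for x in v)
    return num / den, v
```

Spectral learning methods build on decompositions of exactly such observable moment matrices (in practice via SVD and tensor decompositions), which is why they avoid the local minima that plague EM.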
58

Robust French syntax analysis : reconciling statistical methods and linguistic knowledge in the Talismane toolkit / Analyse syntaxique robuste du français : concilier méthodes statistiques et connaissances linguistiques dans l'outil Talismane

Urieli, Assaf 17 December 2013 (has links)
In this thesis we explore robust statistical syntax analysis for French. Our main concern is to explore methods whereby the linguist can inject linguistic knowledge and/or resources into the robust statistical engine in order to improve results for specific phenomena.
We first explore the dependency annotation schema for French, concentrating on certain phenomena. Next, we look into the various algorithms capable of producing this annotation, and in particular on the transition-based parsing algorithm used in the rest of this thesis. After exploring supervised machine learning algorithms for NLP classification problems, we present the Talismane toolkit for syntax analysis, built within the framework of this thesis, including four statistical modules - sentence boundary detection, tokenisation, pos-tagging and parsing - as well as the various linguistic resources used for the baseline model, including corpora, lexicons and feature sets. Our first experiments attempt various machine learning configurations in order to identify the best baseline. We then look into improvements made possible by beam search and beam propagation. Finally, we present a series of experiments aimed at correcting errors related to specific linguistic phenomena, using targeted features. One of our innovations is the introduction of rules that can impose or prohibit certain decisions locally, thus bypassing the statistical model. We explore the usage of rules for errors that the features are unable to correct. Finally, we look into the enhancement of targeted features by large scale linguistic resources, and in particular a semi-supervised approach using a distributional semantic resource.
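The beam search referred to above keeps only the k best partial hypotheses at each step instead of committing greedily to one. A generic sketch — the expansion and scoring functions are toy stand-ins for the transition-system actions and the statistical model:

```python
def beam_search(initial, expand, score, beam_size, n_steps):
    """Keep only the beam_size best partial hypotheses at each step.
    expand(state) yields successor states; score(state) ranks them."""
    beam = [initial]
    for _ in range(n_steps):
        candidates = [succ for state in beam for succ in expand(state)]
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]      # prune to the top hypotheses
    return beam[0]                         # best hypothesis after n_steps
```

With beam_size=1 this degenerates to greedy decoding; widening the beam trades time for the ability to recover from locally suboptimal decisions, which is the improvement the experiments measure.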
59

Hybrid tag-set for natural language processing.

January 1999 (has links)
Leung Wai Kwong.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. Includes bibliographical references (leaves 90-95). Abstracts in English and Chinese.
Contents:
1 Introduction (p.1): 1.1 Motivation (p.1); 1.2 Objective (p.3); 1.3 Organization of thesis (p.3)
2 Background (p.5): 2.1 Chinese Noun Phrases Parsing (p.5); 2.2 Chinese Noun Phrases (p.6); 2.3 Problems with Syntactic Parsing (p.11): 2.3.1 Conjunctive Noun Phrases (p.11), 2.3.2 De-de Noun Phrases (p.12), 2.3.3 Compound Noun Phrases (p.13); 2.4 Observations (p.15): 2.4.1 Inadequacy in Part-of-Speech Categorization for Chinese NLP (p.16), 2.4.2 The Need of Semantic in Noun Phrase Parsing (p.17); 2.5 Summary (p.17)
3 Hybrid Tag-set (p.19): 3.1 Objectives (p.19): 3.1.1 Resolving Parsing Ambiguities (p.19), 3.1.2 Investigation of Nominal Compound Noun Phrases (p.20); 3.2 Definition of Hybrid Tag-set (p.20); 3.3 Introduction to Cilin (p.21); 3.4 Problems with Cilin (p.23): 3.4.1 Unknown words (p.23), 3.4.2 Multiple Semantic Classes (p.25); 3.5 Introduction to Chinese Word Formation (p.26): 3.5.1 Disyllabic Word Formation (p.26), 3.5.2 Polysyllabic Word Formation (p.28), 3.5.3 Observation (p.29); 3.6 Automatic Assignment of Hybrid Tag to Chinese Word (p.31); 3.7 Summary (p.34)
4 Automatic Semantic Assignment (p.35): 4.1 Previous Researches on Semantic Tagging (p.36); 4.2 SAUW - Automatic Semantic Assignment of Unknown Words (p.37): 4.2.1 POS-to-SC Association (Process 1) (p.38), 4.2.2 Morphology-based Deduction (Process 2) (p.39), 4.2.3 Di-syllabic Word Analysis (Process 3 and 4) (p.41), 4.2.4 Poly-syllabic Word Analysis (Process 5) (p.47); 4.3 Illustrative Examples (p.47); 4.4 Evaluation and Analysis (p.49): 4.4.1 Experiments (p.49), 4.4.2 Error Analysis (p.51); 4.5 Summary (p.52)
5 Word Sense Disambiguation (p.53): 5.1 Introduction to Word Sense Disambiguation (p.54); 5.2 Previous Works on Word Sense Disambiguation (p.55): 5.2.1 Linguistic-based Approaches (p.56), 5.2.2 Corpus-based Approaches (p.58); 5.3 Our Approach (p.60): 5.3.1 Bi-gram Co-occurrence Probabilities (p.62), 5.3.2 Tri-gram Co-occurrence Probabilities (p.63), 5.3.3 Design consideration (p.65), 5.3.4 Error Analysis (p.67); 5.4 Summary (p.68)
6 Hybrid Tag-set for Chinese Noun Phrase Parsing (p.69): 6.1 Resolving Ambiguous Noun Phrases (p.70): 6.1.1 Experiment (p.70), 6.1.2 Results (p.72); 6.2 Summary (p.78)
7 Conclusion (p.80): 7.1 Summary (p.80); 7.2 Difficulties Encountered (p.83): 7.2.1 Lack of Training Corpus (p.83), 7.2.2 Features of Chinese word formation (p.84), 7.2.3 Problems with linguistic sources (p.85); 7.3 Contributions (p.86): 7.3.1 Enrichment to the Cilin (p.86), 7.3.2 Enhancement in syntactic parsing (p.87); 7.4 Further Researches (p.88): 7.4.1 Investigation into words that undergo semantic changes (p.88), 7.4.2 Incorporation of more information into the hybrid tag-set (p.89)
Appendices: A POS Tag-set by Tsinghua University (清華大學) (p.96); B Morphological Rules (p.100); C Syntactic Rules for Di-syllabic Words Formation (p.104)
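The bigram co-occurrence idea of Chapter 5 can be sketched as choosing, for an ambiguous word, the semantic class that co-occurs most often with the class of its context. The classes and counts below are invented for illustration; the thesis estimates these probabilities from a tagged corpus over Cilin classes:

```python
# Toy bigram-based sense selection: pick the semantic class of an
# ambiguous word that co-occurs most often with the preceding class.
# The class inventory and counts here are invented for illustration.
BIGRAM_COUNTS = {
    ("FOOD", "EAT"): 40, ("FOOD", "PLAY"): 1,
    ("TOY", "EAT"): 2,   ("TOY", "PLAY"): 30,
}

def pick_sense(prev_class, candidate_classes):
    """Return the candidate class with the highest co-occurrence count
    (unseen pairs default to zero)."""
    return max(candidate_classes,
               key=lambda c: BIGRAM_COUNTS.get((prev_class, c), 0))
```

Tri-gram co-occurrence (Section 5.3.2) extends the same idea to a wider context window at the cost of sparser counts.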
60

GLR parsing with multiple grammars for natural language queries.

January 2000 (has links)
Luk Po Chui.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 97-100). Abstracts in English and Chinese.
Contents:
1 Introduction (p.1): 1.1 Efficiency and Memory (p.2); 1.2 Ambiguity (p.3); 1.3 Robustness (p.4); 1.4 Thesis Organization (p.5)
2 Background (p.7): 2.1 Introduction (p.7); 2.2 Context-Free Grammars (p.8); 2.3 The LR Parsing Algorithm (p.9); 2.4 The Generalized LR Parsing Algorithm (p.12): 2.4.1 Graph-Structured Stack (p.12), 2.4.2 Packed Shared Parse Forest (p.14); 2.5 Time and Space Complexity (p.16); 2.6 Related Work on Parsing (p.17): 2.6.1 GLR* (p.17), 2.6.2 TINA (p.18), 2.6.3 PHOENIX (p.19); 2.7 Chapter Summary (p.21)
3 Grammar Partitioning (p.22): 3.1 Introduction (p.22); 3.2 Motivation (p.22); 3.3 Previous Work on Grammar Partitioning (p.24); 3.4 Our Grammar Partitioning Approach (p.26): 3.4.1 Definitions and Concepts (p.26), 3.4.2 Guidelines for Grammar Partitioning (p.29); 3.5 An Example (p.30); 3.6 Chapter Summary (p.34)
4 Parser Composition (p.35): 4.1 Introduction (p.35); 4.2 GLR Lattice Parsing (p.36): 4.2.1 Lattice with Multiple Granularity (p.36), 4.2.2 Modifications to the GLR Parsing Algorithm (p.37); 4.3 Parser Composition Algorithms (p.45): 4.3.1 Parser Composition by Cascading (p.46), 4.3.2 Parser Composition with Predictive Pruning (p.48), 4.3.3 Comparison of Parser Composition by Cascading and Parser Composition with Predictive Pruning (p.54); 4.4 Chapter Summary (p.54)
5 Experimental Results and Analysis (p.56): 5.1 Introduction (p.56); 5.2 Experimental Corpus (p.57); 5.3 ATIS Grammar Development (p.60); 5.4 Grammar Partitioning and Parser Composition on ATIS Domain (p.62): 5.4.1 ATIS Grammar Partitioning (p.62), 5.4.2 Parser Composition on ATIS (p.63); 5.5 Ambiguity Handling (p.66); 5.6 Semantic Interpretation (p.69): 5.6.1 Best Path Selection (p.69), 5.6.2 Semantic Frame Generation (p.71), 5.6.3 Post-Processing (p.72); 5.7 Experiments (p.73): 5.7.1 Grammar Coverage (p.73), 5.7.2 Size of Parsing Table (p.74), 5.7.3 Computational Costs (p.76), 5.7.4 Accuracy Measures in Natural Language Understanding (p.81), 5.7.5 Summary of Results (p.90); 5.8 Chapter Summary (p.91)
6 Conclusions (p.92): 6.1 Thesis Summary (p.92); 6.2 Thesis Contributions (p.93); 6.3 Future Work (p.94): 6.3.1 Statistical Approach on Grammar Partitioning (p.94), 6.3.2 Probabilistic Modeling for Best Parse Selection (p.95), 6.3.3 Robust Parsing Strategies (p.96)
Bibliography (p.97)
Appendix A ATIS-3 Grammar (p.101): A.1 English ATIS-3 Grammar Rules (p.101); A.2 Chinese ATIS-3 Grammar Rules (p.104)
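Parser composition by cascading (Section 4.3.1 above) can be caricatured as a pipeline: a sub-grammar parser rewrites token spans into nonterminals, and the resulting symbol sequence feeds the next parser. The grammars in this sketch are invented, not the thesis's ATIS partitions:

```python
# Toy cascade: stage 1 groups determiner+noun spans into NP; stage 2
# checks the resulting symbol sequence against a top-level pattern.
def stage1_group_nps(tags):
    out, i = [], 0
    while i < len(tags):
        if tags[i] == "DET" and i + 1 < len(tags) and tags[i + 1] == "NOUN":
            out.append("NP"); i += 2       # reduce DET NOUN -> NP
        else:
            out.append(tags[i]); i += 1
    return out

def stage2_sentence(symbols):
    return symbols == ["NP", "VERB", "NP"]  # top-level pattern S -> NP VERB NP

def cascaded_parse(tags):
    return stage2_sentence(stage1_group_nps(tags))
```

The thesis's GLR lattice parsing generalizes this: each stage may emit several competing segmentations in a lattice, and predictive pruning discards sub-parses the next grammar cannot use.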
