31 |
Grammar and Parsing: A Typological Investigation of Relative-Clause Processing. Lin, Chien-Jer Charles. January 2006.
This dissertation investigates the role of grammar and parsing in processing relative clauses across languages. A parsing theory called the Incremental Minimalist Parser (IMP), which parses sentences incrementally from left to right, is sketched based on the Minimalist Program (Chomsky, 2001, 2005). Sentence-processing evidence is provided in support of a universal parsing theory that is structure-based. According to IMP (and other structure-based theories), a gap located at the subject position is more easily accessed than a gap located at the object position in both head-initial (e.g. English) and head-final (e.g. Mandarin) relative clauses. Experiment 1 (self-paced reading tasks) showed a processing advantage for Mandarin relative clauses involving subject extractions over object extractions, consistent with the universal subject preference found in other languages. Experiments 2 to 4 (naturalness ratings, paraphrasing tasks, and self-paced reading tasks) focused on possessor relative clauses. When the possessor gap was located at the subject position (i.e. in passives), a possessive relation was easier to construct than when the gap was located at an object position (i.e. in canonical constructions and sentences involving BA). The results of Experiments 1-4 suggested that processing accounts based on locality and canonicity, rather than on syntactic structure, cannot account for the processing preferences of filler-gap relations in relative clauses. Experiment 5 (self-paced reading tasks) investigated whether the surface NVN sequence of relative clauses in sentence-initial position induced a garden path, and whether the animacy of the first noun in such sequences could rescue the parser from it. Mandarin relative clauses involving topicalization of the embedded object were investigated. The results suggested that the surface NVN sequence did induce a main-clause misanalysis (as Subject-Verb-Object). Even when the first noun was (semantically) an unlikely agent, the parser took it as a subject in the initial syntactic analysis. Semantics did not have an immediate effect on syntactic processing.
|
32 |
Parallel parsing of context-free languages on an array of processors. Langlois, Laurent Chevalier. January 1988.
Kosaraju [Kosaraju 69] and, independently ten years later, Guibas, Kung and Thompson [Guibas 79] devised an algorithm (K-GKT) for solving on an array of processors a class of dynamic programming problems of which general context-free language (CFL) recognition is a member. I introduce an extension to K-GKT which allows parsing as well as recognition. The basic idea of the extension is to add counters to the processors; these act as pointers to other processors. The extended algorithm consists of three phases, which I call the recognition phase, the marking phase and the parse output phase. I first consider the case of unambiguous grammars. I show that in that case the algorithm has O(n² log n) space complexity and linear time complexity. To obtain these results I rely on a counter implementation that allows each of the following operations to execute in constant time: set to zero, test if zero, increment by 1 and decrement by 1. I provide a proof of correctness of this implementation. I introduce the concept of efficient grammars. One factor in the multiplicative constant hidden behind the O(n² log n) space complexity measure is related to the number of non-terminals in the (unambiguous) grammar used. I say that a grammar is k-efficient if it allows the processors to store no more than k pointer pairs, and I call a 1-efficient grammar simply an efficient grammar. I show that two properties, which I call nt-disjunction and rhs-disjunction, together with unambiguity are sufficient but not necessary conditions for grammar efficiency. I also show that unambiguity itself is not a necessary condition for efficiency. I then consider the case of ambiguous grammars. I present two methods for outputting multiple parses. Both output each parse in linear time; one method has O(n³ log n) space complexity while the other has O(n² log n) space complexity. I then address the issue of problem decomposition.
I show how part of my extension can be adapted, using a standard technique, to process inputs that would be too large for an array of some fixed size. I then discuss briefly some issues related to implementation. I report on an actual implementation on the I.C.L. DAP. Finally, I show how another systolic CFL parsing algorithm, by Chang, Ibarra and Palis [Chang 87], can be generalized to output parses in preorder and inorder.
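The class of dynamic-programming problems mentioned above includes general CFL recognition. As a point of reference, a plain sequential CYK-style recognizer (not the systolic K-GKT algorithm itself) illustrates the recurrence that such parallel algorithms map onto a processor array; the grammar below is an invented toy example in Chomsky normal form.

```python
# Sequential CYK recognition for a CNF grammar: illustrative only, not the
# systolic K-GKT algorithm described in the abstract.

def cyk_recognize(words, lexical, binary, start="S"):
    """Return True if `words` is derivable from `start`.

    lexical: dict mapping terminal -> set of non-terminals (rules A -> a)
    binary:  dict mapping (B, C)  -> set of non-terminals (rules A -> B C)
    """
    n = len(words)
    # table[i][j] = set of non-terminals deriving words[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical.get(w, ()))
    for span in range(2, n + 1):          # substring length
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):         # split point
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        table[i][j] |= binary.get((B, C), set())
    return start in table[0][n - 1]

# Toy CNF grammar: S -> NP VP, VP -> V NP
lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
print(cyk_recognize(["she", "eats", "fish"], lexical, binary))   # True
print(cyk_recognize(["eats", "she", "fish"], lexical, binary))   # False
```

K-GKT's contribution is assigning the cells of exactly this kind of triangular table to an array of processors; the extension described above adds per-processor counters so that parse trees, not just acceptance, can be recovered.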
|
33 |
Evaluating inherited attributes using Haskell and lazy evaluation. Moss, William B. January 2005.
Thesis (B.A.)--Haverford College, Dept. of Computer Science, 2005. / Includes bibliographical references.
|
34 |
Incremental generative models for syntactic and semantic natural language processing. Buys, Jan Moolman. January 2017.
This thesis investigates the role of linguistically-motivated generative models of syntax and semantic structure in natural language processing (NLP). Syntactic well-formedness is crucial in language generation, but most statistical models do not account for the hierarchical structure of sentences. Many applications exhibiting natural language understanding rely on structured semantic representations to enable querying, inference and reasoning. Yet most semantic parsers produce domain-specific or inadequately expressive representations. We propose a series of generative transition-based models for dependency syntax which can be applied as both parsers and language models while being amenable to supervised or unsupervised learning. Two models are based on Markov assumptions commonly made in NLP: The first is a Bayesian model with hierarchical smoothing, the second is parameterised by feed-forward neural networks. The Bayesian model enables careful analysis of the structure of the conditioning contexts required for generative parsers, but the neural network is more accurate. As a language model the syntactic neural model outperforms both the Bayesian model and n-gram neural networks, pointing to the complementary nature of distributed and structured representations for syntactic prediction. We propose approximate inference methods based on particle filtering. The third model is parameterised by recurrent neural networks (RNNs), dropping the Markov assumptions. Exact inference with dynamic programming is made tractable here by simplifying the structure of the conditioning contexts. We then shift the focus to semantics and propose models for parsing sentences to labelled semantic graphs. We introduce a transition-based parser which incrementally predicts graph nodes (predicates) and edges (arguments). This approach is contrasted against predicting top-down graph traversals. 
RNNs and pointer networks are key components in approaching graph parsing as an incremental prediction problem. The RNN architecture is augmented to condition the model explicitly on the transition system configuration. We develop a robust parser for Minimal Recursion Semantics, a linguistically-expressive framework for compositional semantics which has previously been parsed only with grammar-based approaches. Our parser is much faster than the grammar-based model, while the same approach improves the accuracy of neural Abstract Meaning Representation parsing.
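The transition-based models above build on a standard shift-reduce mechanism. The following sketch shows only those transition mechanics in an arc-standard style, with a hand-written action sequence; the thesis's models are generative (they predict words as well as arcs) and learn which action to take, neither of which this toy does.

```python
# Schematic arc-standard transition system for dependency parsing.
# Sentence and action sequence are toy examples; a real parser would
# predict each action with a learned model.

def parse(words, actions):
    """Apply SHIFT / LEFT-ARC / RIGHT-ARC actions; return set of (head, dep) arcs."""
    stack, buffer, arcs = [], list(range(len(words))), set()
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":        # second-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.add((stack[-1], dep))
        elif act == "RIGHT-ARC":       # top becomes dependent of second-top
            dep = stack.pop()
            arcs.add((stack[-1], dep))
    return arcs

words = ["ROOT", "she", "ate", "fish"]
actions = ["SHIFT", "SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(sorted(parse(words, actions)))   # [(0, 2), (2, 1), (2, 3)]
```

Here "ate" heads both "she" and "fish", and the artificial ROOT token heads "ate". A generative variant would attach a probability to each action and to each generated word, so the same machinery doubles as a language model.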
|
35 |
Syntax-mediated semantic parsing. Reddy Goli, Venkata Sivakumar. January 2017.
Querying a database to retrieve an answer, telling a robot to perform an action, or teaching a computer to play a game are tasks requiring communication with machines in a language interpretable by them. Semantic parsing is the task of converting human language to a machine-interpretable language. While human languages are sequential in nature with latent structures, machine-interpretable languages are formal with explicit structures. The computational linguistics community has created several treebanks to understand the formal syntactic structures of human languages. In this thesis, we use these to obtain formal meaning representations of languages, and learn computational models to convert these meaning representations to the target machine representation. Our goal is to evaluate whether existing treebank syntactic representations are useful for semantic parsing. Existing semantic parsing methods mainly learn domain-specific grammars which can parse human languages to machine representations directly. We deviate from this trend and make use of general-purpose syntactic grammar to help in semantic parsing. We use two syntactic representations: Combinatory Categorial Grammar (CCG) and dependency syntax. CCG has a well-established theory of deriving meaning representations from its syntactic derivations, but there are no CCG treebanks for many languages, since these are difficult to annotate. In contrast, dependencies are easy to annotate and have many treebanks. However, dependencies do not have a well-established theory for deriving meaning representations. In this thesis, we propose novel theories for deriving meaning representations from dependencies. Our evaluation task is question answering on a knowledge base. Given a question, our goal is to answer it on the knowledge base by converting the question to an executable query. We use Freebase, the knowledge source behind Google's search engine, as our knowledge base.
Freebase contains millions of real-world facts represented in a graphical format. Inspired by the Freebase structure, we formulate semantic parsing as a graph matching problem: given a natural language sentence, we convert it into a graph structure via the meaning representation obtained from syntax, and find the subgraph of Freebase that best matches the natural language graph. Our experiments on the Free917, WebQuestions and GraphQuestions semantic parsing datasets conclude that general-purpose syntax is more useful for semantic parsing than induced task-specific syntax or syntax-agnostic representations.
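The graph-matching formulation above can be pictured with a toy example: a question graph with one unknown node is matched against knowledge-base triples. The mini-KB, entity names and relation names below are all invented for illustration and are far simpler than Freebase; real queries involve multi-edge graphs and learned scoring rather than exact single-edge matching.

```python
# Toy "semantic parsing as graph matching": one unknown node ('?x') is
# resolved against a hand-written (subject, relation, object) triple store.

KB = {
    ("austin", "capital_of", "texas"),
    ("sacramento", "capital_of", "california"),
    ("austin", "located_in", "texas"),
}

def match(qs, qr, qo):
    """Match one (subject, relation, object) pattern; '?x' marks the unknown."""
    answers = set()
    for s, r, o in KB:
        if r != qr:
            continue
        if qs == "?x" and o == qo:
            answers.add(s)
        elif qo == "?x" and s == qs:
            answers.add(o)
    return answers

# "What is the capital of Texas?"  ->  pattern (?x, capital_of, texas)
print(match("?x", "capital_of", "texas"))   # {'austin'}
```

The thesis's point is that the question-side graph is derived from general-purpose syntax (CCG or dependencies) rather than from a task-specific grammar.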
|
36 |
Descrição de Formalização de Verbos de Ação-Processo para Elaboração de Parser / Description and Formalization of Action-Process Verbs for Parser Development. Rodrigues, C. A. S. 7 March 2009.
Chafe (1970) developed a research program that gave rise to six semantic subcategories for classifying verbs, among them the action-process verbs. However, the literature on the subject provides only a very concise theoretical-methodological framework, both with respect to the semantic properties of this subcategory and with respect to its syntactic properties. In order to expand the syntactic-semantic information available on action-process verbs, the present research followed a program aimed at identifying verbal valencies, as proposed by Borba (1996) and Welker (2005). Accordingly, four types of verbal valency were investigated: logical (Tesnière, 1959; Helbig and Schenkel, 1975); syntactic (Borba, 1996; Ignácio, 2001); semantic; and syntactic-semantic (Fillmore, 1968; Travaglia, 1985; Dik, 1989; Dowty, 1989). At the end of this stage of linguistic investigation, it was possible to confirm the heterogeneity of the action-process verb subcategory, which could be divided into nine subgroups. It was also possible to make explicit both the argument structures belonging to these subgroups and the elements representing the actants that make up such syntactic configurations. Finally, the linguistic knowledge obtained in this research enabled the construction of three linguistic resources that provide a basis for building computational resources for natural language processing: (i) a lexicon-grammar table containing the morphosyntactic-semantic properties of the verbs and of their actants; (ii) a lexical database with the morphosyntactic-semantic properties of the analysed verbs; and (iii) the argument structures identified in each subgroup.
|
37 |
Inkrementální načítání dokumentů v zobrazovacím stroji HTML / Incremental Document Parsing in the HTML Rendering Engine. Hrabec, Pavel. January 2016.
The goal of this thesis is to explore the CSSBox experimental rendering engine, examine the possibility of extending it to support incremental rendering of documents, and propose the necessary modifications. The opening chapters give an overview of existing approaches; a solution is then proposed, implemented and tested. Experiments were performed and their results evaluated. The conclusion summarises the results and outlines options for further development.
|
38 |
Facilitating communication via the Orc protocol. Eriksson, Tobias. January 2007.
This master's thesis project took place at Orc Software, a company that provides technology for advanced trading, market making, and brokerage. The Orc System is based on a client/server architecture. The ordinary way to communicate with the Orc Server System is via the Orc Client Applications, such as Orc Trader or Orc Broker. Additionally, there is another way to communicate with the Orc Server System without using an Orc Client Application: a service within the Orc Server System provides an interface through which clients can communicate using the Orc Protocol (OP). Banks and brokers usually have different systems that are specialized for different needs, and there is often a need to integrate these systems with the Orc Server. In order to simplify this integration for customers with modest programming experience in TCP/IP and parsing techniques, Orc Software would like to provide, free of charge, an example parser/generator capable of communicating with the Orc Server System. This thesis introduces a toolkit consisting of a parser/generator and a sample application. The application provides several examples and demonstrates to customers how simple it is to develop their own applications by utilizing the different OP messages. A comparison was made between the newly created OP parser/generator and a manually generated FIX client using the FIX gateway which Orc Software also sells. This evaluation shows that the OP parser/generator is both faster and less memory-demanding than the manually generated FIX client.
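The Orc Protocol's actual wire format is not specified in the abstract. Purely to illustrate the parse/generate round trip that such a toolkit provides, the sketch below assumes a hypothetical key=value, line-oriented message format; the message type and field names are all invented.

```python
# Toy message parser/generator for an invented line-oriented protocol.
# This is NOT the real Orc Protocol format, which the abstract does not give.

def generate(message_type, fields):
    """Serialise a message dict into a newline-terminated text block."""
    lines = [f"type={message_type}"]
    lines += [f"{k}={v}" for k, v in sorted(fields.items())]
    return "\n".join(lines) + "\n"

def parse(raw):
    """Parse a text block back into (message_type, fields)."""
    fields = dict(line.split("=", 1) for line in raw.strip().splitlines())
    return fields.pop("type"), fields

raw = generate("price_feed", {"instrument": "ABC", "bid": "101.5"})
print(parse(raw))   # ('price_feed', {'bid': '101.5', 'instrument': 'ABC'})
```

The value of shipping such an example, as the thesis argues, is that customers can adapt the round-trip skeleton instead of writing TCP/IP and parsing code from scratch.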
|
39 |
Using Dependency Parses to Augment Feature Construction for Text Mining. Guo, Sheng. 18 June 2012.
With the prevalence of large data stored in the cloud, including unstructured information in the form of text, there is now an increased emphasis on text mining. A broad range of techniques is now used for text mining, including algorithms adapted from machine learning, NLP, computational linguistics, and data mining. Applications are likewise manifold, including classification, clustering, segmentation, relationship discovery, and practically any task that discovers latent information from written natural language.
Classical mining algorithms have traditionally focused on shallow representations such as bag-of-words and similar feature-based models. With the advent of modern high performance computing, deep sentence level linguistic analysis of large scale text corpora has become practical. In this dissertation, we evaluate the utility of dependency parses as textual features for different text mining applications. Dependency parsing is one form of syntactic parsing, based on the dependency grammar implicit in sentences. While dependency parsing has traditionally been used for text understanding, we investigate here its application to supply features for text mining applications.
We specifically focus on three methods to construct textual features from dependency parses. First, we consider a dependency parse as a general feature akin to a traditional bag-of-words model. Second, we consider the dependency parse as the basis to build a feature graph representation. Finally, we use dependency parses in a supervised collocation mining method for feature selection. To investigate these three methods, several applications are studied, including: (i) movie spoiler detection, (ii) text segmentation, (iii) query expansion, and (iv) recommender systems. / Ph. D.
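The first of the three feature-construction methods above, treating dependency arcs as bag-style features analogous to bag-of-words, might look roughly like this sketch. The parse triples are hand-written stand-ins; a real pipeline would obtain them from a dependency parser.

```python
# "Bag of dependencies" feature construction from hardcoded parse triples.
from collections import Counter

def dependency_features(arcs):
    """arcs: (head_word, relation, dependent_word) triples from a parse.
    Returns a feature counter usable by any bag-style classifier."""
    return Counter(f"{rel}({head},{dep})" for head, rel, dep in arcs)

# Hand-written parse of "the critic praised the film"
arcs = [
    ("praised", "nsubj", "critic"),
    ("praised", "dobj", "film"),
    ("critic", "det", "the"),
    ("film", "det", "the"),
]
feats = dependency_features(arcs)
print(feats["nsubj(praised,critic)"])   # 1
print(feats["det(critic,the)"])         # 1
```

Unlike plain bag-of-words, features like `nsubj(praised,critic)` record who praised whom, which is the extra signal the dissertation evaluates across its four applications. The second and third methods (feature graphs and supervised collocation mining) build richer structures on the same triples.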
|
40 |
Framework for Automatic Translation of Hardware Specifications Written in English to a Formal Language. Krishnamurthy, Rahul. 1 November 2022.
The most time-consuming component of designing and launching hardware products to market is the verification of Integrated Circuits (ICs). An effective way of verifying a design is to add assertions to it. Automatic translation of hardware specifications from natural language to assertions in a formal representation has the potential to improve the verification productivity of ICs. However, natural language specifications tend to be imprecise, incomplete, and ambiguous. An automation framework can benefit verification engineers only if it is designed with the right balance between the ease of expression and the precision of meaning allowed in the input natural language specifications. This requirement introduces two major challenges for designing an effective translation framework. The first challenge is to allow the processing of expressive specifications with flexible word-order variations and sentence structures. The second challenge is to assist users in writing unambiguous and complete specifications in the English language that can be accurately translated.
In this dissertation, we address the first challenge by modeling semantic parsing of the input sentence as a game of BINGO that can capture the combinatorial nature of natural language semantics. BINGO parsing considers the context of each word in the input sentence to ensure high precision in the creation of semantic frames.
We address the second challenge by designing a suggestion and feedback framework to assist users in writing clear and coherent specifications. Our feedback generates different ways of writing acceptable sentences when the input sentence is not understood.
We evaluated our BINGO model on 316 hardware design specifications taken from the documents of AMBA, memory controller, and UART architectures. The results showed that highly expressive specifications could be handled in our BINGO model. It also demonstrated the ease of creating rules to generate the same semantic frame for specifications with the same meaning but different word order.
We evaluated the suggestion and rewriting framework on 132 erroneous specifications taken from AMBA and memory controller architectures documents. Our system generated suggestions for all the specs. On manual inspection, we found that 87% of these suggestions were semantically closer to the intent of the input specification. Moreover, automatic contextual analysis of the rewritten form of the input specification allowed the translation of the input specification with different words and different order of words that were not defined in our grammar. / Doctor of Philosophy / The most time-consuming component of designing and launching hardware products to market is the verification of hardware circuits. An effective way of verifying a design is to add programming codes called assertions in the design. The creation of assertions can be time-consuming and error-prone due to the technical details needed to write assertions. Automatically translating assertion specifications written in English to program code can reduce design time and errors since the English language hides away the technical details required for writing assertions. However, sentences written in English language can have multiple and incomplete interpretations. It becomes difficult for machines to understand assertions written in the English language.
In this work, we automatically generate assertions from assertion descriptions written in English. We propose techniques to write rules that can accurately translate English specifications to assertions. Our rules allow a user to write specifications with flexible use of word order and word interpretations. We have tested the understanding framework on English specifications taken from four different types of hardware design architectures.
Since we cannot create rules to understand all possible ways of writing a specification, we have proposed a suggestion framework that can inform the user about the words and word structures acceptable to our translation framework. The suggestion framework was tested on specifications of AMBA and memory controller architectures.
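The spec-to-frame-to-assertion pipeline described in this entry can be caricatured in a few lines. The single regex rule, the frame schema, and the SVA-like output template below are all invented for illustration; the actual framework's BINGO semantic parsing and its suggestion mechanism are far richer than one pattern.

```python
# Toy English-spec -> semantic frame -> assertion pipeline.
# One invented rule; a real system has a grammar and contextual analysis.
import re

RULE = re.compile(
    r"(?P<sig1>\w+) must be high when (?P<sig2>\w+) is asserted", re.I)

def to_frame(spec):
    """Return a semantic frame dict, or None if the sentence is not understood."""
    m = RULE.search(spec)
    if m is None:
        return None   # a real system would generate rewrite suggestions here
    return {"property": "implication",
            "antecedent": m.group("sig2"),
            "consequent": m.group("sig1")}

def to_assertion(frame):
    """Render the frame as an SVA-style assertion string."""
    return (f"assert property (@(posedge clk) "
            f"{frame['antecedent']} |-> {frame['consequent']});")

frame = to_frame("The signal ready must be high when valid is asserted.")
print(to_assertion(frame))
# assert property (@(posedge clk) valid |-> ready);
```

The `None` branch is where the thesis's suggestion framework comes in: instead of failing silently, it proposes acceptable rewrites of sentences the grammar does not cover.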
|