31 |
Construction methods of LR parsers / Schimpf, Karl M. January 1900
Thesis (M.S.)--University of Pennsylvania, 1981. / "May 1981." Includes bibliographical references.
|
32 |
Design of a generic parse tree for imperative languages / Mansfield, Martin F. January 1992
Since programs are written in many languages and design documents are not maintained (if they ever existed), there is a need to extract the design and other information that the programs represent. To do this without writing a separate program for each language, a common representation of the symbol table and parse tree would be required. The purpose of the parse tree and symbol table will not be to generate object code but to provide a platform for analysis tools. In this way the tool designer develops only one version instead of separate versions for each language. The generic symbol table and generic parse tree may not be as detailed as those same structures in a specific compiler, but the parse tree must include all structures for imperative languages. / Department of Computer Science
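As a rough illustration of what such a shared representation could look like (a sketch only; the class and field names below are assumptions, not structures taken from the thesis), a language-neutral parse-tree node and symbol-table entry might be written as:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GenericNode:
    """A language-neutral parse-tree node: one construct kind, many children."""
    kind: str                      # e.g. "assignment", "if", "loop", "call"
    children: List["GenericNode"] = field(default_factory=list)
    token: Optional[str] = None    # leaf text, if any

@dataclass
class SymbolEntry:
    """A language-neutral symbol-table entry shared by analysis tools."""
    name: str
    kind: str          # "variable", "procedure", "type", ...
    type_name: str     # declared type as written in the source language
    scope: str         # enclosing procedure or module name

# An analysis tool walks the same structure regardless of source language:
def count_loops(node: GenericNode) -> int:
    return (node.kind == "loop") + sum(count_loops(c) for c in node.children)
```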
|
33 |
Lexical approaches to backoff in statistical parsing / Lakeland, Corrin. January 2006
This thesis develops a new method for predicting probabilities in a statistical parser so that more sophisticated probabilistic grammars can be used. A statistical parser uses a probabilistic grammar derived from a training corpus of hand-parsed sentences. The grammar is represented as a set of constructions - in a simple case these might be context-free rules. The probability of each construction in the grammar is then estimated by counting its relative frequency in the corpus.
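For illustration, relative-frequency estimation over context-free rules can be sketched as follows; the rules and counts are invented for the example and are not drawn from the thesis's training corpus:

```python
from collections import Counter

# Hypothetical rule counts extracted from a hand-parsed corpus: (lhs, rhs) -> frequency
rule_counts = Counter({
    ("S", ("NP", "VP")): 980,
    ("NP", ("DT", "NN")): 620,
    ("NP", ("NNP",)): 310,
    ("VP", ("VBD", "NP")): 540,
})

# Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
lhs_totals = Counter()
for (lhs, _), n in rule_counts.items():
    lhs_totals[lhs] += n

rule_prob = {rule: n / lhs_totals[rule[0]] for rule, n in rule_counts.items()}
print(rule_prob[("NP", ("DT", "NN"))])   # 620 / 930 ≈ 0.667
```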
A crucial problem when building a probabilistic grammar is to select an appropriate level of granularity for describing the constructions being learned. The more constructions we include in our grammar, the more sophisticated a model of the language we produce. However, if too many different constructions are included, then our corpus is unlikely to contain reliable information about the relative frequency of many constructions.
In existing statistical parsers two main approaches have been taken to choosing an appropriate granularity. In a non-lexicalised parser constructions are specified as structures involving particular parts-of-speech, thereby abstracting over individual words. Thus, in the training corpus two syntactic structures involving the same parts-of-speech but different words would be treated as two instances of the same event. In a lexicalised grammar the assumption is that the individual words in a sentence carry information about its syntactic analysis over and above what is carried by its part-of-speech tags. Lexicalised grammars have the potential to provide extremely detailed syntactic analyses; however, Zipf's law makes it hard for such grammars to be learned.
In this thesis, we propose a method for optimising the trade-off between informative and learnable constructions in statistical parsing. We implement a grammar which works at a level of granularity in between single words and parts-of-speech, by grouping words together using unsupervised clustering based on bigram statistics. We begin by implementing a statistical parser to serve as the basis for our experiments. The parser, based on that of Michael Collins (1999), contains a number of new features of general interest. We then implement a model of word clustering, which we believe is the first to deliver vector-based word representations for an arbitrarily large lexicon. Finally, we describe a series of experiments in which the statistical parser is trained using categories based on these word representations.
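A toy sketch of the general idea of grouping words by their bigram profiles is shown below; it illustrates only the principle (words with similar neighbouring-word distributions get similar vectors) and makes no claim about the clustering method actually used in the thesis:

```python
import numpy as np
from collections import defaultdict

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Represent each word by counts of the words that immediately follow it.
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}
vectors = defaultdict(lambda: np.zeros(len(vocab)))
for w1, w2 in zip(corpus, corpus[1:]):
    vectors[w1][index[w2]] += 1

# Words whose bigram profiles are similar (here: cosine similarity) cluster together.
def cosine(a, b):
    return float(a @ b) / ((np.linalg.norm(a) * np.linalg.norm(b)) or 1.0)

print(cosine(vectors["cat"], vectors["dog"]))   # high: both are followed by "sat"
print(cosine(vectors["cat"], vectors["on"]))    # low: different right contexts
```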
|
34 |
ETRANS : an English-Thai translator / Warote, Nuntaporn. January 1991
Thesis (M.S.)--Rochester Institute of Technology, 1991. / Typescript. Includes bibliography: leaf [53].
|
35 |
A maximum entropy approach to Chinese language parsing / Yang, Yongsheng. January 2002
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002. / Includes bibliographical references (leaves 54-55). Also available in electronic version. Access restricted to campus users.
|
36 |
Semantic lexicon acquisition for learning natural language interfaces / Thompson, Cynthia Ann. January 1998
Thesis (Ph. D.)--University of Texas at Austin, 1998. / Vita. Includes bibliographical references (leaves 134-145). Available also in a digital version from Dissertation Abstracts.
|
37 |
Practical Earley parsing and the SPARK toolkit / Aycock, John Daniel. 23 May 2018
Domain-specific, “little” languages are commonplace in computing. So too is the need to implement such languages; to meet this need, we have created SPARK (Scanning, Parsing, And Rewriting Kit), a toolkit for little language implementation in Python, an object-oriented scripting language.
SPARK greatly simplifies the task of little language implementation. It requires little code to be written, and accommodates a wide range of users—even those without a background in compiler theory. Our toolkit is seeing increasing use on a variety of diverse projects.
SPARK was designed to be easy to use with few limitations, and relies heavily on Earley's general parsing algorithm internally, which helps in meeting these design goals. Earley's algorithm, in its standard form, can be hard to use; indeed, experience with SPARK has highlighted several problems with the practical use of Earley's algorithm. Our research addresses and provides solutions for these problems, making some significant improvements to the implementation and use of Earley's algorithm.
First, Earley's algorithm suffers from the performance problem. Even under optimum conditions, a standard Earley parser is burdened with overhead. We extend directly-executable parsing techniques for use in Earley parsers, the results of which run in time comparable to the much more specialized LALR(1) parsing algorithm.
Second is what we call the delayed action problem. General parsers like Earley must, in the worst case, read the entire input before executing any semantic actions associated with the grammar rules. We attack this problem in two ways. We have identified conditions under which it is safe to execute semantic actions on the fly during recognition; as a side effect, this has yielded space savings of over 90% for some grammars. The other approach to the delayed action problem deals with the difficulty of handling context-dependent tokens. Such tokens are easy to handle using what we call “Schrödinger's tokens,” a superposition of token types.
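The idea of a "Schrödinger's token" can be illustrated with a small sketch; the class and terminal names here are assumptions for illustration and do not reproduce SPARK's API:

```python
class SchroedingerToken:
    """A context-dependent token carrying several possible types at once.

    The scanner need not decide; a general parser can try the token against
    every terminal it predicts, and the grammar context selects the
    surviving reading.
    """
    def __init__(self, text, possible_types):
        self.text = text
        self.possible_types = frozenset(possible_types)

    def matches(self, terminal):
        return terminal in self.possible_types

# In C-like input, "x * y" may be a multiplication or a pointer declaration:
tok = SchroedingerToken("*", {"MULT_OP", "POINTER_DECL"})
print(tok.matches("MULT_OP"), tok.matches("POINTER_DECL"))  # True True
```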
Finally, Earley parsers are complicated by the need to process grammar rules with empty right-hand sides. We present a simple, efficient way to handle these empty rules, and prove that our new method is correct. We also show how our method may be used to create a new type of LR(0) automaton which is ideally suited for use in Earley parsers.
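One preliminary step common to treatments of empty rules is determining which nonterminals can derive the empty string; the sketch below shows only that fixed-point computation, and is not the specific construction proved correct in the thesis:

```python
def nullable_nonterminals(grammar):
    """Fixed-point computation of the nonterminals that can derive the empty
    string; such symbols are what make empty rules troublesome for a
    standard Earley parser."""
    nullable = {lhs for lhs, rhs in grammar if not rhs}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

# S -> A B 'x' ;  A -> (empty) ;  B -> A A
grammar = [("S", ["A", "B", "x"]), ("A", []), ("B", ["A", "A"])]
print(nullable_nonterminals(grammar))   # {'A', 'B'}
```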
Our work has made Earley parsing faster and more space-efficient, turning it into an excellent candidate for practical use in many applications. / Graduate
|
38 |
A parser generator system to handle complete syntax / Ossher, Harold Leon. January 1982
To define a language completely, it is necessary to define both its syntax and semantics. If these definitions are in a suitable form, the parser and code-generator of a compiler, respectively, can be generated from them. This thesis addresses the problem of syntax definition and automatic parser generation.
|
39 |
A model of grammar based on principles of government and binding / Sharp, Randall Martin. January 1985
This thesis describes an implementation of a model of natural language grammar based on current theories of transformational grammar, collectively referred to as Government and Binding (GB) theory. A description is presented of the principles of GB, including X-bar syntax and the theories of Case, Theta, Binding, Bounding, and Government. The principles, in effect, constitute an embodiment of "universal grammar" (UG), i.e. the abstract characterization of the innately endowed human language faculty. Associated with the principles is a set of parameters that alter the effect of the principles. The "core grammar" of a specific language is an instantiation of UG with the parameters set in a particular way. To demonstrate the cross-linguistic nature of the theory, a subset of the "core grammars" of Spanish and English is implemented, including their parametric values and certain language-specific transformations required to characterize grammatical sentences. Sentences in one language are read in and converted through a series of reverse transformations to a base representation in the target language. To this representation, transformations are applied that produce a set of output sentences. The well-formedness of these sentences is verified by the general principles of UG as controlled by the parameters. Any that fail to meet the conditions are rejected so that only grammatical sentences are displayed. The model is written in the Prolog programming language. / Science, Faculty of / Computer Science, Department of / Graduate
|
40 |
Larger-first partial parsing / Van Delden, Sebastian Alexander. 01 January 2003
Larger-first partial parsing is a primarily top-down approach to partial parsing that is opposite to current easy-first, or primarily bottom-up, strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each of the input tokens in a sentence. Part-of-speech tags are first assigned to the words in a sentence by a part-of-speech tagger. A cascade of Deterministic Finite State Automata then uses this part-of-speech information to identify syntactic relations primarily in a descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton is capable of adding one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that this larger-first approach is capable of (1) producing a more detailed partial parse than an easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving a higher accuracy than the easy-first approach. The automata of each network were developed by an empirical analysis of several sources and are presented here in detail.
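As a much-simplified sketch of one level of such a cascade (the pattern and tag names are illustrative and do not reproduce the thesis's automata), a finite-state pattern over part-of-speech tags can stack a structural tag onto the tokens it covers:

```python
import re

# POS-tagged input: (word, part-of-speech) pairs.
sentence = [("the", "DT"), ("old", "JJ"), ("parser", "NN"),
            ("ran", "VBD"), ("quickly", "RB")]

# One cascade level: mark base noun phrases (DT? JJ* NN+) with an "NP" tag,
# stacking the new structural tag on top of the existing POS tag.
tags = " ".join(pos for _, pos in sentence)
structural = ["O"] * len(sentence)
for m in re.finditer(r"\b(DT )?(JJ )*(NN ?)+", tags + " "):
    start = tags[: m.start()].count(" ")
    end = start + m.group(0).strip().count(" ") + 1
    for i in range(start, end):
        structural[i] = "NP"

print(list(zip([w for w, _ in sentence], structural)))
# [('the', 'NP'), ('old', 'NP'), ('parser', 'NP'), ('ran', 'O'), ('quickly', 'O')]
```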
|