1

Lexically specified derivational control in combinatory categorial grammar

Baldridge, Jason. January 2002.
This dissertation elaborates several refinements to the Combinatory Categorial Grammar (CCG) framework which are motivated by phenomena in parametrically diverse languages such as English, Dutch, Tagalog, Toba Batak and Turkish. I present Multi-Modal Combinatory Categorial Grammar, a formulation of CCG which incorporates devices and category constructors from related categorial frameworks, and demonstrate the effectiveness of these modifications both for providing parsimonious linguistic analyses and for improving the representation of the lexicon and computational processing. Altogether, this dissertation provides formal, linguistic, and computational justifications for its central thesis: that an explanatory theory of natural language grammar can be based on a categorial grammar formalism which allows cross-linguistic variation only in the lexicon and has computationally attractive properties.
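For readers unfamiliar with lexically specified derivational control, the sketch below (not taken from the thesis) illustrates the basic idea behind slash modalities in Multi-Modal CCG: a modality on a lexical category's slash determines which combinatory rules may consume it, so application is always available while composition can be switched off in the lexicon. The category encoding, modality names and example categories are simplified assumptions.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Cat:
        """A CCG category: atomic if result is None, otherwise result/arg or result\\arg."""
        atom: str = ""
        result: "Cat" = None
        arg: "Cat" = None
        slash: str = ""        # "/" (forward) or "\\" (backward)
        modality: str = "dot"  # "dot" = permissive, "star" = application only (toy lattice)

    def forward_apply(x, y):
        """X/Y  Y  =>  X   (licensed under any modality)."""
        if x.result is not None and x.slash == "/" and x.arg == y:
            return x.result
        return None

    def forward_compose(x, y):
        """X/Y  Y/Z  =>  X/Z   (blocked if either slash carries the application-only modality)."""
        if (x.result is not None and y.result is not None
                and x.slash == "/" and y.slash == "/"
                and x.arg == y.result
                and "star" not in (x.modality, y.modality)):
            return Cat(result=x.result, arg=y.arg, slash="/", modality=y.modality)
        return None

    N, NP = Cat(atom="N"), Cat(atom="NP")
    det = Cat(result=NP, arg=N, slash="/", modality="star")   # toy determiner: NP/N, application only
    adj = Cat(result=N, arg=N, slash="/", modality="star")    # toy attributive adjective: N/N

    print(forward_apply(adj, N))        # Cat(atom='N', ...): application is allowed
    print(forward_compose(det, adj))    # None: composition is lexically switched off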
2

Category neutrality: a type-logical investigation

Whitman, Philip Neal. January 2002.
Thesis (Ph. D.)--Ohio State University, 2002. Title from first page of PDF file. Document formatted into pages; contains xii, 320 p.; also contains graphics. Includes abstract and vita. Advisor: David R. Dowty, Dept. of Linguistics. Includes bibliographical references (p. 315-320).
3

Category neutrality: a type-logical investigation

Whitman, Philip Neal. January 2002.
No description available.
4

An inheritance-based theory of the lexicon in combinatory categorial grammar

McConville, Mark. January 2008.
This thesis proposes an extended version of the Combinatory Categorial Grammar (CCG) formalism, with the following features:

1. grammars incorporate inheritance hierarchies of lexical types, defined over a simple, feature-based constraint language
2. CCG lexicons are, or at least can be, functions from forms to these lexical types

This formalism, which I refer to as ‘inheritance-driven’ CCG (I-CCG), is conceptualised as a partially model-theoretic system, involving a distinction between category descriptions and their underlying category models, with these two notions being related by logical satisfaction. I argue that the I-CCG formalism retains all the advantages of both the core CCG framework and proposed generalisations involving such things as multiset categories, unary modalities or typed feature structures. In addition, I-CCG:

1. provides non-redundant lexicons for human languages
2. captures a range of well-known implicational word order universals in terms of an acquisition-based preference for shorter grammars

This thesis proceeds as follows. Chapter 2 introduces the ‘baseline’ CCG formalism, which incorporates just the essential elements of category notation, without any of the proposed extensions. Chapter 3 reviews parts of the CCG literature dealing with linguistic competence in its most general sense, showing how the formalism predicts a number of language universals in terms of either its restricted generative capacity or the prioritisation of simpler lexicons. Chapter 4 analyses the first motivation for generalising the baseline category notation, demonstrating how certain fairly simple implicational word order universals are not formally predicted by baseline CCG, although they intuitively do involve considerations of grammatical economy. Chapter 5 examines the second motivation underlying many of the customised CCG category notations: to reduce lexical redundancy, thus allowing for the construction of lexicons which assign (each sense of) open class words and morphemes to no more than one lexical category, itself denoted by a non-composite lexical type. Chapter 6 defines the I-CCG formalism, incorporating into the notion of a CCG grammar both a type hierarchy of saturated category symbols and an inheritance hierarchy of constrained lexical types. The constraint language is a simple, feature-based, highly underspecified notation, interpreted against an underlying notion of category models; this latter point is crucial, since it allows us to abstract away from any particular inference procedure and focus on the category notation itself. I argue that the partially model-theoretic I-CCG formalism solves the lexical redundancy problem fairly definitively, thereby subsuming all the other proposed variant category notations. Chapter 7 demonstrates that the I-CCG formalism also provides the beginnings of a theory of the CCG lexicon in a stronger sense: with just a small number of substantive assumptions about types, it can be shown to formally predict many implicational word order universals in terms of an acquisition-based preference for simpler lexical inheritance hierarchies, i.e. those with fewer types and fewer constraints. Chapter 8 concludes the thesis.
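As an informal illustration of the inheritance idea (not the I-CCG constraint language itself), the toy sketch below maps word forms to lexical type names and derives each word's category by merging constraints inherited down a small type hierarchy; all type names and constraints are invented for the example.

    # Toy inheritance-based lexicon: forms map to lexical type names; each type
    # inherits constraints from its parent. Illustrative assumptions only.
    TYPE_HIERARCHY = {
        # type: (parent, local constraints)
        "verb":         (None,   {"result": "s"}),
        "intrans-verb": ("verb", {"args": ["np\\"]}),          # s\np
        "trans-verb":   ("verb", {"args": ["np\\", "np/"]}),   # (s\np)/np
    }

    LEXICON = {"sleeps": "intrans-verb", "sees": "trans-verb"}

    def constraints(type_name):
        """Collect constraints by inheritance, the child overriding its parent."""
        parent, local = TYPE_HIERARCHY[type_name]
        merged = {} if parent is None else constraints(parent)
        merged.update(local)
        return merged

    def category(word):
        """Spell out a flat CCG category string from the inherited constraints."""
        c = constraints(LEXICON[word])
        cat = c["result"]
        for arg in c.get("args", []):
            atom, slash = arg[:-1], arg[-1]
            cat = f"({cat}{slash}{atom})"
        return cat

    print(category("sleeps"))  # (s\np)
    print(category("sees"))    # ((s\np)/np)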
5

A Type System For Combinatory Categorial Grammar

Erkan, Gunes. 01 January 2003.
This thesis investigates the internal structure and the computational representation of the lexical entries in Combinatory Categorial Grammar (CCG). A restricted form of typed feature structures is proposed for representing CCG categories. This proposal is combined with a constraint-based modality system for basic categories of CCG. We present some linguistic evidence to explain why both a unification-based feature system and a constraint-based modality system are needed for a lexicalist framework. An implementation of our system is also presented.
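The following toy sketch, which assumes invented feature names, shows the kind of unification over restricted feature structures that such a representation of CCG categories relies on: two descriptions combine when their features are compatible and fail on a clash.

    # Minimal unification over feature structures represented as nested dicts,
    # illustrating agreement checks on basic categories. Feature names are
    # illustrative assumptions, not the thesis's type system.
    def unify(a, b):
        """Return the unification of two feature structures, or None on clash."""
        if not isinstance(a, dict) or not isinstance(b, dict):
            return a if a == b else None
        out = dict(a)
        for key, b_val in b.items():
            if key in out:
                u = unify(out[key], b_val)
                if u is None:
                    return None          # feature clash
                out[key] = u
            else:
                out[key] = b_val
        return out

    np_sg = {"cat": "np", "agr": {"num": "sg", "per": "3"}}
    subj_requirement = {"cat": "np", "agr": {"num": "sg"}}
    print(unify(np_sg, subj_requirement))        # unifies: singular NP satisfies the requirement
    print(unify(np_sg, {"agr": {"num": "pl"}}))  # None: number clash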
6

Factorial Hidden Markov Models for full and weakly supervised supertagging

Ramanujam, Srivatsan. August 2009.
For many sequence prediction tasks in Natural Language Processing, modeling dependencies between individual predictions can be used to improve prediction accuracy of the sequence as a whole. Supertagging involves assigning lexical entries to words based on a lexicalized grammatical theory such as Combinatory Categorial Grammar (CCG). Previous work has used Bayesian HMMs to learn taggers for both POS tagging and supertagging separately. Modeling them jointly has the potential to produce more robust and accurate supertaggers trained with less supervision, and thereby potentially to help in the creation of useful models for new languages and domains. Factorial Hidden Markov Models (FHMMs) support joint inference for multiple sequence prediction tasks. Here, I use them to jointly predict part-of-speech tag and supertag sequences with varying levels of supervision. I show, first, that supervised training of FHMM models improves performance compared to standard HMMs, especially when labeled training material is scarce. Second, FHMMs trained from tag dictionaries rather than labeled examples also perform better than a standard HMM. Finally, I show that an FHMM and a maximum entropy Markov model can complement each other in a single-step co-training setup that improves the performance of both models when there is limited labeled training material available.
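As a rough illustration of the factorial structure (with made-up probabilities and tags, not the thesis's trained models), the sketch below scores a sentence under a factored HMM in which POS tags and supertags each follow their own transition chain while jointly conditioning the emitted word.

    # Toy factored HMM: the hidden state at each position is a (POS, supertag)
    # pair; transitions factor into separate chains, emissions condition on both.
    pos_trans = {("DT", "NN"): 0.6, ("NN", "VBZ"): 0.5}
    stag_trans = {("NP/N", "N"): 0.7, ("N", "S\\NP"): 0.4}
    emit = {(("DT", "NP/N"), "the"): 0.5,
            (("NN", "N"), "dog"): 0.3,
            (("VBZ", "S\\NP"), "barks"): 0.2}

    def joint_prob(words, pos_seq, stag_seq):
        """P(words, pos, stags) under the factored transition/emission tables."""
        p = 1.0
        for i, w in enumerate(words):
            p *= emit.get(((pos_seq[i], stag_seq[i]), w), 1e-6)
            if i > 0:
                p *= pos_trans.get((pos_seq[i - 1], pos_seq[i]), 1e-6)
                p *= stag_trans.get((stag_seq[i - 1], stag_seq[i]), 1e-6)
        return p

    print(joint_prob(["the", "dog", "barks"],
                     ["DT", "NN", "VBZ"],
                     ["NP/N", "N", "S\\NP"]))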
7

Harmonic analysis of music using combinatory categorial grammar

Granroth-Wilding, Mark Thomas. January 2013.
Various patterns of the organization of Western tonal music exhibit hierarchical structure, among them the harmonic progressions underlying melodies and the metre underlying rhythmic patterns. Recognizing these structures is an important part of unconscious human cognitive processing of music. Since the prosody and syntax of natural languages are commonly analysed with similar hierarchical structures, it is reasonable to expect that the techniques used to identify these structures automatically in natural language might also be applied to the automatic interpretation of music.

In natural language processing (NLP), analysing the syntactic structure of a sentence is a prerequisite to semantic interpretation. The analysis is made difficult by the high degree of ambiguity in even moderately long sentences. In music, a similar sort of structural analysis, with a similar degree of ambiguity, is fundamental to tasks such as key identification and score transcription. These and other tasks depend on harmonic and rhythmic analyses. There is a long history of applying linguistic analysis techniques to musical analysis. In recent years, statistical modelling, in particular in the form of probabilistic models, has become ubiquitous in NLP for large-scale practical analysis of language. The focus of the present work is the application of statistical parsing to automatic harmonic analysis of music.

This thesis demonstrates that statistical parsing techniques, adapted from NLP with little modification, can be successfully applied to recovering the harmonic structure underlying music. It shows first how a type of formal grammar based on one used for linguistic syntactic processing, Combinatory Categorial Grammar (CCG), can be used to analyse the hierarchical structure of chord sequences. I introduce a formal language similar to first-order predicate logic to express the hierarchical tonal harmonic relationships between chords. The syntactic grammar formalism then serves as a mechanism to map an unstructured chord sequence onto its structured analysis.

In NLP, the high degree of ambiguity of the analysis means that a parser must consider a huge number of possible structures. Chart parsing provides an efficient mechanism to explore them. Statistical models allow the parser to use information about structures seen before in a training corpus to eliminate improbable interpretations early on in the process and to rank the final analyses by plausibility. To apply the same techniques to harmonic analysis of chord sequences, a corpus of tonal jazz chord sequences annotated by hand with harmonic analyses is constructed. Two statistical parsing techniques are adapted to the present task and evaluated on their success at recovering the annotated structures.

The experiments show that parsing using a statistical model of syntactic derivations is more successful than a Markovian baseline model at recovering harmonic structure. In addition, the practical technique of statistical supertagging serves to speed up parsing without any loss in accuracy. This approach to recovering harmonic structure can be extended to the analysis of performance data symbolically represented as notes. Experiments using some simple proof-of-concept extensions of the above parsing models demonstrate one probabilistic approach to this. The results reported provide a baseline for future work on the task of harmonic analysis of performances.
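A minimal CYK-style chart parser of the kind used to search the space of harmonic analyses is sketched below; the chord categories are toy stand-ins (tonic T, dominant D) rather than the grammar developed in the thesis, and only forward and backward application are implemented.

    def combine(left, right):
        """Forward application X/Y Y => X and backward application Y X\\Y => X over category strings."""
        results = set()
        if left.endswith("/" + right):
            results.add(left[: -len(right) - 1])
        if right.endswith("\\" + left):
            results.add(right[: -len(left) - 1])
        return results

    def parse(lexical_cats):
        """CYK chart parsing; lexical_cats is one set of candidate categories per input chord."""
        n = len(lexical_cats)
        chart = {(i, i + 1): set(cats) for i, cats in enumerate(lexical_cats)}
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                cell = set()
                for k in range(i + 1, i + span):
                    for left in chart[(i, k)]:
                        for right in chart[(k, i + span)]:
                            cell |= combine(left, right)
                chart[(i, i + span)] = cell
        return chart[(0, n)]

    # Toy ii-V-I progression Dm7 G7 Cmaj7 with invented categories:
    # Dm7 prepares the dominant (D/D), G7 is the dominant (D), Cmaj7 resolves it (T\D).
    print(parse([{"D/D"}, {"D"}, {"T\\D"}]))   # {'T'}: the whole sequence analysed as a tonic phrase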
8

Information structure in discourse

Traat, Maarika. January 2006.
The present dissertation proposes integrating Discourse Representation Theory (DRT), information structure (IS) and Combinatory Categorial Grammar (CCG) into a single framework. It achieves this by making two new contributions to the computational treatment of information structure. First, it presents an uncomplicated approach to incorporating information structure in DRT. Second, it shows how the new DRT representation can be integrated into a unification-based grammar framework in a straightforward manner. We foresee the main application of the new formalism to be in spoken language systems: the approach presented here has the potential to make it considerably easier for spoken language systems to benefit from insights derived from information structure.

The DRT representation with information structure which is proposed in this dissertation is simpler than previous attempts to include information structure in DRT. We believe that the simplicity of the Information-Structure-marked Discourse Representation Structure (IS-DRS) is precisely what makes it attractive and easy to use for practical tasks like determining the intonation in spoken language applications. The IS component in IS-DRS covers a range of aspects of information-structural semantics. A further advantage of IS-DRS is that a single semantic representation is suitable for both the generation of context-appropriate prosody and automatic reasoning.

A semantic representation on its own is useful for describing and analysing a language. However, it is of even greater utility if it is accompanied by a mechanism that allows one to directly infer the semantic representation from a natural language expression. We incorporated the IS-DRS into the Categorial Grammar (CG) framework, developing a unification-based realisation of Combinatory Categorial Grammar, which we call Unification-based Combinatory Categorial Grammar (UCCG). UCCG inherits elements from Combinatory Categorial Grammar and Unification Categorial Grammar. The UCCG framework is developed gradually throughout the dissertation, with the information structural component included as the final step. The IS-DRSs for linguistic expressions are built up compositionally from the IS-DRSs of their sub-expressions; feature unification is the driving force in this process. The formalism is illustrated by numerous examples which are characterised by different levels of syntactic complexity and diverse information structure.

We believe that the main assets of both the IS-DRSs and the Unification-based Combinatory Categorial Grammar framework are their simplicity, transparency, and inherent suitability for computational implementation. This makes them an appealing choice for use in practical applications like spoken language systems.
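The toy sketch below (a simplification, not the dissertation's IS-DRS definition) gives the flavour of a DRS whose conditions carry information-structure labels, so that a single representation could feed both reasoning and prosody generation; the labels and example sentence are assumptions.

    # Sketch of a DRS whose conditions carry theme/rheme labels, built by a
    # simple merge of sub-expression DRSs. Illustrative assumptions only.
    from dataclasses import dataclass, field

    @dataclass
    class DRS:
        referents: list = field(default_factory=list)
        conditions: list = field(default_factory=list)  # (predicate, args, is_label)

    def merge(a, b):
        """DRS merge: union of referents and conditions."""
        return DRS(a.referents + b.referents, a.conditions + b.conditions)

    # "MARY likes John" with narrow focus on the subject: subject is rheme, the rest theme.
    subject = DRS(["x"], [("mary", ("x",), "rheme")])
    predicate = DRS(["y"], [("john", ("y",), "theme"),
                            ("like", ("x", "y"), "theme")])
    sentence = merge(subject, predicate)

    for pred, args, label in sentence.conditions:
        print(f"{pred}{args}  [{label}]")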
9

Integrated supertagging and parsing

Auli, Michael. January 2012.
Parsing is the task of assigning syntactic or semantic structure to a natural language sentence. This thesis focuses on syntactic parsing with Combinatory Categorial Grammar (CCG; Steedman 2000). CCG allows incremental processing, which is essential for speech recognition and some machine translation models, and it can build semantic structure in tandem with syntactic parsing. Supertagging solves a subset of the parsing task by assigning lexical types to words in a sentence using a sequence model. It has emerged as a way to improve the efficiency of full CCG parsing (Clark and Curran, 2007) by reducing the parser’s search space. This has been very successful and it is the central theme of this thesis.

We begin with an analysis of how efficiency is traded for accuracy in supertagging. Pruning the search space by supertagging is inherently approximate; to contrast this, we include A*, a classic exact search technique, in our analysis. Interestingly, we find that combining the two methods improves efficiency, but we also demonstrate that excessive pruning by a supertagger significantly lowers the upper bound on the accuracy of a CCG parser.

Inspired by this analysis, we design a single integrated model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting complexity, we experiment with both loopy belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem.

Finally, we address training the integrated model. We adopt the idea of optimising directly for a task-specific metric, as is common in other areas such as statistical machine translation. We demonstrate how a novel dynamic programming algorithm enables us to optimise for F-measure, our task-specific evaluation metric, and experiment with approximations, which prove to be excellent substitutes.

Each of the presented methods improves over the state of the art in CCG parsing. Moreover, the improvements are additive, achieving a labelled/unlabelled dependency F-measure on CCGbank of 89.3%/94.0% with gold part-of-speech tags, and 87.2%/92.8% with automatic part-of-speech tags, the best reported results for this task to date. Our techniques are general and we expect them to apply to other parsing problems, including lexicalised tree adjoining grammar and context-free grammar parsing.
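The sketch below illustrates the kind of probability-ratio pruning a supertagger applies before parsing: for each word, only categories within a factor beta of that word's best category are kept, so a tighter beta shrinks the chart but risks discarding the correct category and lowering the parser's accuracy ceiling. The categories and probabilities are toy values, not output of the thesis's models.

    # Toy supertagger pruning by a probability-ratio cutoff (beta).
    def prune(tag_probs, beta):
        """tag_probs: list of {category: prob} dicts, one per word."""
        kept = []
        for dist in tag_probs:
            best = max(dist.values())
            kept.append({c: p for c, p in dist.items() if p >= beta * best})
        return kept

    words_probs = [
        {"NP": 0.8, "N/N": 0.15, "S/S": 0.05},
        {"(S\\NP)/NP": 0.6, "(S\\NP)/(S\\NP)": 0.3, "S\\NP": 0.1},
    ]

    for beta in (0.1, 0.5):
        sizes = [len(d) for d in prune(words_probs, beta)]
        print(f"beta={beta}: categories kept per word = {sizes}")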
10

Structure building operations and word order

Flynn, Michael J. January 1985.
Thesis (Ph. D.)--University of Massachusetts, 1981. Includes bibliographical references (p. 129-134).
