1. Lexically specified derivational control in combinatory categorial grammar. Baldridge, Jason (2002)
This dissertation elaborates several refinements to the Combinatory Categorial Grammar (CCG) framework, motivated by phenomena in parametrically diverse languages such as English, Dutch, Tagalog, Toba Batak and Turkish. I present Multi-Modal Combinatory Categorial Grammar, a formulation of CCG which incorporates devices and category constructors from related categorial frameworks, and demonstrate the effectiveness of these modifications both for providing parsimonious linguistic analyses and for improving the representation of the lexicon and computational processing. Altogether, this dissertation provides formal, linguistic, and computational justifications for its central thesis: that an explanatory theory of natural language grammar can be based on a categorial grammar formalism which allows cross-linguistic variation only in the lexicon and has computationally attractive properties.
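As orientation for this and later entries, here is a minimal sketch of the kind of derivational control a multi-modal system provides: a modality on each slash determines which combinatory rules may consume it. The rule and modality inventory below is simplified for illustration and is not quoted from the thesis.

```latex
% Simplified illustration of modalised slashes. The modality restricts
% the rules a category may participate in: \star permits application
% only, \diamond additionally permits order-preserving composition,
% and \times additionally permits crossed composition.
\[
\begin{array}{lrcl}
\text{Forward application:}          & X/_{\star}Y \quad Y                       & \Rightarrow & X \\
\text{Forward composition:}          & X/_{\diamond}Y \quad Y/_{\diamond}Z       & \Rightarrow & X/_{\diamond}Z \\
\text{Forward crossed composition:}  & X/_{\times}Y \quad Y\backslash_{\times}Z  & \Rightarrow & X\backslash_{\times}Z \\
\end{array}
\]
```

A lexicon can then, for example, give a category a \star-marked slash to keep it out of composition entirely, so restrictions on derivations live in lexical entries rather than in language-specific rule statements.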
2. An inheritance-based theory of the lexicon in combinatory categorial grammar. McConville, Mark (2008)
This thesis proposes an extended version of the Combinatory Categorial Grammar (CCG) formalism, with the following features:

1. grammars incorporate inheritance hierarchies of lexical types, defined over a simple, feature-based constraint language
2. CCG lexicons are, or at least can be, functions from forms to these lexical types

This formalism, which I refer to as 'inheritance-driven' CCG (I-CCG), is conceptualised as a partially model-theoretic system, involving a distinction between category descriptions and their underlying category models, with these two notions being related by logical satisfaction. I argue that the I-CCG formalism retains all the advantages of both the core CCG framework and proposed generalisations involving such things as multiset categories, unary modalities or typed feature structures. In addition, I-CCG:

1. provides non-redundant lexicons for human languages
2. captures a range of well-known implicational word order universals in terms of an acquisition-based preference for shorter grammars

This thesis proceeds as follows. Chapter 2 introduces the 'baseline' CCG formalism, which incorporates just the essential elements of category notation, without any of the proposed extensions. Chapter 3 reviews parts of the CCG literature dealing with linguistic competence in its most general sense, showing how the formalism predicts a number of language universals in terms of either its restricted generative capacity or the prioritisation of simpler lexicons. Chapter 4 analyses the first motivation for generalising the baseline category notation, demonstrating how certain fairly simple implicational word order universals are not formally predicted by baseline CCG, although they intuitively do involve considerations of grammatical economy. Chapter 5 examines the second motivation underlying many of the customised CCG category notations: to reduce lexical redundancy, thus allowing for the construction of lexicons which assign (each sense of) open-class words and morphemes to no more than one lexical category, itself denoted by a non-composite lexical type. Chapter 6 defines the I-CCG formalism, incorporating into the notion of a CCG grammar both a type hierarchy of saturated category symbols and an inheritance hierarchy of constrained lexical types. The constraint language is a simple, feature-based, highly underspecified notation, interpreted against an underlying notion of category models; this latter point is crucial, since it allows us to abstract away from any particular inference procedure and focus on the category notation itself. I argue that the partially model-theoretic I-CCG formalism solves the lexical redundancy problem fairly definitively, thereby subsuming all the other proposed variant category notations. Chapter 7 demonstrates that the I-CCG formalism also provides the beginnings of a theory of the CCG lexicon in a stronger sense: with just a small number of substantive assumptions about types, it can be shown to formally predict many implicational word order universals in terms of an acquisition-based preference for simpler lexical inheritance hierarchies, i.e. those with fewer types and fewer constraints. Chapter 8 concludes the thesis.
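To make the two numbered features concrete, here is a toy Python sketch with invented names and categories (not the thesis's own notation): an inheritance hierarchy of lexical types over feature constraints, and a lexicon mapping word forms to those types.

```python
# A toy sketch of the two I-CCG ingredients described above: lexical
# types arranged in an inheritance hierarchy, and a lexicon that is a
# function from forms to lexical types. All names are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LexType:
    """A node in the lexical type hierarchy: stores only its local
    constraints; the full description is assembled by inheritance."""
    name: str
    parent: Optional["LexType"] = None
    constraints: dict = field(default_factory=dict)

    def description(self) -> dict:
        """Inherited constraints, overridden by more specific ones."""
        inherited = self.parent.description() if self.parent else {}
        return {**inherited, **self.constraints}

# Verbs state the common constraints once; transitive verbs inherit
# them and add a single constraint, avoiding lexical redundancy.
verb = LexType("verb", constraints={"result": "s", "subj": "np"})
trans_verb = LexType("trans-verb", parent=verb, constraints={"obj": "np"})

# The lexicon is a function from forms to lexical types.
lexicon = {"sleeps": verb, "sees": trans_verb}
print(lexicon["sees"].description())
# -> {'result': 's', 'subj': 'np', 'obj': 'np'}
```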
3. A Type System for Combinatory Categorial Grammar. Erkan, Gunes (2003)
This thesis investigates the internal structure and the computational representation of the lexical entries in Combinatory Categorial Grammar (CCG). A restricted form of typed feature structures is proposed for representing CCG categories. This proposal is combined with a constraint-based modality system for the basic categories of CCG. We present some linguistic evidence to explain why both a unification-based feature system and a constraint-based modality system are needed for a lexicalist framework. An implementation of our system is also presented.
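As a concrete illustration of the unification operation assumed above, here is a toy sketch over flat attribute-value pairs; real typed feature structures are recursive and typed, and the thesis's modality system is not reproduced here.

```python
# Toy unification over flat feature structures: merge two partial
# descriptions, failing on any conflicting value.
from typing import Optional

def unify(fs1: dict, fs2: dict) -> Optional[dict]:
    """Return the most general structure satisfying both descriptions,
    or None if they conflict."""
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat in result and result[feat] != val:
            return None            # conflicting values: unification fails
        result[feat] = val
    return result

# An NP description unifying with a verb's (invented) argument slot:
np_sg = {"cat": "np", "num": "sg"}
arg_slot = {"cat": "np", "case": "acc"}
print(unify(np_sg, arg_slot))                    # {'cat': 'np', 'num': 'sg', 'case': 'acc'}
print(unify(np_sg, {"cat": "np", "num": "pl"}))  # None
```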
4. Harmonic analysis of music using combinatory categorial grammar. Granroth-Wilding, Mark Thomas (2013)
Various patterns in the organisation of Western tonal music exhibit hierarchical structure, among them the harmonic progressions underlying melodies and the metre underlying rhythmic patterns. Recognising these structures is an important part of unconscious human cognitive processing of music. Since the prosody and syntax of natural languages are commonly analysed with similar hierarchical structures, it is reasonable to expect that the techniques used to identify these structures automatically in natural language might also be applied to the automatic interpretation of music.

In natural language processing (NLP), analysing the syntactic structure of a sentence is a prerequisite to semantic interpretation. The analysis is made difficult by the high degree of ambiguity in even moderately long sentences. In music, a similar sort of structural analysis, with a similar degree of ambiguity, is fundamental to tasks such as key identification and score transcription. These and other tasks depend on harmonic and rhythmic analyses. There is a long history of applying linguistic analysis techniques to musical analysis. In recent years, statistical modelling, in particular in the form of probabilistic models, has become ubiquitous in NLP for large-scale practical analysis of language. The focus of the present work is the application of statistical parsing to automatic harmonic analysis of music.

This thesis demonstrates that statistical parsing techniques, adapted from NLP with little modification, can be successfully applied to recovering the harmonic structure underlying music. It shows first how a type of formal grammar based on one used for linguistic syntactic processing, Combinatory Categorial Grammar (CCG), can be used to analyse the hierarchical structure of chord sequences. I introduce a formal language similar to first-order predicate logic to express the hierarchical tonal harmonic relationships between chords. The syntactic grammar formalism then serves as a mechanism to map an unstructured chord sequence onto its structured analysis.

In NLP, the high degree of ambiguity of the analysis means that a parser must consider a huge number of possible structures. Chart parsing provides an efficient mechanism to explore them. Statistical models allow the parser to use information about structures seen before in a training corpus to eliminate improbable interpretations early on in the process and to rank the final analyses by plausibility. To apply the same techniques to harmonic analysis of chord sequences, a corpus of tonal jazz chord sequences annotated by hand with harmonic analyses is constructed. Two statistical parsing techniques are adapted to the present task and evaluated on their success at recovering the annotated structures. The experiments show that parsing using a statistical model of syntactic derivations is more successful than a Markovian baseline model at recovering harmonic structure. In addition, the practical technique of statistical supertagging serves to speed up parsing without any loss in accuracy.

This approach to recovering harmonic structure can be extended to the analysis of performance data symbolically represented as notes. Experiments using some simple proof-of-concept extensions of the above parsing models demonstrate one probabilistic approach to this. The results reported provide a baseline for future work on the task of harmonic analysis of performances.
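To make the parsing analogy concrete, here is a toy sketch of probabilistic chart parsing over a chord sequence. The lexicon, categories, rules and probabilities are invented for illustration and do not reflect the thesis's formalism; the point is only the mechanism: ambiguous per-chord analyses combined bottom-up, with scores that let improbable readings be pruned or ranked.

```python
# Toy CKY-style chart parsing over a ii-V-I jazz cadence.
from collections import defaultdict
import math

# Invented lexicon: chord symbol -> {category: log-probability}
lexicon = {
    "Dm7": {"II": math.log(0.7), "VI": math.log(0.3)},
    "G7":  {"V": math.log(0.9), "I": math.log(0.1)},
    "C":   {"I": math.log(0.8), "IV": math.log(0.2)},
}

# Invented binary rules: (left, right) -> (parent, rule log-probability)
rules = {
    ("II", "V"):   ("II-V", math.log(0.9)),
    ("II-V", "I"): ("CADENCE", math.log(0.9)),
    ("V", "I"):    ("CADENCE", math.log(0.8)),
}

def cky(chords):
    n = len(chords)
    chart = defaultdict(dict)  # (i, j) -> {category: best log-probability}
    for i, chord in enumerate(chords):
        chart[i, i + 1] = dict(lexicon[chord])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # all split points
                for lc, lp in chart[i, k].items():
                    for rc, rp in chart[k, j].items():
                        if (lc, rc) in rules:
                            parent, rlp = rules[lc, rc]
                            score = lp + rp + rlp
                            if score > chart[i, j].get(parent, -math.inf):
                                chart[i, j][parent] = score
    return chart[0, n]

print(cky(["Dm7", "G7", "C"]))  # {'CADENCE': best log-probability}
```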
5. Information structure in discourse. Traat, Maarika (2006)
The present dissertation proposes integrating Discourse Representation Theory (DRT), information structure (IS) and Combinatory Categorial Grammar (CCG) into a single framework. It achieves this by making two new contributions to the computational treatment of information structure. First, it presents an uncomplicated approach to incorporating information structure in DRT. Second, it shows how the new DRT representation can be integrated into a unification-based grammar framework in a straightforward manner. We foresee the main application of the new formalism to be in spoken language systems: the approach presented here has the potential to considerably facilitate spoken language systems benefiting from insights derived from information structure.

The DRT representation with information structure proposed in this dissertation is simpler than previous attempts to include information structure in DRT. We believe that the simplicity of the Information-Structure-marked Discourse Representation Structure (IS-DRS) is precisely what makes it attractive and easy to use for practical tasks like determining the intonation in spoken language applications. The IS component in the IS-DRS covers a range of aspects of information-structural semantics. A further advantage of the IS-DRS is that a single semantic representation is suitable for both the generation of context-appropriate prosody and automatic reasoning.

A semantic representation on its own is useful for describing and analysing a language. However, it is of even greater utility if it is accompanied by a mechanism that allows one to directly infer the semantic representation from a natural language expression. We incorporated the IS-DRS into the Categorial Grammar (CG) framework, developing a unification-based realisation of Combinatory Categorial Grammar, which we call Unification-based Combinatory Categorial Grammar (UCCG). UCCG inherits elements from Combinatory Categorial Grammar and Unification Categorial Grammar. The UCCG framework is developed gradually throughout the dissertation, with the information-structural component included as the final step. The IS-DRSs for linguistic expressions are built up compositionally from the IS-DRSs of their sub-expressions, with feature unification as the driving force in this process. The formalism is illustrated by numerous examples which vary in syntactic complexity and information structure.

We believe that the main assets of both the IS-DRS and the UCCG framework are their simplicity, transparency, and inherent suitability for computational implementation, which makes them an appealing choice for use in practical applications like spoken language systems.
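As an illustration of the general idea of an IS-marked semantic representation (the notation below is invented for exposition; the thesis defines its own IS-DRS format), the conditions of a DRS can be partitioned by information structure, here for "MANNY plays the cello" answering "Who plays the cello?":

```latex
% Invented notation: a DRS as a set of referents plus conditions,
% with the conditions split into theme (given) and rheme (focused).
\[
\mathrm{drs}\big(\,\{x, y\},\;
  \underbrace{\{\mathit{play}(x, y),\ \mathit{cello}(y)\}}_{\text{theme}}
  \cup
  \underbrace{\{\mathit{manny}(x)\}}_{\text{rheme}}\,\big)
\]
```

The same representation can then feed both reasoning (ignore the partition) and prosody generation (accent the rheme).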
6. Integrated supertagging and parsing. Auli, Michael (2012)
Parsing is the task of assigning syntactic or semantic structure to a natural language sentence. This thesis focuses on syntactic parsing with Combinatory Categorial Grammar (CCG; Steedman 2000). CCG allows incremental processing, which is essential for speech recognition and some machine translation models, and it can build semantic structure in tandem with syntactic parsing. Supertagging solves a subset of the parsing task by assigning lexical types to the words in a sentence using a sequence model. It has emerged as a way to improve the efficiency of full CCG parsing (Clark and Curran, 2007) by reducing the parser's search space, has been very successful, and is the central theme of this thesis.

We begin with an analysis of how efficiency is traded for accuracy in supertagging. Pruning the search space by supertagging is inherently approximate, so for contrast we include A* in our analysis, a classic exact search technique. Interestingly, we find that combining the two methods improves efficiency, but we also demonstrate that excessive pruning by a supertagger significantly lowers the upper bound on the accuracy of a CCG parser.

Inspired by this analysis, we design a single integrated model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting complexity, we experiment with both loopy belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem.

Finally, we address training the integrated model. We adopt the idea of optimising directly for a task-specific metric, as is common in other areas like statistical machine translation. We demonstrate how a novel dynamic programming algorithm enables us to optimise for F-measure, our task-specific evaluation metric, and experiment with approximations, which prove to be excellent substitutes.

Each of the presented methods improves over the state of the art in CCG parsing. Moreover, the improvements are additive, achieving a labelled/unlabelled dependency F-measure on CCGbank of 89.3%/94.0% with gold part-of-speech tags, and 87.2%/92.8% with automatic part-of-speech tags, the best reported results for this task to date. Our techniques are general and we expect them to apply to other parsing problems, including lexicalised tree-adjoining grammar and context-free grammar parsing.
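For concreteness, the way a supertagger prunes the parser's search space can be sketched as follows: a simplified version of the beta-threshold strategy associated with Clark and Curran (2007), with invented probabilities. For each word, only lexical categories whose probability is within a factor beta of that word's best category are passed to the parser; a larger beta prunes harder, which is exactly the efficiency/accuracy trade-off analysed in the thesis.

```python
# Toy beta-threshold supertag pruning for a CCG parser.
def prune_supertags(tag_probs, beta):
    """tag_probs: one dict per word mapping category -> probability.
    Returns, per word, the categories within factor beta of the best."""
    pruned = []
    for probs in tag_probs:
        best = max(probs.values())
        pruned.append({c for c, p in probs.items() if p >= beta * best})
    return pruned

# Invented tagger output for "John saw the dog":
sentence_probs = [
    {"NP": 0.93, "N": 0.07},                   # John
    {"(S\\NP)/NP": 0.80, "(S\\NP)/PP": 0.15},  # saw
    {"NP/N": 0.99},                            # the
    {"N": 0.97, "NP": 0.02},                   # dog
]
print(prune_supertags(sentence_probs, beta=0.1))
# beta=0.1 keeps both categories for "saw" but drops "N" for "John".
```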
7. Transition-based combinatory categorial grammar parsing for English and Hindi. Ambati, Bharat Ram (2016)
Given a natural language sentence, parsing is the task of assigning it a grammatical structure according to the rules of a particular grammar formalism. Different grammar formalisms, such as Dependency Grammar, Phrase Structure Grammar, Combinatory Categorial Grammar and Tree Adjoining Grammar, have been explored in the literature for parsing. For example, given a sentence like "John ate an apple", parsers based on the widely used dependency grammars find grammatical relations, such as that 'John' is the subject and 'apple' is the object of the action 'ate'. We mainly focus on Combinatory Categorial Grammar (CCG) in this thesis.

In this thesis, we present an incremental algorithm for parsing CCG for two diverse languages: English and Hindi. English is a fixed word order, SVO (subject-verb-object), morphologically simple language, whereas Hindi, though predominantly an SOV (subject-object-verb) language, has free word order and is morphologically rich. Developing an incremental parser for Hindi is particularly challenging, since the predicate needed to resolve dependencies comes at the end. As previously available shift-reduce CCG parsers use English CCGbank derivations, which are mostly right-branching and non-incremental, we design our algorithm based on the dependencies resolved rather than on the derivation. Our novel algorithm builds a dependency graph in parallel to the CCG derivation, which is used for revealing the unbuilt structure without backtracking. Though we use dependencies for meaning representation and CCG for parsing, our revealing technique can be applied to other meaning representations, like lambda expressions, and to non-CCG parsing, like phrase structure parsing.

Any statistical parser requires three major modules: data, a parsing algorithm and a learning algorithm. This thesis is broadly divided into three parts, each dealing with one major module of the statistical parser.

In Part I, we design a novel algorithm for converting a dependency treebank to a CCGbank. We create a Hindi CCGbank with a decent coverage of 96% using this algorithm. We also carry out a cross-formalism experiment in which we show that CCG supertags can improve widely used dependency parsers. We experiment with two popular dependency parsers (Malt and MST) for two diverse languages: English and Hindi. For both languages, CCG categories improve the overall accuracy of both parsers by around 0.3-0.5% in all experiments. For both parsers, we see larger improvements specifically on dependencies at which they are known to be weak: long-distance dependencies for Malt, and verbal arguments for MST. The result is particularly interesting in the case of the fast greedy parser (Malt), since improving its accuracy without significantly compromising speed is relevant for large-scale applications such as parsing the web.

In Part II, we present a novel algorithm for incremental transition-based CCG parsing for English and Hindi. Incremental parsers have potential advantages for applications like language modelling for machine translation and speech recognition. We introduce two new actions in the shift-reduce paradigm for revealing the required information during parsing. We also analyse the impact of a beam and look-ahead for parsing; in general, using a beam and/or look-ahead gives better results than not using them. We also show that the incremental CCG parser is more useful than a non-incremental version for predicting relative sentence complexity: given a pair of sentences from Wikipedia and Simple Wikipedia, we build a classifier which predicts whether one sentence is simpler or more complex than the other. We show that features from a CCG parser in general, and from an incremental CCG parser in particular, are more useful than those from a chart-based phrase structure parser, both in terms of speed and accuracy.

In Part III, we develop the first neural-network-based training algorithm for parsing CCG. We also study the impact of neural-network-based tagging models, and of greedy versus beam-search parsing, using a structured neural network model. In greedy settings, neural network models give significantly better results than the perceptron models and are also over three times faster. Using a narrow beam, the structured neural network model gives consistently better results than the basic neural network model. For English, the structured neural network gives performance similar to the structured perceptron parser, but for Hindi, the structured perceptron is still the winner.
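As background to the shift-reduce paradigm the abstract refers to, here is a toy sketch of a shift-reduce CCG parser that combines categories by function application only. It reduces greedily rather than by a learned model, and the thesis's novel revealing actions and parallel dependency graph are not reproduced; the SOV example is an invented Hindi-like clause.

```python
# Categories as nested tuples: atoms are strings ("S", "NP");
# a functor is (slash, result, argument), e.g. (S\NP)\NP is
# ('\\', ('\\', 'S', 'NP'), 'NP').

def combine(left, right):
    """Try backward then forward application on adjacent categories."""
    if isinstance(right, tuple) and right[0] == '\\' and right[2] == left:
        return right[1]                  # Y  X\Y  =>  X
    if isinstance(left, tuple) and left[0] == '/' and left[2] == right:
        return left[1]                   # X/Y  Y  =>  X
    return None

def parse(supertags):
    """Greedy shift-reduce: SHIFT each supertag, REDUCE while possible."""
    stack = []
    for cat in supertags:
        stack.append(cat)                # SHIFT
        while len(stack) >= 2:
            combined = combine(stack[-2], stack[-1])
            if combined is None:
                break
            stack[-2:] = [combined]      # REDUCE
    return stack

# An SOV clause, roughly Hindi "John-ne seb khaya" (John ate an apple):
tv_sov = ('\\', ('\\', 'S', 'NP'), 'NP')  # (S\NP)\NP
print(parse(['NP', 'NP', tv_sov]))        # ['S']
```

Note how the verb-final category only reduces once the verb arrives at the end, which is precisely why naive shift-reduce parsing is non-incremental for SOV languages.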
8. Wide-coverage parsing for Turkish. Çakici, Ruket (2009)
Wide-coverage parsing is an area that attracts much attention in natural language processing research, since it is the first step to many other applications in natural language understanding, such as question answering. Supervised learning using human-labelled data is currently the best performing method, so there is great demand for annotated data. However, human annotation is very expensive, and the amount of annotated data is always much less than is needed to train well-performing parsers. This is the motivation behind making the best use of the data available. Turkish presents a challenge, both because the syntactically annotated Turkish data is relatively small in quantity and because Turkish is highly agglutinative, hence unusually sparse at the whole-word level.

The METU-Sabancı Treebank is a dependency treebank of 5620 sentences with surface dependency relations and morphological analyses for words. We show that including even the crudest forms of morphological information extracted from the data boosts the performance of both generative and discriminative parsers, contrary to received opinion concerning English.

We induce word-based and morpheme-based CCG grammars from the Turkish dependency treebank. We use these grammars to train a state-of-the-art CCG parser that predicts long-distance dependencies in addition to the ones that other parsers are capable of predicting. We also use the correct CCG categories as simple features in a graph-based dependency parser and show that this improves the parsing results. We show that a morpheme-based CCG lexicon for Turkish is able to solve many problems, such as conflicts of semantic scope, recovering long-range dependencies, and obtaining smoother statistics from the models. CCG handles linguistic phenomena such as local and long-range dependencies more naturally and effectively than other linguistic theories, while potentially supporting semantic interpretation in parallel. Using morphological information and a morpheme-cluster-based lexicon improves the performance both quantitatively and qualitatively for Turkish. We also provide an improved version of the treebank, which will be released by kind permission of METU and Sabancı.
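To illustrate the sparsity point with a toy example (the segmentations and glosses below are standard Turkish morphology, but the counting exercise is mine): whole inflected word forms rarely recur in an agglutinative language, while their component morphemes recur constantly, so morpheme-level statistics are far smoother.

```python
# Toy comparison of whole-word versus morpheme-level counts.
from collections import Counter

# Morphological analyses: word -> (stem, suffixes)
analyses = {
    "evlerimizden": ("ev", ["ler", "imiz", "den"]),  # "from our houses"
    "evde":         ("ev", ["de"]),                  # "in the house"
    "arabalardan":  ("araba", ["lar", "dan"]),       # "from the cars"
}

word_counts, morpheme_counts = Counter(), Counter()
for word, (stem, suffixes) in analyses.items():
    word_counts[word] += 1
    morpheme_counts[stem] += 1
    morpheme_counts.update(suffixes)

print(word_counts)      # every whole word is a singleton
print(morpheme_counts)  # the stem 'ev' occurs twice: shared statistics
```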
9. An Examination of Quantifier Scope Ambiguity in Turkish. Kurt, Kursad (2006)
This study investigates the problem of quantifier scope ambiguity in natural languages and the various ways in which it has been accounted for. Some of these accounts are problematic for monotonic theories of grammar like Combinatory Categorial Grammar (CCG), which strive for solutions that avoid non-monotonic functional application and assume complete transparency between the syntax and the semantics of a language. A further purpose of this thesis is to explore these proposals on examples from Turkish, to account for the meaning differences that may be caused by word order, and to see how the observations from Turkish fit within the framework of CCG.
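For concreteness, the ambiguity at issue can be stated in first-order logic with a textbook English example (mine, not the thesis's Turkish data): "Every student read a book" has two readings, depending on which quantifier takes wide scope.

```latex
\begin{align*}
&\forall x.\,\mathit{student}(x) \rightarrow \exists y.\,(\mathit{book}(y) \wedge \mathit{read}(x, y))
  && \text{surface scope: the books may differ} \\
&\exists y.\,\mathit{book}(y) \wedge \forall x.\,(\mathit{student}(x) \rightarrow \mathit{read}(x, y))
  && \text{inverse scope: one particular book}
\end{align*}
```

The challenge for a monotonic theory is to derive both readings from a single surface syntax without covert movement or non-monotonic operations; in languages like Turkish, word order itself constrains which readings are available.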
10. Grammatical Relations and Word Order in Turkish Sign Language (TİD). Sevinc, Ayca Muge (2006)
This thesis aims at investigating the grammatical relations in Turkish Sign Language (TİD). To this end, word order, nominal morphology, and the agreement morphology of verbs are examined. TİD lacks morphological case, but it has a very rich pronominal system, like other sign languages. Verbs are classified according to their morphosyntactic features; with this classification, we can observe the effect of word order and agreement morphology on grammatical relations.

Combinatory Categorial Grammar, as a lexicalized grammar, encodes word order, morphological case, and agreement features in the lexicon. Hence, it has the tools for testing any lexicalized basic word order hypothesis for a language on the basis of gapping data. Gapping data based on the grammatical judgments of native signers indicate that TİD is a verb-final language.

Syntactic ergativity seems to prevail in the coordination of a transitive sentence with an intransitive sentence where the single argument of the intransitive clause or one of the arguments of the transitive clause is missing. TİD also shows a tendency towards ergativity in lexical properties such as agreement and pro-drop.
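As a gloss on the claim that a lexicalized grammar encodes basic word order in the lexicon, here is a minimal CCG illustration in standard notation (not the thesis's actual TİD lexicon): the hypothesis that a language is verb-final is simply a hypothesis about the shape of its verbal categories.

```latex
% In CCG, word order is a lexical fact: a verb-final (SOV) transitive
% verb seeks both arguments to its left, an SVO verb seeks its object
% to the right. Gapping data can then adjudicate between the lexicons.
\[
\text{SOV transitive verb: } (S\backslash NP)\backslash NP
\qquad
\text{SVO transitive verb: } (S\backslash NP)/NP
\]
```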