Global ETD Search

Return to search

Generative and Computational Power of Combinatory Categorial Grammar

Combinatory categorial grammar (CCG) is a mildly-context sensitive formalism that is well-established in computational linguistics. At the basis of the grammar are a lexicon and a rule system: The lexicon assigns syntactic categories to the symbols of a given input string, and the rule system specifies how adjacent categories can be combined, yielding a derivation tree whose nodes are labeled by categories. In this thesis, we focus on composition rules, which are present in all variants of the grammar.

Vijay-Shanker and Weir famously show that CCG can generate the same class of string languages as tree-adjoining grammar, linear indexed grammar, and head grammar. Their equivalence proof relies on two particular features of the grammar: ε-entries, which are lexicon entries for the empty word, and rule restrictions, which allow to restrict the rule set on a per-grammar basis. However, modern variants of CCG tend to avoid these features. This raises the question how this changes the generative and computational power of CCG. Another important feature is the rule degree, which determines how complex a certain category involved in a rule application may be. The goal of this thesis is to shed light on the effects that changing these features has.

When modeling natural language, one is not only interested in the acceptability of a sentence, but also in its underlying structure. Therefore, we study the sets of constituency trees that CCG can generate, which are obtained by relabeling sets of derivation trees. We first provide a new proof of an analogous result by Buszkowski, showing that when only application rules are allowed, a proper subset of regular tree languages can be generated by CCG. Then, we show that when composition of first degree is included, CCG can generate exactly the regular tree languages. On the other hand, pure CCG, which allows all rules up to some degree, is shown to not even generate all local tree languages. Our main result on the generative capacity of CCG is its strong equivalence to tree-adjoining grammar. This means that these formalisms can generate the same class of tree languages. This is even the case when only composition rules of second degree and no ε-entries are used, showing that a CCG with these properties already has its full expressive power. Our constructions also provide an effective procedure for the removal of ε-entries.

Regarding computational complexity, ε-entries and high rule degrees are in fact problematic. Kuhlmann, Satta, and Jonsson studied the universal recognition problem for CCG, which asks whether some given string is generated by some given grammar, considering both as part of the input. They prove that this problem is EXPTIME-complete if ε-entries are included, and NP-complete if not. We refine this result and show that the runtime is exponential only in the maximum rule degree of the grammar. Hence, when the rule degree is bounded by a constant, parsing becomes polynomial in the grammar size. This also holds when substitution rules are included in the rule system.

info:eu-repo/classification/ddc/500

ddc:500

Identifer	oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:93291
Date	13 August 2024
Creators	Schiffer, Lena Katharina
Contributors	Maletti, Andreas, Steedman, Mark, Maletti, Andreas, Universität Leipzig
Source Sets	Hochschulschriftenserver (HSSS) der SLUB Dresden
Language	English
Detected Language	English
Type	info:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0069 seconds

Generative and Computational Power of Combinatory Categorial Grammar

Description

Links & Downloads

Tags

Additional Fields