Return to search

Generative and Computational Power of Combinatory Categorial Grammar

Combinatory categorial grammar (CCG) is a mildly-context sensitive formalism that is well-established in computational linguistics. At the basis of the grammar are a lexicon and a rule system: The lexicon assigns syntactic categories to the symbols of a given input string, and the rule system specifies how adjacent categories can be combined, yielding a derivation tree whose nodes are labeled by categories. In this thesis, we focus on composition rules, which are present in all variants of the grammar.

Vijay-Shanker and Weir famously show that CCG can generate the same class of string languages as tree-adjoining grammar, linear indexed grammar, and head grammar. Their equivalence proof relies on two particular features of the grammar: ε-entries, which are lexicon entries for the empty word, and rule restrictions, which allow to restrict the rule set on a per-grammar basis. However, modern variants of CCG tend to avoid these features. This raises the question how this changes the generative and computational power of CCG. Another important feature is the rule degree, which determines how complex a certain category involved in a rule application may be. The goal of this thesis is to shed light on the effects that changing these features has.

When modeling natural language, one is not only interested in the acceptability of a sentence, but also in its underlying structure. Therefore, we study the sets of constituency trees that CCG can generate, which are obtained by relabeling sets of derivation trees. We first provide a new proof of an analogous result by Buszkowski, showing that when only application rules are allowed, a proper subset of regular tree languages can be generated by CCG. Then, we show that when composition of first degree is included, CCG can generate exactly the regular tree languages. On the other hand, pure CCG, which allows all rules up to some degree, is shown to not even generate all local tree languages. Our main result on the generative capacity of CCG is its strong equivalence to tree-adjoining grammar. This means that these formalisms can generate the same class of tree languages. This is even the case when only composition rules of second degree and no ε-entries are used, showing that a CCG with these properties already has its full expressive power. Our constructions also provide an effective procedure for the removal of ε-entries.

Regarding computational complexity, ε-entries and high rule degrees are in fact problematic. Kuhlmann, Satta, and Jonsson studied the universal recognition problem for CCG, which asks whether some given string is generated by some given grammar, considering both as part of the input. They prove that this problem is EXPTIME-complete if ε-entries are included, and NP-complete if not. We refine this result and show that the runtime is exponential only in the maximum rule degree of the grammar. Hence, when the rule degree is bounded by a constant, parsing becomes polynomial in the grammar size. This also holds when substitution rules are included in the rule system.

Identiferoai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:93291
Date13 August 2024
CreatorsSchiffer, Lena Katharina
ContributorsMaletti, Andreas, Steedman, Mark, Maletti, Andreas, Universität Leipzig
Source SetsHochschulschriftenserver (HSSS) der SLUB Dresden
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0069 seconds