The analysis of human communication, in all its forms, increasingly depends on large collections of texts and transcribed recordings. These collections, or corpora, are often richly annotated with structural information. These datasets are extremely large so manual analysis is only successful up to a point. As such, significant effort has recently been invested in automatic techniques for extracting and analyzing these massive data sets. However, further progress on analytical tools is confronted by three major challenges. First, we need the right data model. Second, we need to understand the theoretical foundations of query languages on that data model. Finally, we need to know the expressive requirements for general purpose query language with respect to linguistics. This thesis has addressed all three of these issues. / Specifically, this thesis studies formalisms used by linguists and database theorists to describe tree structured data. Specifically, Propositional dynamic logic and monadic second-order logic. These formalisms have been used to reason about a number of tree querying languages and their applicability to the linguistic tree query problem. We identify a comprehensive set of linguistic tree query requirements and the level of expressiveness needed to implement them. The main result of this study is that the required level of expressiveness of linguistic tree query is that of the first-order predicate calculus over trees. / This formal approach has resulted in a convergence between two seemingly disparate fields of study. Further work in the intersection of linguistics and database theory should also pave the way for theoretically well-founded future work in this area. This, in turn, will lead to better tools for linguistic analysis and data management, and more comprehensive theories of human language.
Identifer | oai:union.ndltd.org:ADTP/245453 |
Creators | Lai, Catherine |
Source Sets | Australiasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Terms and Conditions: Copyright in works deposited in the University of Melbourne Eprints Repository (UMER) is retained by the copyright owner. The work may not be altered without permission from the copyright owner. Readers may only, download, print, and save electronic copies of whole works for their own personal non-commercial use. Any use that exceeds these limits requires permission from the copyright owner. Attribution is essential when quoting or paraphrasing from these works., Open Access |
Page generated in 0.0014 seconds