Recent years have seen a surge in the popularity of XML, a markup language for representing semi-structured data. Some of this popularity can be attributed to the success that the semi-structured data model has had in environments where the relational data model has been insufficiently expressive. Concomitant with XMLs growing popularity, the world of database research has seen the rebirth of interest in tree-structured, hierarchical database systems. This thesis analyzes several problems that arise when constructing XML data management systems, particularly in the case where such systems must handle dynamic content. In the first chapter, we consider the problem of incremental schema validation, which arises in almost any XML database system. We build upon previous work by finding several classes of schemas for which very efficient algorithms exist. We also develop an algorithm that works for any schema, and prove that it is optimal. In the second chapter, we turn to the problem of improving query evaluation times on extremely large database systems. In particular, we boost the performance of the structural and twig joins, fundamental XML query evaluation techniques, through the use of an adaptive index. This index tunes itself to the query workload, providing a 20-80% boost in speed for these join operators. The adaptive nature of the index also allows updates to the database to be easily tracked. While accurate selectivity estimation is a critical problem in any database system due to its importance in choosing optimal query plans, there has been very little work on selectivity estimation in the presence of updates. We ask whether it is possible to design a structure for selectivity in XML databases that is updateable, and can return results with theoretically sound error guarantees. Through a combination of lower and upper bounds, we give strong evidence suggesting that this is unlikely in practice. Motivated by these results, we then develop a heuristic selectivity estimation structure for XML databases. This structure is the first such synopsis that can handle all aspects of core XPath, and is also updateable. Our experimental results demonstrate the efficacy of the approach.
Identifer | oai:union.ndltd.org:ADTP/243109 |
Date | January 2007 |
Creators | Fisher, Damien Kaine, School of Computer Science & Engineering, UNSW |
Source Sets | Australiasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | http://unsworks.unsw.edu.au/copyright, http://unsworks.unsw.edu.au/copyright |
Page generated in 0.0025 seconds