Global ETD Search

Accelerating data retrieval steps in XML documents

The aim of this research is to accelerate the data retrieval steps in a collection of XML (eXtensible Markup Language) documents, a key task of current XML research. Three inter-connected issues relating to state-of-the-art XML research are studied: semantically clustering XML documents, efficiently querying XML documents with an index structure, and self-adaptively labelling dynamic XML documents. Together these form a basic but self-contained foundation of a native XML database system. The research follows a divide-and-conquer strategy. The issue of dividing a collection of XML documents into sub-clusters, in which semantically similar XML documents are grouped together, is addressed first. To this end, a semantic component model is proposed to capture the implicit semantics of an XML document. This model enables us to devise a set of heuristic algorithms to compute the degree of similarity among XML documents. In particular, the newly proposed semantic component model and the heuristic algorithms reflect the inaccuracy of the traditional edit-distance-based clustering mechanisms. After similar XML documents are grouped into sub-collections, the problem of querying XML documents with an index structure is carefully studied. A novel geometric sequence model is proposed to transform XML documents into numbered geometric sequences and XPath queries into geometric query sequences. The problem of evaluating an XPath query in an XML document is theoretically proved to be equivalent to the problem of finding the subsequence matchings of a geometric query sequence in a numbered geometric document sequence. This geometric sequence model then enables us to devise two new stack-based algorithms to perform both top-down and bottom-up XPath evaluation in XML documents. In particular, the algorithms treat an XPath query as a whole unit, avoiding resource-consuming join operations and generating all the answers without semantic errors and false alarms. Finally, the issue of supporting update functions in XML documents is tackled. A new Bayesian allocation model is introduced for the index structure generated in the geometric sequence model. Based on a k-ary tree data structure and a level traversal mechanism, the correctness and efficiency of the Bayesian allocation model in supporting dynamic XML documents are theoretically proved. In particular, the Bayesian allocation model is general and can be applied to most of the current index structures.

http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.419754

005.72

Computer science

Identifier	oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:419754
Date	January 2005
Creators	Shen, Yun
Contributors	Wang, Bing
Publisher	University of Hull
Source Sets	Ethos UK
Detected Language	English
Type	Electronic Thesis or Dissertation
Source	http://hydra.hull.ac.uk/resources/hull:8310
