Global ETD Search

351	Query Processing in a Traceable P2P Record Exchange Framework ISHIKAWA, Yoshiharu, LI, Fengrong 01 June 2010 (has links) No description available. query processing data provenance traceability peer-to-peer networks information exchange
352	Flexible techniques for heterogeneous XML data retrieval Sanz Blasco, Ismael 31 October 2007 (has links) The progressive adoption of XML by new communities of users has motivated the appearance of applications that require the management of large and complex collections, which present a large amount of heterogeneity. Some relevant examples are present in the fields of bioinformatics, cultural heritage, ontology management and geographic information systems, where heterogeneity is not only reflected in the textual content of documents, but also in the presence of rich structures which cannot be properly accounted for using fixed schema definitions. Current approaches for dealing with heterogeneous XML data are, however, mainly focused at the content level, whereas at the structural level only a limited amount of heterogeneity is tolerated; for instance, weakening the parent-child relationship between nodes into the ancestor-descendant relationship. The main objective of this thesis is devising new approaches for querying heterogeneous XML collections. This general objective has several implications: First, a collection can present different levels of heterogeneity in different granularity levels; this fact has a significant impact in the selection of specific approaches for handling, indexing and querying the collection. Therefore, several metrics are proposed for evaluating the level of heterogeneity at different levels, based on information-theoretical considerations. These metrics can be employed for characterizing collections, and clustering together those collections which present similar characteristics. Second, the high structural variability implies that query techniques based on exact tree matching, such as the standard XPath and XQuery languages, are not suitable for heterogeneous XML collections. As a consequence, approximate querying techniques based on similarity measures must be adopted. 
Within the thesis, we present a formal framework for the creation of similarity measures which is based on a study of the literature that shows that most approaches for approximate XML retrieval (i) are highly tailored to very specific problems and (ii) use similarity measures for ranking that can be expressed as ad-hoc combinations of a set of 'basic' measures. Some examples of these widely used measures are tf-idf for textual information and several variations of edit distances. Our approach wraps these basic measures into generic, parametrizable components that can be combined into complex measures by exploiting the composite pattern, commonly used in Software Engineering. This approach also allows us to integrate seamlessly highly specific measures, such as protein-oriented matching functions.
Finally, these measures are employed for the approximate retrieval of data in a context of high structural heterogeneity, using a new approach based on the concepts of pattern and fragment. In our context, a pattern is a concise representation of the information needs of a user, and a fragment is a match of a pattern found in the database. A pattern consists of a set of tree-structured elements: basically an XML subtree that is intended to be found in the database, but with flexible semantics that are strongly dependent on a particular similarity measure. For example, depending on a particular measure, the particular hierarchy of elements, or the ordering of siblings, may or may not be deemed to be relevant when searching for occurrences in the database. Fragment matching, as a query primitive, can deal with a much higher degree of flexibility than existing approaches. In this thesis we provide exhaustive and top-k query algorithms. In the latter case, we adopt an approach that does not require the similarity measure to be monotonic, as all previous XML top-k algorithms (usually based on Fagin's algorithm) do.
We also present two extensions which are important in practical settings: a specification for the integration of the aforementioned techniques into XQuery, and a clustering algorithm that is useful to manage complex result sets.
All of the algorithms have been implemented as part of ArHeX, a toolkit for the development of multi-similarity XML applications, which supports fragment-based queries through an extension of the XQuery language, and includes graphical tools for designing similarity measures and querying collections. We have used ArHeX to demonstrate the effectiveness of our approach using both synthetic and real data sets, in the context of a biomedical research project. similarity approximate query processing heterogeneous data management XML 004
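The abstract of entry 352 describes wrapping basic similarity measures into components that are combined through the composite pattern. A minimal sketch of that idea follows. It is not taken from ArHeX: the class names (`Similarity`, `TokenJaccard`, `EditSimilarity`, `WeightedSum`) are hypothetical, plain strings stand in for XML fragments, and token-set Jaccard overlap is used as a simple stand-in for a textual measure such as tf-idf.

```python
from abc import ABC, abstractmethod


class Similarity(ABC):
    """Common interface: a similarity score in [0, 1] for two strings."""

    @abstractmethod
    def score(self, a: str, b: str) -> float: ...


class TokenJaccard(Similarity):
    """Leaf measure: Jaccard overlap of whitespace-separated tokens."""

    def score(self, a, b):
        sa, sb = set(a.lower().split()), set(b.lower().split())
        if not sa and not sb:
            return 1.0
        return len(sa & sb) / len(sa | sb)


class EditSimilarity(Similarity):
    """Leaf measure: 1 minus the length-normalized Levenshtein distance."""

    def score(self, a, b):
        if not a and not b:
            return 1.0
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                # deletion
                               cur[j - 1] + 1,             # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return 1.0 - prev[-1] / max(len(a), len(b))


class WeightedSum(Similarity):
    """Composite measure: weighted average of child measures.

    Children may themselves be composites, which is what lets simple
    measures be assembled into more complex ones.
    """

    def __init__(self, parts):
        self.parts = list(parts)  # (weight, Similarity) pairs

    def score(self, a, b):
        total = sum(w for w, _ in self.parts)
        return sum(w * m.score(a, b) for w, m in self.parts) / total


combined = WeightedSum([(0.5, TokenJaccard()), (0.5, EditSimilarity())])
```

Because `WeightedSum` implements the same interface as its children, a domain-specific measure could be dropped in as another leaf without changing the code that calls `score`.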
353	Querying Large Collections of Semistructured Data Kamali, Shahab 05 September 2013 (has links) An increasing amount of data is published as semistructured documents formatted with presentational markup. Examples include data objects such as mathematical expressions encoded with MathML or web pages encoded with XHTML. Our intention is to improve the state of the art in retrieving, manipulating, or mining such data. We focus first on mathematics retrieval, which is appealing in various domains, such as education, digital libraries, engineering, patent documents, and medical sciences. Capturing the similarity of mathematical expressions also greatly enhances document classification in such domains. Unlike text retrieval, where keywords carry enough semantics to distinguish text documents and rank them, math symbols do not contain much semantic information on their own. Unfortunately, considering the structure of mathematical expressions to calculate relevance scores of documents results in ranking algorithms that are computationally more expensive than the typical ranking algorithms employed for text documents. As a result, current math retrieval systems either limit themselves to exact matches, or they ignore the structure completely; they sacrifice either recall or precision for efficiency. We propose instead an efficient end-to-end math retrieval system based on a structural similarity ranking algorithm. We describe novel optimization techniques to reduce the index size and the query processing time. Thus, with the proposed optimizations, mathematical contents can be fully exploited to rank documents in response to mathematical queries. We demonstrate the effectiveness and the efficiency of our solution experimentally, using a special-purpose testbed that we developed for evaluating math retrieval systems. We finally extend our retrieval system to accommodate rich queries that consist of combinations of math expressions and textual keywords. 
As a second focal point, we address the problem of recognizing structural repetitions in typical web documents. Most web pages use presentational markup standards, in which the tags control the formatting of documents rather than semantically describing their contents. Hence, their structures typically contain more irregularities than descriptive (data-oriented) markup languages. Even though applications would greatly benefit from a grammar inference algorithm that captures structure to make it explicit, the existing algorithms for XML schema inference, which target data-oriented markup, are ineffective in inferring grammars for web documents with presentational markup. There is currently no general-purpose grammar inference framework that can handle irregularities commonly found in web documents and that can operate with only a few examples. Although inferring grammars for individual web pages has been partially addressed by data extraction tools, the existing solutions rely on simplifying assumptions that limit their application. Hence, we describe a principled approach to the problem by defining a class of grammars that can be inferred from very small sample sets and can capture the structure of most web documents. The effectiveness of this approach, together with a comparison against various classes of grammars including DTDs and XSDs, is demonstrated through extensive experiments on web documents. We finally use the proposed grammar inference framework to extend our math retrieval system and to optimize it further. Mathematics retrieval Search Semistructured data Query Language XML Grammar inference Computer Science
354	XQuery Query Processing in Relational Systems Chen, Yingwen January 2004 (has links) With the rapid growth of XML documents to serve as a popular and major media for storage and interchange of the data on the Web, there is an increasing interest in using existing traditional relational database techniques to store and/or query XML data. Since XQuery is becoming a standard XML query language, significant effort has been made in developing an efficient and comprehensive XQuery-to-SQL query processor. In this thesis, we design and implement an XQuery-to-SQL Query Processor based on the Dynamic Intervals approach. We also provide a comprehensive translation for XQuery basic operations and FLWR expressions. The query processor is able to translate a complex XQuery query, which might include arbitrarily composed and nested FLWR expressions, basic functions, and element constructors, into a single SQL query for RDBMS and a physical plan for the XQuery-enhanced Relational Engine. In order to produce efficient and concise SQL queries, succinct XQuery to SQL translation templates and the optimization algorithms for the SQL query generation are proposed and implemented. The preferable merge-join approach is also proposed to avoid the inefficient nested-loop evaluation for FLWR expressions. Merge-join patterns and query rewriting rules are designed to identify XQuery fragments that can utilize the efficient merge-join evaluation. Proofs of correctness of the approach are provided in the thesis. Experimental results justify the correctness of our work. Computer Science XQuery SQL translation XML Query processor Dynamic Interval Encoding Merge-join
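Entry 354 contrasts merge-join evaluation with nested-loop evaluation. The sketch below illustrates the difference on ordinary sorted lists of `(key, value)` pairs. It is a generic relational illustration only, not the thesis's XQuery-specific merge-join patterns or its Dynamic Intervals encoding; the function names are hypothetical.

```python
def nested_loop_join(left, right):
    """Compare every pair of rows: O(len(left) * len(right)) comparisons."""
    return [(k, lv, rv) for k, lv in left for rk, rv in right if k == rk]


def merge_join(left, right):
    """Equi-join two inputs that are already sorted by key.

    Both inputs are scanned once, apart from re-reading runs of
    duplicate keys on the right-hand side.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair that run with every left row that has the same key.
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], right[r][1]) for r in range(j, j_end))
                i += 1
            j = j_end
    return out
```

Both functions return the same rows for sorted inputs; the merge variant avoids comparing rows whose keys cannot match, which is the efficiency argument the abstract makes.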
355	Formulating Complex Queries Using Templates Zhang, Hao 21 January 2009 (has links) While many users have relatively general information needs, users who are familiar with a certain topic may have more specific or complex information needs. Such users already have some knowledge of a subject and its concepts, and they need to find information on a specific aspect of a certain entity, such as its cause, effect, and relationships between entities. To successfully resolve complex information needs of this kind, we investigated the effectiveness of topic-independent query templates as a tool for assisting users in articulating their information needs. A set of query templates, written in fill-in-the-blank form, was designed to represent general semantic relationships between concepts, such as cause-effect and problem-solution. To conduct the research, we designed a control interface with a single query textbox and an experimental interface with the query templates. A user study was performed with 30 users. The Okapi information retrieval system was used to retrieve documents in response to the users’ queries. The analysis in this paper indicates that while users found the template-based query formulation less easy to use, the queries written using templates performed better than the queries written using the control interface with one query textbox. Our analysis of a group of users and some specific topics demonstrates that the experimental interface tended to help users create more detailed search queries and that the users were able to think about different aspects of their complex information needs and fill in many templates. In the future, an interesting research direction would be to tune the templates, adapting them to users’ specific query requests and avoiding showing non-relevant templates to users by automatically selecting related templates from a larger set of templates.
Information Retrieval User Study Complex Information needs Query Templates Management Sciences
358	Optimal Path Queries in Very Large Spatial Databases Zhang, Jie January 2005 (has links) Researchers have been investigating the optimal route query problem for a long time. Optimal route queries are categorized as either unconstrained or constrained queries. Many main memory based algorithms have been developed to deal with the optimal route query problem. Among these, Dijkstra's shortest path algorithm is one of the most popular algorithms for the unconstrained route query problem. The constrained route query problem is more complicated than the unconstrained one, and some constrained route query problems such as the Traveling Salesman Problem and Hamiltonian Path Problem are NP-hard. There are many algorithms dealing with the constrained route query problem, but most of them only solve a specific case. In addition, all of them require that the entire graph resides in the main memory. Recently, due to the need of applications in very large graphs, such as the digital maps managed by Geographic Information Systems (GIS), several disk-based algorithms have been derived by using divide-and-conquer techniques to solve the shortest path problem in a very large graph. However, until now little research has been conducted on the disk-based constrained problem.
This thesis presents two algorithms: 1) a new disk-based shortest path algorithm (DiskSPNN), and 2) a new disk-based optimal path algorithm (DiskOP) that answers an optimal route query without passing a set of forbidden edges in a very large graph. Both algorithms fit within the same divide-and-conquer framework as the existing disk-based shortest path algorithms proposed by Ning Zhang and Heechul Lim. Several techniques, including query super graph, successor fragment and open boundary node pruning are proposed to improve the performance of the previous disk-based shortest path algorithms. Furthermore, these techniques are applied to the DiskOP algorithm with minor changes.
The proposed DiskOP algorithm depends on the concept of collecting a set of boundary vertices and simultaneously relaxing their adjacent super edges. Even if the forbidden edges are distributed in all the fragments of a graph, the DiskOP algorithm requires little memory. Our experimental results indicate that the DiskSPNN algorithm performs better than the original ones with respect to the I/O cost as well as the running time, and the DiskOP algorithm successfully solves a specific constrained route query problem in a very large graph. Computer Science Optimal route query very large spatial databases forbidden edge constraint
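Entry 358 concerns optimal route queries that must avoid a set of forbidden edges. As a point of reference, the sketch below shows the in-memory version of that problem: Dijkstra's algorithm with forbidden edges skipped during relaxation. It is not DiskSPNN or DiskOP, which are disk-based and work on graph fragments; the function name, the adjacency-dict graph format, and the `forbidden` parameter are assumptions made for illustration.

```python
import heapq


def shortest_path(graph, source, target, forbidden=frozenset()):
    """Dijkstra over graph = {u: [(v, weight), ...]} with non-negative weights.

    Directed edges listed in `forbidden` as (u, v) pairs are never relaxed.
    Returns (cost, path), or (float("inf"), []) if target is unreachable.
    """
    dist = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            # Walk parent links back to the source to rebuild the path.
            path = [u]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter route to u was found
        for v, w in graph.get(u, []):
            if (u, v) in forbidden:
                continue  # constraint: this edge may not be used
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []
```

This version needs the whole graph in main memory, which is exactly the limitation the abstract says the disk-based algorithms are designed to remove.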
