Return to search

Keyword search in relational databases. / CUHK electronic theses & dissertations collection

In this thesis, for the schema-based approaches, we propose an efficient algorithm to general all relational algebra expressions in order to find all the connected trees in an RDB. We also study an efficient algorithm to evaluate all the expressions using semijoins in RDBMS . We show that our method can also be extended to answer continuous keyword queries in a relational data stream. We further propose novel algorithms that find sets of tuples that are reachable from a root tuple within a radius, and algorithms that find multi-center subgraphs within a radius. Our algorithms use SQL queries only in order to make fully use of RDBMS. We show that the current commercial RDBMSs are powerful enough to support such keyword queries in RDBs efficiently without any additional new indexing to be built and maintained. The main idea behind our approach is tuple reduction. For the graph-based approaches, we propose an efficient algorithm to find all/top- K multi-center subgraphs in polynomial delay. We also introduce a new kind of keyword query, namely, structural statistics by keywords, to summarize keyword search results into several dimensions. We conducted extensive performance studies using two large real datasets IMDB and DBLP to show the efficiency and effectiveness of all our approaches. / Keyword search in relational databases (RDBs) has been extensively studied recently. A keyword search (or a keyword query) in RDBs is specified by a set of keywords to explore the interconnected tuple structures in an RDB that cannot be easily identified using SQL on RDBMSs. In brief, it finds how the tuples containing the given keywords are connected via sequences of connections (foreign key references) among tuples in an RDB. Such interconnected tuple structures can be found as connected trees up to a certain size, sets of tuples that are reachable from a root tuple within a radius, or even multi-center subgraphs within a radius. In the literature, there are two main approaches, namely schema-based approaches and graph-based approaches. The schema-based approaches are to generate a set of relational algebra expressions and evaluate every such expression using SQL on an RDBMS directly or in a middleware on top of an RDBMS indirectly. Due to a large number of relational algebra expressions needed to process, most of the existing works take a middleware approach without fully utilizing RDBMSs. The graph-based approaches are to materialize an RDB as a graph and find the interconnected tuple structures using graph-based algorithms in memory. / Qin, Lu. / Adviser: Jeffrey Xu Yu. / Source: Dissertation Abstracts International, Volume: 73-02, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 133-138). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.

Identiferoai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_344657
Date January 2010
ContributorsQin, Lu, Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management.
Source SetsThe Chinese University of Hong Kong
LanguageEnglish, Chinese
Detected LanguageEnglish
TypeText, theses
Formatelectronic resource, microform, microfiche, 1 online resource (xiii, 138 leaves : ill.)
RightsUse of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Page generated in 0.0021 seconds