Spelling suggestions: "subject:"keyword queda""
1 |
Keyword Join: Realizing Keyword Search for Information IntegrationYu, Bei, Liu, Ling, Ooi, Beng Chin, Tan, Kian Lee 01 1900 (has links)
Information integration has been widely addressed over the last several decades. However, it is far from solved due to the complexity of resolving schema and data heterogeneities. In this paper, we propose out attempt to alleviate such difficulty by realizing keyword search functionality for integrating information from heterogeneous databases. Our solution does not require predefined global schema or any mappings between databases. Rather, it relies on an operator called keyword join to take a set of lists of partial answers from different data sources as input, and output a list of results that are joined by the tuples from input lists based on predefined similarity measures as integrated results. Our system allows source databases remain autonomous and the system to be dynamic and extensible. We have tested our system with real dataset and benchmark, which shows that our proposed method is practical and effective. / Singapore-MIT Alliance (SMA)
|
2 |
Keyword Join: Realizing Keyword Search in P2P-based Database SystemsYu, Bei, Liu, Ling, Ooi, Beng Chin, Tan, Kian Lee 01 1900 (has links)
In this paper, we present a P2P-based database sharing system that provides information sharing capabilities through keyword-based search techniques. Our system requires neither a global schema nor schema mappings between different databases, and our keyword-based search algorithms are robust in the presence of frequent changes in the content and membership of peers. To facilitate data integration, we introduce keyword join operator to combine partial answers containing different keywords into complete answers. We also present an efficient algorithm that optimize the keyword join operations for partial answer integration. Our experimental study on both real and synthetic datasets demonstrates the effectiveness of our algorithms, and the efficiency of the proposed query processing strategies. / Singapore-MIT Alliance (SMA)
|
3 |
OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open DataBraunschweig, Katrin, Eberius, Julian, Thiele, Maik, Lehner, Wolfgang 27 January 2023 (has links)
Government initiatives for more transparency and participation have lead to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people’s awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support.On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. To do so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects from both classic approaches. In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example though keyword queries. Furthermore, we address the challenge of indexing Open Data.
|
4 |
Exploratory Ad-Hoc Analytics for Big DataEberius, Julian, Thiele, Maik, Lehner, Wolfgang 19 July 2023 (has links)
In a traditional relational database management system, queries can only be defined over attributes defined in the schema, but are guaranteed to give single, definitive answer structured exactly as specified in the query. In contrast, an information retrieval system allows the user to pose queries without knowledge of a schema, but the result will be a top-k list of possible answers, with no guarantees about the structure or content of the retrieved documents. In this chapter, we present Drill Beyond, a novel IR/RDBMS hybrid system, in which the user seamlessly queries a relational database together with a large corpus of tables extracted from a web crawl. The system allows full SQL queries over a relational database, but additionally enables the user to use arbitrary additional attributes in the query that need not to be defined in the schema. The system then processes this semi-specified query by computing a top-k list of possible query evaluations, each based on different candidate web data sources, thus mixing properties of two worlds RDBMS and IR systems.
|
Page generated in 0.0683 seconds