• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 506
  • 79
  • 36
  • 29
  • 22
  • 14
  • 11
  • 10
  • 9
  • 8
  • 6
  • 6
  • 5
  • 4
  • 3
  • Tagged with
  • 867
  • 285
  • 262
  • 221
  • 200
  • 168
  • 151
  • 132
  • 129
  • 127
  • 124
  • 116
  • 103
  • 101
  • 101
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Case for holistic query evaluation

Krikellas, Konstantinos January 2010 (has links)
In this thesis we present the holistic query evaluation model. We propose a novel query engine design that exploits the characteristics of modern processors when queries execute inside main memory. The holistic model (a) is based on template-based code generation for each executed query, (b) uses multithreading to adapt to multicore processor architectures and (c) addresses the optimization problem of scheduling multiple threads for intra-query parallelism. Main-memory query execution is a usual operation in modern database servers equipped with tens or hundreds of gigabytes of RAM. In such an execution environment, the query engine needs to adapt to the CPU characteristics to boost performance. For this purpose, holistic query evaluation applies customized code generation to database query evaluation. The idea is to use a collection of highly efficient code templates and dynamically instantiate them to create query- and hardware-specific source code. The source code is compiled and dynamically linked to the database server for processing. Code generation diminishes the bloat of higher-level programming abstractions necessary for implementing generic, interpreted, SQL query engines. At the same time, the generated code is customized for the hardware it will run on. The holistic model supports the most frequently used query processing algorithms, namely sorting, partitioning, join evaluation, and aggregation, thus allowing the efficient evaluation of complex DSS or OLAP queries. Modern CPUs follow multicore designs with multiple threads running in parallel. The dataflow of query engine algorithms needs to be adapted to exploit such designs. We identify memory accesses and thread synchronization as the main bottlenecks in a multicore execution environment. We extend the holistic query evaluation model and propose techniques to mitigate the impact of these bottlenecks on multithreaded query evaluation. We analytically model the expected performance and scalability of the proposed algorithms according to the hardware specifications. The analytical performance expressions can be used by the optimizer to statically estimate the speedup of multithreaded query execution. Finally, we examine the problem of thread scheduling in the context of multithreaded query evaluation on multicore CPUs. The search space for possible operator execution schedules scales fast, thus forbidding the use of exhaustive techniques. We model intra-query parallelism on multicore systems and present scheduling heuristics that result in different degrees of schedule quality and optimization cost. We identify cases where each of our proposed algorithms, or combinations of them, are expected to generate schedules of high quality at an acceptable running cost.
52

Mining and Managing Neighbor-Based Patterns in Data Streams

Yang, Di 09 January 2012 (has links)
The current data-intensive world is continuously producing huge volumes of live streaming data through various kinds of electronic devices, such as sensor networks, smart phones, GPS and RFID systems. To understand these data sources and thus better leverage them to serve human society, the demands for mining complex patterns from these high speed data streams have significantly increased in a broad range of application domains, such as financial analysis, social network analysis, credit fraud detection, and moving object monitoring. In this dissertation, we present a framework to tackle the mining and management problem for the family of neighbor-based patterns in data streams, which covers a broad range of popular pattern types, including clusters, outliers, k-nearest neighbors and others. First, we study the problem of efficiently executing single neighbor-based pattern mining queries. We propose a general optimization principle for incremental pattern maintenance in data streams, called "Predicted Views". This general optimization principle exploits the "predictability" of sliding window semantics to eliminate both the computational and storage effort needed for handling the expiration of stream objects, which usually constitutes the most expensive operations for incremental pattern maintenance. Second, the problem of multiple query optimization for neighbor-based pattern mining queries is analyzed, which aims to efficiently execute a heavy workload of neighbor-based pattern mining queries using shared execution strategies. We present an integrated pattern maintenance strategy to represent and incrementally maintain the patterns identified by queries with different query parameters within a single compact structure. Our solution realizes fully shared execution of multiple queries with arbitrary parameter settings. Third, the problem of summarization and matching for neighbor-based patterns is examined. To solve this problem, we first propose a summarization format for each pattern type. Then, we present computation strategies, which efficiently summarize the neighbor-based patterns either during or after the online pattern extraction process. Lastly, to compare patterns extracted on different time horizon of the stream, we design an efficient matching mechanism to identify similar patterns in the stream history for any given pattern of interest to an analyst. Our comprehensive experimental studies, using both synthetic as well as real data from domains of stock trades and moving object monitoring, demonstrate superiority of our proposed strategies over alternate methods in both effectiveness and efficiency.
53

Extending Event Sequence Processing:New Models and Optimization Techniques

Liu, Mo 25 April 2012 (has links)
Many modern applications, including online financial feeds, tag-based mass transit systems and RFID-based supply chain management systems transmit real-time data streams. There is a need for event stream processing technology to analyze this vast amount of sequential data to enable online operational decision making. This dissertation focuses on innovating several techniques at the core of a scalable E-Analytic system to achieve efficient, scalable and robust methods for in-memory multi-dimensional nested pattern analysis over high-speed event streams. First, I address the problem of processing flat pattern queries on event streams with out-of-order data arrival. I design two alternate solutions: aggressive and conservative strategies respectively. The aggressive strategy produces maximal output under the optimistic assumption that out-of-order event arrival is rare. The conservative method works under the assumption that out-of-order data may be common, and thus produces output only when its correctness can be guaranteed. Second, I design the integration of CEP and OLAP techniques (ECube model) for efficient multi-dimensional event pattern analysis at different abstraction levels. Strategies of drill-down (refinement from abstract to specific patterns) and of roll-up (generalization from specific to abstract patterns) are developed for the efficient workload evaluation. I design a cost-driven adaptive optimizer called Chase that exploits reuse strategies for optimal E-Cube hierarchy execution. Then, I explore novel optimization techniques to support the high- performance processing of powerful nested CEP patterns. A CEP query language called NEEL, is designed to express nested CEP pattern queries composed of sequence, negation, AND and OR operators. To allow flexible execution ordering, I devise a normalization procedure that employs rewriting rules for flattening a nested complex event expression. To conserve CPU and memory consumption, I propose several strategies for efficient shared processing of groups of normalized NEEL subexpressions. Our comprehensive experimental studies, using both synthetic as well as real data streams demonstrate superiority of our proposed strategies over alternate methods in the literature in both effectiveness and efficiency.
54

Butterfly: A Model of Provenance

Tang, Yaobin 13 March 2009 (has links)
Semantically rich metadata is foreseen to be pervasive in tomorrow's cyber world. People are more willing to store metadata in the hope that such extra information will enable a wide range of novel business intelligent applications. Provenance is metadata which describes the derivation history of data. It is considered to have great potential for helping the reasoning, analyzing, validating, monitoring, integrating and reusing of data. Although there are a few application-specific systems equipped with some degree of provenance tracking functionality, few formal models of provenance are present. A general purpose, formal model of provenance is desirable not only to widely promote the storage and inventive usage of provenance, but also to prepare for the emergence of so called provenance management system. In this thesis, I propose Butterfly, a general purpose provenance model, which offers the capability to model, store, and query provenance. It consists of a semantic model for describing provenance, and an extensible algebraic query model for querying provenance. An initial implementation of the provenance model is also briefly discussed.
55

Cost-based optimization of graph queries in relational database management systems

Trissl, Silke 14 June 2012 (has links)
Graphen sind in vielen Bereichen des Lebens zu finden, wobei wir speziell an Graphen in der Biologie interessiert sind. Knoten in solchen Graphen sind chemische Komponenten, Enzyme, Reaktionen oder Interaktionen, die durch Kanten miteinander verbunden sind. Eine effiziente Ausführung von Graphanfragen ist eine Herausforderung. In dieser Arbeit präsentieren wir GRIcano, ein System, das die effiziente Ausführung von Graphanfragen erlaubt. Wir nehmen an, dass Graphen in relationalen Datenbankmanagementsystemen (RDBMS) gespeichert sind. Als Graphanfragesprache schlagen wir eine erweiterte Version der Pathway Query Language (PQL) vor. Der Hauptbestandteil von GRIcano ist ein kostenbasierter Anfrageoptimierer. Diese Arbeit enthält Beiträge zu allen drei benötigten Komponenten des Optimierers, der relationalen Algebra, Implementierungen und Kostenmodellen. Die Operatoren der relationalen Algebra sind nicht ausreichend, um Graphanfragen auszudrücken. Daher stellen wir zuerst neue Operatoren vor. Wir schlagen den Erreichbarkeits-, Distanz-, Pfadlängen- und Pfadoperator vor. Zusätzlich geben wir Regeln für die Umformung von Ausdrücken an. Des Weiteren präsentieren wir Implementierungen für jeden vorgeschlagenen Operator. Der Hauptbeitrag ist GRIPP, eine Indexstruktur, die die effiziente Ausführung von Erreichbarkeitsanfragen auf sehr großen Graphen erlaubt. Wir zeigen, wie GRIPP und die rekursive Anfragestrategie genutzt werden können, um Implementierungen für alle Operatoren bereitzustellen. Die dritte Komponente von GRIcano ist das Kostenmodell, das Kardinalitätsabschätzungen der Operatoren und Kostenfunktionen für die Implementierungen benötigt. Basierend auf umfangreichen Experimenten schlagen wir in dieser Arbeit Funktionen dafür vor. Der neue Ansatz unserer Kostenmodelle ist, dass die Funktionen nur Kennzahlen der Graphen verwenden. Abschließend zeigen wir die Wirkungsweise von GRIcano durch Beispielanfragen auf echten biologischen Graphen. / Graphs occur in many areas of life. We are interested in graphs in biology, where nodes are chemical compounds, enzymes, reactions, or interactions that are connected by edges. Efficiently querying these graphs is a challenging task. In this thesis we present GRIcano, a system that efficiently executes graph queries. For GRIcano we assume that graphs are stored and queried using relational database management systems (RDBMS). We propose an extended version of the Pathway Query Language PQL to express graph queries. The core of GRIcano is a cost-based query optimizer. This thesis makes contributions to all three required components of the optimizer, the relational algebra, implementations, and cost model. Relational algebra operators alone are not sufficient to express graph queries. Thus, we first present new operators to rewrite PQL queries to algebra expressions. We propose the reachability, distance, path length, and path operator. In addition, we provide rewrite rules for the newly proposed operators in combination with standard relational algebra operators. Secondly, we present implementations for each proposed operator. The main contribution is GRIPP, an index structure that allows us to answer reachability queries on very large graphs. GRIPP has advantages over other existing index structures, which we review in this work. In addition, we show how to employ GRIPP and the recursive query strategy as implementation for all four proposed operators. The third component of GRIcano is the cost model, which requires cardinality estimates for operators and cost functions for implementations. Based on extensive experimental evaluation of our proposed algorithms we present functions to estimate the cardinality of operators and the cost of executing a query. The novelty of our approach is that these functions only use key figures of the graph. We finally present the effectiveness of GRIcano using exemplary graph queries on real biological networks.
56

Probabilistic threshold range aggregate query processing over uncertain data

Yang, Shuxiang, Computer Science & Engineering, Faculty of Engineering, UNSW January 2009 (has links)
Uncertainty is inherent in many novel and important applications such as market surveillance, information extraction sensor data analysis, etc. In the recent a few decades, uncertain data has attracted considerable research attention. There are various factors that cause the uncertainty, for instance randomness or incompleteness of data, limitations of equipment and delay or loss in data transfer. A probabilistic threshold range aggregate (PRTA) query retrieves summarized information about the uncertain objects in the database satisfying a range query, with respect to a given probability threshold. This thesis is trying to address and handle this important type of query which there is no previous work studying on. We formulate the problem in both discrete and continuous uncertain data model and develop a novel index structure, asU-tree (aggregate-based sampling-auxiliary U-tree) which not only supports exact query answering but also provides approximate results with accuracy guarantee if efficiency is more concerned. The new asU-tree structure is totally dynamic. Query processing algorithms for both exact answer and approximate answer based on this new index structure are also proposed. An extensive experimental study shows that asU-tree is very efficient and effective over real and synthetic datasets.
57

Is Semantic Query Optimization Worthwhile?

Genet, Bryan Howard January 2007 (has links)
The term quote semantic query optimization quote (SQO) denotes a methodology whereby queries against databases are optimized using semantic information about the database objects being queried. The result of semantically optimizing a query is another query which is syntactically different to the original, but semantically equivalent and which may be answered more efficiently than the original. SQO is distinctly different from the work performed by the conventional SQL optimizer. The SQL optimizer generates a set of logically equivalent alternative execution paths based ultimately on the rules of relational algebra. However, only a small proportion of the readily available semantic information is utilised by current SQL optimizers. Researchers in SQO agree that SQO can be very effective. However, after some twenty years of research into SQO, there is still no commercial implementation. In this thesis we argue that we need to quantify the conditions for which SQO is worthwhile. We investigate what these conditions are and apply this knowledge to relational database management systems (RDBMS) with static schemas and infrequently updated data. Any semantic query optimizer requires the ability to reason using the semantic information available, in order to draw conclusions which ultimately facilitate the recasting of the original query into a form which can be answered more efficiently. This reasoning engine is currently not part of any commercial RDBMS implementation. We show how a practical semantic query optimizer may be built utilising readily available semantic information, much of it already captured by meta-data typically stored in commercial RDBMS. We develop cost models which predict an upper bound to the amount of optimization one can expect when queries are pre-processed by a semantic optimizer. We present a series of empirical results to confirm the effectiveness or otherwise of various types of SQO and demonstrate the circumstances under which SQO can be effective.
58

Effective retrieval techniques for Arabic text

Nwesri, Abdusalam F Ahmad, nwesri@yahoo.com January 2008 (has links)
Arabic is a major international language, spoken in more than 23 countries, and the lingua franca of the Islamic world. The number of Arabic-speaking Internet users has grown over nine-fold in the Middle East between the year 2000 and 2007, yet research in Arabic Information Retrieval (AIR) has not advanced as in other languages such as English. In this thesis, we explore techniques that improve the performance of AIR systems. Stemming is considered one of the most important factors to improve retrieval effectiveness of AIR systems. Most current stemmers remove affixes without checking whether the removed letters are actually affixes. We propose lexicon-based improvements to light stemming that distinguish core letters from proper Arabic affixes. We devise rules to stem most affixes and show their effects on retrieval effectiveness. Using the TREC 2001 test collection, we show that applying relevance feedback with our rules produces significantly better results than light stemming. Techniques for Arabic information retrieval have been studied in depth on clean collections of newswire dispatches. However, the effectiveness of such techniques is not known on other noisy collections in which text is generated using automatic speech recognition (ASR) systems and queries are generated using machine translations (MT). Using noisy collections, we show that normalisation, stopping and light stemming improve results as in normal text collections but that n-grams and root stemming decrease performance. Most recent AIR research has been undertaken using collections that are far smaller than the collections used for English text retrieval; consequently, the significance of some published results is debatable. Using the LDC Arabic GigaWord collection that contains more than 1 500 000 documents, we create a test collection of~90 topics with their relevance judgements. Using this test collection, we show empirically that for a large collection, root stemming is not competitive. Of the approaches we have studied, lexicon-based stemming approaches perform better than light stemming approaches alone. Arabic text commonly includes foreign words transliterated into Arabic characters. Several transliterated forms may be in common use for a single foreign word, but users rarely use more than one variant during search tasks. We test the effectiveness of lexicons, Arabic patterns, and n-grams in distinguishing foreign words from native Arabic words. We introduce rules that help filter foreign words and improve the n-gram approach used in language identification. Our combined n-grams and lexicon approach successfully identifies 80% of all foreign words with a precision of 93%. To find variants of a specific foreign word, we apply phonetic and string similarity techniques and introduce novel algorithms to normalise them in Arabic text. We modify phonetic techniques used for English to suit the Arabic language, and compare several techniques to determine their effectiveness in finding foreign word variants. We show that our algorithms significantly improve recall. We also show that expanding queries using variants identified by our Soutex4 phonetic algorithm results in a significant improvement in precision and recall. Together, the approaches described in this thesis represent an important step towards realising highly effective retrieval of Arabic text.
59

Querying Mediated Web Services

Sabesan, Manivasakan January 2007 (has links)
<p>Web services provide a framework for data interchange between applications by incorporating standards such as XMLSchema, WSDL, SOAP, HTTP etc. They define operations to be invoked over a network to perform the actions. These operations are described publicly in a WSDL document with the data types of their argument and result. Searching data accessible via web services is essential in many applications. However, web services don’t provide any general query language or view capabilities. Current web services applications to access the data must be developed using a regular programming language such Java, or C#. The thesis provides an approach to simplify querying web services data and proposes efficient processing of database queries to views of wrapped web services. To show the effectiveness of the approach, a prototype, <em>webService MEDiator system (WSMED</em>), is developed. WSMED provides general view and query capabilities over data accessible through web services by automatically extracting basic meta-data from WSDL descriptions. Based on imported meta-data, the user can then define views that extract data from the results of calls to web service operations. The views can be queried using SQL. A given view can access many different web service operations in different ways depending on what view attributes are known. The views can be specified in terms of several declarative queries to be applied by the query processor. In addition, the user can provide semantic enrichments of the meta-data with key constraints to enable efficient query execution over the views by automatic query transformations. We evaluated the effectiveness of our approach over multilevel views of existing web services and show that the key constraint enrichments substantially improve query performance.</p> / SIDA
60

Using Controlled Natural Language for World Knowledge Reasoning

Dellis, Nelson Charles 01 January 2010 (has links)
Search engines are the most popular tools for finding answers to questions, but unfortunately they do not always provide complete direct answers. Answers often need to be extracted by the user, from the web pages returned by the search engine. This research addresses this problem, and shows how an automated theorem prover, combined with existing ontologies and the web, is able to reason about world knowledge and return direct answers to users' questions. The use of an automated theorem prover also allows more complex questions to be asked. Automated theorem provers that exhibit these capabilities are called World Knowledge Reasoning systems. This research discusses one such system, the CNL-WKR system. The CNL-WKR system uses the ACE controlled natural language as its user-input language. It then calls upon external sources on the web, as well as internal ontological sources, during the theorem proving process, in order to find answers. The system uses the automated theorem prover, SPASS-XDB. The result is a system that is capable of answering complex questions about the world.

Page generated in 0.0459 seconds