51. Understanding query quality in dynamic networks. Rajamani, Vasanth, 09 December 2010.
With the proliferation of laptops, smart phones, sensors, and other small devices, our physical environment is increasingly networked. Applications in a variety of problem domains (e.g., intelligent construction, traffic monitoring, smart homes) need to execute efficiently and seamlessly on top of such emerging infrastructure. This infrastructure tends to be unreliable, and the network configuration changes constantly as hosts depart and reemerge. Consequently, software has to react to these changes continuously and adapt its behavior accordingly. In this dissertation, I introduce PAQ (Persistent Adaptive Query), a middleware designed to ease the programming burden associated with writing such applications. PAQ employs a novel style of query-driven application development that allows programmers to build pervasive applications using persistent queries: queries that continuously monitor the environment. The dissertation discusses the design and implementation of a new middleware model that lets programmers write high-level specifications while abstracting away tedious implementation details. PAQ combines novel protocols that automatically tag the quality of information obtained from the network with statistical techniques that post-process and smooth the data. The goal of this research is to ease the software engineering challenges encountered during the construction and deployment of applications in emerging pervasive computing environments through the use of a query-driven application development paradigm.
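To make the persistent-query idea concrete, here is a minimal Python sketch of a query that continuously monitors a dynamic environment and tags each answer with a quality estimate. All names are hypothetical illustrations, not PAQ's actual API, and the quality tag here is deliberately crude (the fraction of contacted hosts that responded), where PAQ's protocols compute richer annotations.

```python
import threading
import time

class PersistentQuery:
    """A persistent query re-evaluates a predicate over successive snapshots
    of a dynamic environment and pushes each answer, tagged with a crude
    quality estimate, to an application callback. Illustrative sketch only."""

    def __init__(self, predicate, on_result, period_s=1.0):
        self.predicate = predicate   # filter over observed readings
        self.on_result = on_result   # application callback
        self.period_s = period_s
        self._stop = threading.Event()

    def run(self, sample_environment):
        # sample_environment() returns (readings, contacted, responded):
        # the readings gathered this round, plus how many hosts were
        # contacted versus how many actually answered (hosts come and go).
        while not self._stop.is_set():
            readings, contacted, responded = sample_environment()
            matches = [r for r in readings if self.predicate(r)]
            quality = responded / contacted if contacted else 0.0
            self.on_result(matches, quality)
            time.sleep(self.period_s)

    def stop(self):
        self._stop.set()

# Usage sketch: report hot readings once per second.
# pq = PersistentQuery(lambda r: r["temp"] > 30, print)
# pq.run(read_sensors)   # read_sensors is application-supplied
```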
52. Efficient and Reliable In-Network Query Processing in Wireless Sensor Networks. Malhotra, Baljeet Singh, date unknown.
No description available.
53. Ranked Retrieval in Uncertain and Probabilistic Databases. Soliman, Mohamed, January 2011.
Ranking queries are widely used in data exploration, data analysis, and decision making scenarios. While most currently proposed ranking techniques focus on deterministic data, several emerging applications involve data that are imprecise or uncertain. Ranking uncertain data raises new challenges in query semantics and processing, making conventional methods inapplicable. Furthermore, the interplay between ranking and uncertainty models introduces new dimensions for ordering query results that do not exist in traditional settings.

This dissertation introduces new formulations and processing techniques for ranking queries on uncertain data. The formulations are based on a marriage of traditional ranking semantics with possible worlds semantics under widely adopted uncertainty models. In particular, we focus on studying the impact of tuple-level and attribute-level uncertainty on the semantics and processing techniques of ranking queries.

Under the tuple-level uncertainty model, we introduce a processing framework that leverages the capabilities of relational database systems to recognize and handle data uncertainty in score-based ranking. The framework encapsulates a state space model and efficient search algorithms that compute query answers by lazily materializing the necessary parts of the space. Under the attribute-level uncertainty model, we give a new probabilistic ranking model, based on partial orders, to encapsulate the space of possible rankings originating from uncertainty in attribute values. We present a set of efficient query evaluation algorithms, including sampling-based techniques built on the theory of Markov chains and the Monte Carlo method, to compute query answers.

We build on our techniques for ranking under attribute-level uncertainty to support rank join queries on uncertain data. We show how to extend current rank join methods to handle uncertainty in scoring attributes, and we provide a pipelined query operator implementation of an uncertainty-aware rank join algorithm integrated with sampling techniques to compute query answers.
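As a concrete illustration of possible worlds semantics under tuple-level uncertainty, the following Python sketch estimates each tuple's probability of being the top-ranked answer by sampling worlds. It assumes a simplified model in which tuples appear independently; the dissertation's framework handles richer models and computes answers by lazy state-space search rather than naive sampling.

```python
import random

def top1_probabilities(tuples, n_samples=10_000):
    """Estimate, by sampling possible worlds, the probability that each
    tuple is the top-ranked answer. `tuples` is a list of (score, p)
    pairs: each tuple appears in a world independently with probability p."""
    wins = [0] * len(tuples)
    for _ in range(n_samples):
        # Materialize one possible world by flipping a coin per tuple.
        world = [i for i, (_, p) in enumerate(tuples) if random.random() < p]
        if world:
            best = max(world, key=lambda i: tuples[i][0])
            wins[best] += 1
    return [w / n_samples for w in wins]

# Example: three tuples with scores and membership probabilities.
print(top1_probabilities([(0.9, 0.4), (0.7, 0.9), (0.5, 1.0)]))
```

For this independent model the top-1 probability also has a closed form: p_t times the product of (1 - p_s) over all tuples s scoring higher than t, which the sampler approximates.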
54. Effective and Efficient Similarity Search in Video Databases. Jie Shao, date unknown.
Searching for relevant information based on content features in video databases is an interesting and challenging research topic that has drawn much attention recently. Video similarity search has many practical applications, such as TV broadcast monitoring, copyright compliance enforcement, and search result clustering. However, existing studies fall short of providing fast and accurate solutions because of the diverse variations among videos in large collections. In this thesis, we introduce database support for effective and efficient video similarity search across various sources, even in the presence of transformation distortion, partial content reordering, insertion, deletion, or replacement. Specifically, we focus on processing two different types of content-based queries: video clip retrieval in a large collection of segmented short videos, and video subsequence identification from a long unsegmented stream.

The first part of the thesis investigates how to process a number of individual kNN searches on the same database simultaneously to reduce the computational overhead of current content-based video search systems. We propose a Dynamic Query Ordering (DQO) algorithm for efficiently processing Batch Nearest Neighbor (BNN) search in high-dimensional space, with advanced optimizations of both I/O cost and CPU cost.

The second part of the thesis addresses the previously unstudied problem of temporal localization of similar content in a long unsegmented video sequence, extended to identify occurrences whose ordering or length may differ from the query because of video content editing. A graph transformation and matching approach, supported by the above BNN search, is proposed as a filter-and-refine query processing strategy that identifies the most similar subsequence effectively yet efficiently.

The third part of the thesis extends the Bounded Coordinate System (BCS) method we introduced earlier for video clip retrieval. A novel collective perspective is presented that exploits the distributional discrepancy of samples to assess the similarity between two video clips. Several non-parametric hypothesis tests from statistics are used to check whether two ensembles of points come from the same distribution. The proposed similarity measures provide a more comprehensive analysis that captures the essence of invariant distribution information for retrieving video clips.

For each part, we present comprehensive experimental evaluations, which show improved performance compared with state-of-the-art methods. Finally, several planned extensions of this work are highlighted as future research objectives.
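To illustrate the batch flavor of BNN search (though not the DQO ordering optimizations themselves), the following NumPy sketch answers many kNN queries in one shared pass over the database by computing all query-to-database distances at once.

```python
import numpy as np

def batch_knn(database, queries, k):
    """Answer many kNN queries in one pass over the database by sharing
    distance computations, the core idea behind batch nearest neighbor
    (BNN) search. This brute-force sketch omits DQO's I/O and CPU
    ordering optimizations."""
    # Squared Euclidean distances for all query/database pairs at once:
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2
    d2 = (np.sum(queries**2, axis=1)[:, None]
          - 2.0 * queries @ database.T
          + np.sum(database**2, axis=1)[None, :])
    # Indices of the k smallest distances per query (select, then sort).
    idx = np.argpartition(d2, k, axis=1)[:, :k]
    order = np.take_along_axis(d2, idx, axis=1).argsort(axis=1)
    return np.take_along_axis(idx, order, axis=1)

rng = np.random.default_rng(0)
db, qs = rng.random((10_000, 64)), rng.random((32, 64))
print(batch_knn(db, qs, k=5).shape)   # (32, 5)
```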
55. Efficient Processing of Skyline Queries on Static Data Sources, Data Streams and Incomplete Datasets. January 2014.
Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.
An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. On the other hand, most data stream techniques focus on single stream skyline queries, thus rendering them unsuitable for skyline-join queries.
Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. There exists a definition of dominance for incomplete data, but this leads to undesirable consequences such as non-transitive and cyclic dominance relations both of which are detrimental to skyline processing.
Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets.
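For reference, a skyline over complete, static data can be computed with a simple block-nested-loop algorithm. The plain-Python sketch below (smaller-is-better on every attribute) shows the dominance test whose transitivity the incomplete-data setting breaks.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (smaller is better here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Block-nested-loop skyline over a complete, static dataset."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue                      # p is dominated, discard it
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

print(skyline([(1, 9), (3, 3), (2, 8), (5, 1), (4, 4)]))
# -> [(1, 9), (3, 3), (2, 8), (5, 1)]
```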
56. A Comparative Study of Dual-tree Algorithms for Computing Spatial Distance Histogram. Mou, Chengcheng, 01 January 2015.
Particle simulation has become an important research technique in many scientific and engineering fields in recent years. However, such simulations generate enormous amounts of data, and the databases that store them therefore face very challenging tasks in data management, storage, and query processing. The spatial distance histogram (SDH) is a fundamental building block for computing the two-body correlation function (2-BCF), a statistical measure widely used to analyze such datasets. Computed by the straightforward method, an SDH query takes quadratic time. Recently, a novel algorithm was proposed to compute the SDH based on the concept of a density map (DM), reducing the running time to Θ(N^(3/2)) for two-dimensional data and Θ(N^(5/3)) for three-dimensional data, respectively. In the DM-SDH algorithm, two types of DMs can be plugged in for computation: the quad-tree (oct-tree for three-dimensional data) and the k-d tree. In this thesis, using a geometric method, we prove the unresolvable ratios on the k-d tree. Further, we analyze and compare the performance differences in each potential case generated by these DM-SDH algorithms. Experimental results confirm our analysis and show that the k-d tree structure performs better in terms of time complexity in all cases, while our qualitative analysis shows that the quad-tree (oct-tree) has an advantage over the k-d tree in terms of space complexity.
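The contrast at the heart of this work can be sketched in a few lines of Python: the quadratic brute-force SDH, and the node-pair resolvability test that lets density-map algorithms count whole groups of pairs at once. The box-based test below is an illustrative simplification of the tree-node geometry analyzed in the thesis.

```python
import math
from itertools import combinations

def sdh_brute_force(points, bucket_width, n_buckets):
    """Quadratic-time SDH: count every pairwise distance into buckets."""
    hist = [0] * n_buckets
    for (x1, y1), (x2, y2) in combinations(points, 2):
        d = math.hypot(x1 - x2, y1 - y2)
        hist[min(int(d / bucket_width), n_buckets - 1)] += 1
    return hist

def resolvable(box_a, box_b, bucket_width):
    """Core test of the density-map approach: if the minimum and maximum
    possible distances between two tree nodes' bounding boxes fall in the
    same bucket, all cross pairs can be counted at once without visiting
    individual points. Boxes are (xmin, ymin, xmax, ymax)."""
    dx_min = max(0.0, box_a[0] - box_b[2], box_b[0] - box_a[2])
    dy_min = max(0.0, box_a[1] - box_b[3], box_b[1] - box_a[3])
    dx_max = max(box_a[2] - box_b[0], box_b[2] - box_a[0])
    dy_max = max(box_a[3] - box_b[1], box_b[3] - box_a[1])
    d_min = math.hypot(dx_min, dy_min)
    d_max = math.hypot(dx_max, dy_max)
    return int(d_min / bucket_width) == int(d_max / bucket_width)

print(sdh_brute_force([(0, 0), (3, 4), (6, 8)], bucket_width=5.0, n_buckets=3))
# distances 5.0, 10.0, 5.0 -> [0, 2, 1]
```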
57. Efficient query processing in managed runtimes. Nagel, Fabian Oliver, January 2015.
This thesis presents strategies to improve the query evaluation performance over huge volumes of relational-like data that is stored in the memory space of managed applications. Storing and processing application data in the memory space of managed applications is motivated by the convergence of two recent trends in data management. First, dropping DRAM prices have led to memory capacities that allow the entire working set of an application to fit into main memory, and to the emergence of in-memory database systems (IMDBs). Second, language-integrated query transparently integrates query processing syntax into programming languages and therefore allows complex queries to be composed in the application. IMDBs typically serve as data stores to applications written in an object-oriented language running on a managed runtime. In this thesis, we propose a deeper integration of the two by storing all application data in the memory space of the application and using language-integrated query, combined with query compilation techniques, to provide fast query processing.

As a starting point, we look into storing data as runtime-managed objects in collection types provided by the programming language. Queries are formulated using language-integrated query and dynamically compiled to specialized functions that produce the result of the query more efficiently by leveraging query compilation techniques similar to those used in modern database systems. We show that the generated query functions significantly improve query processing performance compared to the default execution model for language-integrated query. However, we also identify additional inefficiencies that can only be addressed by processing queries using low-level techniques, which cannot be applied to runtime-managed objects. To address this, we introduce a staging phase in the generated code that makes query-relevant managed data accessible to low-level query code. Our experiments in .NET show an improvement in query evaluation performance of up to an order of magnitude over the default language-integrated query implementation.

Motivated by additional inefficiencies caused by automatic garbage collection, we introduce a new collection type, the black-box collection. Black-box collections integrate the in-memory storage layer of a relational database system to store data and hide the internal storage layout from the application by employing existing object-relational mapping techniques (hence the name black-box). Our experiments show that black-box collections provide better query performance than runtime-managed collections by allowing the generated query code to directly access the underlying relational in-memory data store using low-level techniques. Black-box collections also outperform a modern commercial database system. By removing huge volumes of collection data from the managed heap, black-box collections further improve the overall performance and response time of the application and improve the application's scalability when facing huge volumes of collection data.

To enable a deeper integration of the data store with the application, we introduce self-managed collections. Self-managed collections are a new type of collection for managed applications that, in contrast to black-box collections, store objects. As the data elements stored in the collection are objects, they are directly accessible from the application using references, which allows for better integration of the data store with the application. Self-managed collections manually manage the memory of objects stored within them in a private heap that is excluded from garbage collection. We introduce a special collection syntax and a novel type-safe manual memory management system for this purpose. As was the case for black-box collections, self-managed collections improve query performance by utilizing a database-inspired data layout and allowing the use of low-level techniques. By also supporting references between collection objects, they outperform black-box collections.
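The query compilation idea, generating one specialized, fused loop from a declarative query instead of evaluating it through generic iterator machinery, can be sketched in Python (the thesis itself targets LINQ on .NET; the helper below is hypothetical).

```python
def compile_filter_sum(predicate_src, field):
    """Sketch of query compilation: turn a declarative filter-aggregate
    query into one specialized, fused loop. Illustrative only; not the
    thesis's .NET implementation."""
    src = f"""
def q(rows):
    acc = 0
    for row in rows:
        if {predicate_src}:      # predicate inlined at compile time
            acc += row.{field}   # aggregate fused into the same loop
    return acc
"""
    scope = {{}}
    exec(src, scope)             # compile the specialized function
    return scope["q"]

# Usage: equivalent of rows.Where(r => r.age > 30).Sum(r => r.salary)
from collections import namedtuple
Person = namedtuple("Person", "age salary")
q = compile_filter_sum("row.age > 30", "salary")
print(q([Person(25, 100), Person(40, 200), Person(50, 300)]))  # 500
```

The payoff is that predicate and aggregate run in one tight loop over the data, which is the same effect the thesis achieves with generated query functions over managed collections.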
58. Performance Analysis of kNN Query Processing on large datasets using CUDA & Pthreads: comparing between CPU & GPU. Kalakuntla, Preetham, January 2017.
Telecom companies perform extensive analytics to provide consumers with better service and to stay competitive. These companies accumulate large volumes of data with the potential to provide inputs for business decisions. Query processing is one of the major tools for running analytics over this data. Traditional in-memory query processing techniques cannot cope with the large data volumes of telecom operators. The k-nearest-neighbour (kNN) technique is well suited to classification and regression on large datasets. Our research focuses on implementing kNN as a query processing algorithm and evaluating its performance on large datasets on a single core, on multiple cores, and on a GPU. This thesis presents an experimental implementation of kNN query processing on a single-core CPU, a multi-core CPU, and a GPU using Python, Pthreads, and CUDA, respectively. We considered different dataset sizes, dimensionalities, and values of k as inputs to evaluate the performance. The experiments show that the GPU outperforms the single-core CPU by a factor of 1.4 to 3 and the multi-core CPU by a factor of 5.8 to 16 across the different input levels.
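The data-parallel decomposition underlying both the multi-core and GPU implementations can be sketched as follows: partition the dataset, compute local top-k candidates per worker, then merge. This Python/NumPy sketch uses processes in place of Pthreads or CUDA threads and is illustrative only.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def knn_chunk(args):
    """Distances from one query to a chunk of the dataset; each worker
    returns its local top-k candidates, which are merged afterwards."""
    query, chunk, offset, k = args
    d = np.linalg.norm(chunk - query, axis=1)
    idx = np.argsort(d)[:k]
    return [(d[i], offset + i) for i in idx]

def knn_multicore(data, query, k, n_workers=4):
    """Data-parallel kNN: split the dataset across workers, take each
    worker's local top-k, and merge into the global top-k. The same
    partitioning strategy maps kNN onto CPU threads or GPU thread blocks."""
    chunks = np.array_split(data, n_workers)
    offsets = np.cumsum([0] + [len(c) for c in chunks[:-1]])
    jobs = [(query, c, o, k) for c, o in zip(chunks, offsets)]
    with ProcessPoolExecutor(n_workers) as ex:
        partial = [p for part in ex.map(knn_chunk, jobs) for p in part]
    return sorted(partial)[:k]       # global top-k from local candidates

# Usage (a __main__ guard is needed on platforms that spawn processes):
# if __name__ == "__main__":
#     data, q = np.random.rand(100_000, 16), np.random.rand(16)
#     print(knn_multicore(data, q, k=10))
```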
59. Analysis of parallel scan processing in Shared Disk database systems. Rahm, Erhard; Stöhr, Thomas, 23 October 2018.
Shared Disk database systems offer a high flexibility for parallel transaction and query processing, since each node has access to the entire database and can therefore process any transaction, query, or subquery. Compared to Shared Nothing database systems, this is particularly advantageous for scan queries, for which both the degree of intra-query parallelism and the scan processors themselves can be chosen dynamically. On the other hand, there is the danger of disk contention between subqueries, in particular for index scans. We present a detailed simulation study to analyze the effectiveness of parallel scan processing in Shared Disk database systems. In particular, we investigate the relationship between the degree of declustering and the degree of scan parallelism for relation scans, clustered index scans, and non-clustered index scans. Furthermore, we study the usefulness of disk caches and prefetching for limiting disk contention. Finally, we show that disk contention in multi-user mode can be limited for Shared Disk database systems by dynamically choosing the degree of scan parallelism.
60. Controlling Disk Contention for Parallel Query Processing in Shared Disk Database Systems. Rahm, Erhard; Stöhr, Thomas, 08 July 2019.
Shared Disk database systems offer a high flexibility for parallel transaction and query processing, since each node has access to the entire database and can therefore process any transaction, query, or subquery. Compared to Shared Nothing, this is particularly advantageous for scan queries, for which both the degree of intra-query parallelism and the scan processors themselves can be chosen dynamically. On the other hand, there is the danger of disk contention between subqueries, in particular for index scans. We present a detailed simulation study to analyze the effectiveness of parallel scan processing in Shared Disk database systems. In particular, we investigate the relationship between the degree of declustering and the degree of scan parallelism for relation scans, clustered index scans, and non-clustered index scans. Furthermore, we study the usefulness of disk caches and prefetching for limiting disk contention. Finally, we show the importance of dynamically choosing the degree of scan parallelism to control disk contention in multi-user mode.
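A minimal sketch of the central knob in these two papers, dynamically choosing the degree of scan parallelism from current disk utilization and then partitioning the scan range, might look like the following. The policy is a heuristic illustration, not the simulated system's actual algorithm.

```python
def choose_scan_parallelism(n_disks, disk_utilization, max_degree):
    """Heuristic sketch: use many parallel subscans when the disks are
    idle, fewer when they are already contended (utilization in [0, 1])."""
    idle_capacity = max(0.0, 1.0 - disk_utilization)
    return max(1, min(max_degree, round(n_disks * idle_capacity)))

def partition_scan(n_pages, degree):
    """Split a relation of n_pages into `degree` contiguous subscans."""
    base, extra = divmod(n_pages, degree)
    ranges, start = [], 0
    for i in range(degree):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# E.g., 20 disks at 70% utilization yield a modest degree of parallelism.
d = choose_scan_parallelism(n_disks=20, disk_utilization=0.7, max_degree=16)
print(d, partition_scan(n_pages=1000, degree=d))
```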