• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 508
  • 79
  • 36
  • 29
  • 22
  • 15
  • 11
  • 10
  • 9
  • 8
  • 6
  • 6
  • 5
  • 4
  • 3
  • Tagged with
  • 872
  • 286
  • 265
  • 221
  • 202
  • 169
  • 152
  • 134
  • 130
  • 129
  • 125
  • 117
  • 103
  • 102
  • 102
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
611

Efficient Approximate OLAP Querying Over Time Series

Perera, Kasun S., Hahmann, Martin, Lehner, Wolfgang, Pedersen, Torben Bach, Thomsen, Christian 15 June 2023 (has links)
The ongoing trend for data gathering not only produces larger volumes of data, but also increases the variety of recorded data types. Out of these, especially time series, e.g. various sensor readings, have attracted attention in the domains of business intelligence and decision making. As OLAP queries play a major role in these domains, it is desirable to also execute them on time series data. While this is not a problem on the conceptual level, it can become a bottleneck with regards to query run-time. In general, processing OLAP queries gets more computationally intensive as the volume of data grows. This is a particular problem when querying time series data, which generally contains multiple measures recorded at fine time granularities. Usually, this issue is addressed either by scaling up hardware or by employing workload based query optimization techniques. However, these solutions are either costly or require continuous maintenance. In this paper we propose an approach for approximate OLAP querying of time series that offers constant latency and is maintenance-free. To achieve this, we identify similarities between aggregation cuboids and propose algorithms that eliminate the redundancy these similarities present. In doing so, we can achieve compression rates of up to 80% while maintaining low average errors in the query results.
612

CROSS-DB: a feature-extended multidimensional data model for statistical and scientific databases

Lehner, Wolfgang, Ruf, Thomas, Teschke, Michael 13 September 2022 (has links)
Statistical and scientific computing applications exhibit characteristics that are fundamentally different from classical database system application domains. The CROSS-DB data model presented in this paper is optimized for use in such applications by providing advanced data modelling methods and application-oriented query facilities, thus providing a framework for optimized data management procedures. CROSS-DB (which stands for Classification-oriented, Redundancy-based Optimization of Statistical and Scientific DataBases) is based on a multidimensional data view. The model differs from other approaches by o~ering two complementary rnechanisrnsfor structuring qualifying information, classification and feature description. Using these mechanisms results in a normalized, low-dimensional database schema which ensures both, modelling uniqueness and understandability while providing enhanced modelling flexibility.
613

Static MySQL Error Checking

Zarinkhail, Mohammad Shuaib January 2010 (has links)
Masters of Science / Coders of databases repeatedly face the problem of checking their Structured Query Language (SQL) code. Instructors face the difficulty of checking student projects and lab assignments in database courses. We collect and categorize common MySQL programming errors into three groups: data definition errors, data manipulation errors, and transaction control errors. We build these into a comprehensive list of MySQL errors, which novices are inclined make during database programming. We collected our list of common MySQL errors both from the technical literature and directly by noting errors made in assignments handed in by students. In the results section of this research, we check and summarize occurrences of these errors based on three characteristics as semantics, syntax, and logic. These data form the basis of a future static MySQL checker that will eventually assist database coders to correct their code automatically. These errors also form a useful checklist to guide students away from the mistakes that they are prone to make.
614

Query Processing on Prefix Trees Live

Kissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang 17 August 2022 (has links)
Modern database systems have to process huge amounts of data and should provide results with low latency at the same time. To achieve this, data is nowadays typically hold completely in main memory, to benefit of its high bandwidth and low access latency that could never be reached with disks. Current in-memory databases are usually column-stores that exchange columns or vectors between operators and suffer from a high tuple reconstruction overhead. In this demonstration proposal, we present DexterDB, which implements our novel prefix tree-based processing model that makes indexes the first-class citizen of the database system. The core idea is that each operator takes a set of indexes as input and builds a new index as output that is indexed on the attribute requested by the successive operator. With that, we are able to build composed operators, like the multi-way-select-join-group. Such operators speed up the processing of complex OLAP queries so that DexterDB outperforms state-of-the-art in-memory databases. Our demonstration focuses on the different optimization options for such query plans. Hence, we built an interactive GUI that connects to a DexterDB instance and allows the manipulation of query optimization parameters. The generated query plans and important execution statistics are visualized to help the visitor to understand our processing model.
615

An Approach For Computing Intervisibility Using Graphical Processing U

Tracy, Judd 01 January 2004 (has links)
In large scale entity-level military force-on-force simulations it is essential to know when one entity can visibly see another entity. This visibility determination plays an important role in the simulation and can affect the outcome of the simulation. When virtual Computer Generated Forces (CGF) are introduced into the simulation these intervisibilities must now be calculated by the virtual entities on the battlefield. But as the simulation size increases so does the complexity of calculating visibility between entities. This thesis presents an algorithm for performing these visibility calculations using Graphical Processing Units (GPU) instead of the Central Processing Units (CPU) that have been traditionally used in CGF simulations. This algorithm can be distributed across multiple GPUs in a cluster and its scalability exceeds that of CGF-based algorithms. The poor correlations of the two visibility algorithms are demonstrated showing that the GPU algorithm provides a necessary condition for a "Fair Fight" when paired with visual simulations.
616

Scene Understanding For Real Time Processing Of Queries Over Big Data Streaming Video

Aved, Alexander 01 January 2013 (has links)
With heightened security concerns across the globe and the increasing need to monitor, preserve and protect infrastructure and public spaces to ensure proper operation, quality assurance and safety, numerous video cameras have been deployed. Accordingly, they also need to be monitored effectively and efficiently. However, relying on human operators to constantly monitor all the video streams is not scalable or cost effective. Humans can become subjective, fatigued, even exhibit bias and it is difficult to maintain high levels of vigilance when capturing, searching and recognizing events that occur infrequently or in isolation. These limitations are addressed in the Live Video Database Management System (LVDBMS), a framework for managing and processing live motion imagery data. It enables rapid development of video surveillance software much like traditional database applications are developed today. Such developed video stream processing applications and ad hoc queries are able to "reuse" advanced image processing techniques that have been developed. This results in lower software development and maintenance costs. Furthermore, the LVDBMS can be intensively tested to ensure consistent quality across all associated video database applications. Its intrinsic privacy framework facilitates a formalized approach to the specification and enforcement of verifiable privacy policies. This is an important step towards enabling a general privacy certification for video surveillance systems by leveraging a standardized privacy specification language. With the potential to impact many important fields ranging from security and assembly line monitoring to wildlife studies and the environment, the broader impact of this work is clear. The privacy framework protects the general public from abusive use of surveillance technology; iii success in addressing the "trust" issue will enable many new surveillance-related applications. Although this research focuses on video surveillance, the proposed framework has the potential to support many video-based analytical applications.
617

Learning Techniques For Information Retrieval And Mining In High-dimensional Databases

Cheng, Hao 01 January 2009 (has links)
The main focus of my research is to design effective learning techniques for information retrieval and mining in high-dimensional databases. There are two main aspects in the retrieval and mining research: accuracy and efficiency. The accuracy problem is how to return results which can better match the ground truth, and the efficiency problem is how to evaluate users' requests and execute learning algorithms as fast as possible. However, these problems are non-trivial because of the complexity of the high-level semantic concepts, the heterogeneous natures of the feature space, the high dimensionality of data representations and the size of the databases. My dissertation is dedicated to addressing these issues. Specifically, my work has five main contributions as follows. The first contribution is a novel manifold learning algorithm, Local and Global Structures Preserving Projection (LGSPP), which defines salient low-dimensional representations for the high-dimensional data. A small number of projection directions are sought in order to properly preserve the local and global structures for the original data. Specifically, two groups of points are extracted for each individual point in the dataset: the first group contains the nearest neighbors of the point, and the other set are a few sampled points far away from the point. These two point sets respectively characterize the local and global structures with regard to the data point. The objective of the embedding is to minimize the distances of the points in each local neighborhood and also to disperse the points far away from their respective remote points in the original space. In this way, the relationships between the data in the original space are well preserved with little distortions. The second contribution is a new constrained clustering algorithm. Conventionally, clustering is an unsupervised learning problem, which systematically partitions a dataset into a small set of clusters such that data in each cluster appear similar to each other compared with those in other clusters. In the proposal, the partial human knowledge is exploited to find better clustering results. Two kinds of constraints are integrated into the clustering algorithm. One is the must-link constraint, indicating that the involved two points belong to the same cluster. On the other hand, the cannot-link constraint denotes that two points are not within the same cluster. Given the input constraints, data points are arranged into small groups and a graph is constructed to preserve the semantic relations between these groups. The assignment procedure makes a best effort to assign each group to a feasible cluster without violating the constraints. The theoretical analysis reveals that the probability of data points being assigned to the true clusters is much higher by the new proposal, compared to conventional methods. In general, the new scheme can produce clusters which can better match the ground truth and respect the semantic relations between points inferred from the constraints. The third contribution is a unified framework for partition-based dimension reduction techniques, which allows efficient similarity retrieval in the high-dimensional data space. Recent similarity search techniques, such as Piecewise Aggregate Approximation (PAA), Segmented Means (SMEAN) and Mean-Standard deviation (MS), prove to be very effective in reducing data dimensionality by partitioning dimensions into subsets and extracting aggregate values from each dimension subset. These partition-based techniques have many advantages including very efficient multi-phased pruning while being simple to implement. They, however, are not adaptive to different characteristics of data in diverse applications. In this study, a unified framework for these partition-based techniques is proposed and the issue of dimension partitions is examined in this framework. An investigation of the relationships of query selectivity and the dimension partition schemes discovers indicators which can predict the performance of a partitioning setting. Accordingly, a greedy algorithm is designed to effectively determine a good partitioning of data dimensions so that the performance of the reduction technique is robust with regard to different datasets. The fourth contribution is an effective similarity search technique in the database of point sets. In the conventional model, an object corresponds to a single vector. In the proposed study, an object is represented by a set of points. In general, this new representation can be used in many real-world applications and carries much more local information, but the retrieval and learning problems become very challenging. The Hausdorff distance is the common distance function to measure the similarity between two point sets, however, this metric is sensitive to outliers in the data. To address this issue, a novel similarity function is defined to better capture the proximity of two objects, in which a one-to-one mapping is established between vectors of the two objects. The optimal mapping minimizes the sum of distances between each paired points. The overall distance of the optimal matching is robust and has high retrieval accuracy. The computation of the new distance function is formulated into the classical assignment problem. The lower-bounding techniques and early-stop mechanism are also proposed to significantly accelerate the expensive similarity search process. The classification problem over the point-set data is called Multiple Instance Learning (MIL) in the machine learning community in which a vector is an instance and an object is a bag of instances. The fifth contribution is to convert the MIL problem into a standard supervised learning in the conventional vector space. Specially, feature vectors of bags are grouped into clusters. Each object is then denoted as a bag of cluster labels, and common patterns of each category are discovered, each of which is further reconstructed into a bag of features. Accordingly, a bag is effectively mapped into a feature space defined by the distances from this bag to all the derived patterns. The standard supervised learning algorithms can be applied to classify objects into pre-defined categories. The results demonstrate that the proposal has better classification accuracy compared to other state-of-the-art techniques. In the future, I will continue to explore my research in large-scale data analysis algorithms, applications and system developments. Especially, I am interested in applications to analyze the massive volume of online data.
618

Relevance Analysis for Document Retrieval

Labouve, Eric 01 March 2019 (has links) (PDF)
Document retrieval systems recover documents from a dataset and order them according to their perceived relevance to a user’s search query. This is a difficult task for machines to accomplish because there exists a semantic gap between the meaning of the terms in a user’s literal query and a user’s true intentions. Even with this ambiguity that arises with a lack of context, users still expect that the set of documents returned by a search engine is both highly relevant to their query and properly ordered. The focus of this thesis is on document retrieval systems that explore methods of ordering documents from unstructured, textual corpora using text queries. The main goal of this study is to enhance the Okapi BM25 document retrieval model. In doing so, this research hypothesizes that the structure of text inside documents and queries hold valuable semantic information that can be incorporated into the Okapi BM25 model to increase its performance. Modifications that account for a term’s part of speech, the proximity between a pair of related terms, the proximity of a term with respect to its location in a document, and query expansion are used to augment Okapi BM25 to increase the model’s performance. The study resulted in 87 modifications which were all validated using open source corpora. The top scoring modification from the validation phase was then tested under the Lisa corpus and the model performed 10.25% better than Okapi BM25 when evaluated under mean average precision. When compared against two industry standard search engines, Lucene and Solr, the top scoring modification largely outperforms these systems by upwards to 21.78% and 23.01%, respectively.
619

Evaluation of PR-tree Window Query Performance : Under Modification By Heuristic Update Algorithms / Utvärdering av prestanda för fönstersökning i PR-träd : Under modifikation av heuristiska uppdateringsalgoritmer

Kratz, Jakob January 2024 (has links)
Spatial data arises in applications such as geographical information systems, computer aided design and computer vision. A practical indexing method for spatial data is the R-tree [1]. A common query to an R-tree is a window query which outputs all spatial objects that intersect a rectangular region in space. The PR-tree is the first R-tree variant where window query performance is asymptotically optimal in the worst case. In this work a PR-tree is updated using algorithms defined by Antonin Guttman [2] and Beckmann et al. [3], respectively and query performance is evaluated. The conclusion is that the R*-tree algorithms by Beckmann et al. [3] is superior to the algorithms by Antonin Guttman [2] for maintaining good query performance while updating a PR-tree / Spatial data är vanligt förekommande i geografiska informationssystem, CAD och datorseende. Ett praktiskt index för spatial data är ett R-träd [1]. En vanlig sökfråga till ett R-träd är en så kallad window query som ges ett rektangulärt område och returnerar alla spatiala objekt som skär detta område. PR-trädet är det första R-trädet med asymptotiskt optimal tidskomplexitet för en window query. I detta arbete används PR-trädet som bas och det modifieras med respektiva algoritmer definerade av Antonin Guttman [2] och Beckmann m. fl. [3] och prestanda för rektangulära sökfrågor utvärderas. Slutsatsen är att om målet är att bibehålla bra prestanda för sökfrågor medan PR-trädet modifieras ska algoritmerna av Beckmann m. fl. [3] föredras.
620

Teaching In-Memory Database Systems the Detection of Hardware Errors

Lehner, Wolfgang, Habich, Dirk, Kolditz, Till 18 January 2023 (has links)
The key objective of database systems is to reliably manage data, whereby high query throughput and low query latency are core requirements. To satisfy these requirements, database systems constantly adapt to novel hardware features. Although it has been intensively studied and commonly accepted that hardware error rates in terms of bit flips increase dramatically with the decrease of the underlying chip structures, most database system research activities neglected this fact, leaving error (bit flip) detection as well as correction to the underlying hardware. Especially for main memory, silent data corruption (SDC) as a result of transient bit flips leading to faulty data is mainly detected and corrected at the DRAM and memory-controller layer. However, since future hardware becomes less reliable and error detection as well as correction by hardware becomes more expensive, this free ride will come to an end in the near future. To further provide a reliable data management, an emerging research direction is employing specific and tailored protection techniques at the database system level. Following that, we are currently developing and implementing an adopted system design for state-of-the-art in-memory column stores. In our lightning talk, we will summarize our current state and outline future work.

Page generated in 0.0447 seconds