1 |
A Novel Verification Scheme for Fine-Grained Top-k Queries in Two-Tiered Sensor NetworksMa, X., Song, H., Wang, J., Gao, J., Min, Geyong January 2014 (has links)
No / A two-tiered architecture with resource-rich master nodes at the upper tier and resource-poor sensor nodes at the lower tier is expected to be adopted in large scale sensor networks. In a hostile environment, adversaries are more motivated to compromise the master nodes to break the authenticity and completeness of query results, whereas it is lack of light and secure query processing protocol in tiered sensor networks at present. In this paper, we study the problem of verifiable fine-grained top- queries in two-tiered sensor networks, and propose a novel verification scheme, which is named Verification Scheme for Fine-grained Top- Queries (VSFTQ). To make top- query results verifiable, VSFTQ establishes relationships among data items of each sensor node using their orders, which are encrypted together with the scores of the data items and the interested time epoch number using distinct symmetric keys kept by each sensor node and the network owner. Both theoretical analysis and simulation results show that VSFTQ can not only ensure high probability of detecting forged and/or incomplete query results, but also significantly decrease the amount of verification information when compared with existing schemes.
|
2 |
Ερωτήματα συνένωσης και βαθμολογημένης συνένωσης σε κατανεμημένα συστήματαΠατλάκας, Ιωάννης 28 February 2013 (has links)
Η ανάπτυξη των peer-to-peer βάσεων δεδομένων και η δυναμική εισαγωγή των συστημάτων αποθήκευσης σε νέφη υπολογιστών (cloudstores) ως τα κυρίαρχα μεγάλης κλίμακας συστήματα διαχείρισης δεδομένων, έχουν οδηγήσει τους ερευνητές να εξετάσουν το πρόβλημα της
υποστήριξης πολύπλοκων ερωτημάτων με ένα πλήρως αποκεντρωμένο τρόπο. Περίπλοκα ερωτήματα επιλογής (select), συνένωσης join, καθώς και βαθμολογημένα ερωτήματα έχουν κεντρίσει το ενδιαφέρον της κοινότητας διαχείρισης δεδομένων.
Ανάμεσα στις τάξεις των ερωτημάτων αυτών είναι το κεντρικής σημασίας top-k join. To κατανεμημένο top-k join, δεν έχει μελετηθεί επαρκώς, αν και συναντάται πολύ συχνά σε πραγματικό φόρτο εργασίας σε πολλά εμπορικά και άλλα συστήματα βάσεων δεδομένων.
Με την εργασία αυτή αντιμετωπίζουμε τέτοιου είδους ερωτήματα πάνω σε δεδομένα
που είναι κατανεμημένα σε ένα μεγάλου κλίμακας δίκτυο.
Οι συνεισφορές μας με αυτήν την εργασία περιλαμβάνουν: (α) ένα νέο κατανεμημένο ευρετήριο,
που επιτρέπει την πρόσβαση σε πλειάδες με τυχαίο και
διατεταγμένο τρόπο, (β) ένα σύνολο αλγόριθμων για βαθμολογημένα ερωτημάτατα συνένωσης join. Οι αλγόριθμοί μας
στηρίζονται στην προσαρμογή γνωστών αλγοριθμών κατωφλίου για βαθμολογημένο join
σε κατανεμημένο περιβάλλον, (γ) μία νέα χρήση των Bloom φίλτρων και ιστογραμμάτων για την περαιτέρω μείωση του εύρους ζώνης
που καταναλώνουν οι παραπάνω αλγόριθμοι, καθώς και απόδειξη για το ότι οι
αλγόριθμοί μας που βασίζονται σε φίλτρα Bloom και ιστογράμματα παράγουν το σωστό top-k αποτέλεσμα, (δ) μια σε βάθος συζήτηση του
σχεδιασμού των αλγορίθμων μας και θεμάτων που συνδέονται με τις επιδόσεις και τα trade-offs.
Επιπλέον διερευνούμε την αποτελεσματικότητα και την ποιότητα των προτεινόμενων λύσεων
μέσα από μία αναλυτική πειραματική αξιολόγηση, δείχνοντας τις περιπτώσεις που ο κάθε αλγόριθμός μας είναι κατάλληλος σε
μαζικώς κατανεμημένα και αποκεντρωμένα περιβάλλοντα, ενώ τονίζουμε τα trade-offs που προκύπτουν. / The advent of peer-to-peer databases and the recent rise of cloudstores as key large-scale data management
paradigms, have led researchers to look into the problem of
supporting complex queries in a fully decentralized manner.
Among the classes of queries considered in related centralized
work, there is one that stands out as largely overlooked in widely
distributed settings, albeit very common in real-world workloads:
top-k joins. With this work we tackle such queries over data
distributed across an internet-scale network.
Our contributions include: (a) a novel distributed indexing
scheme, allowing access to tuples in both a random and an
ordered manner; (b) a set of query processing algorithms based
on a novel adaptation of rank-join and threshold algorithms,
appropriate for use in a distributed environment; (c) a novel use
of Bloom Filters and histograms to further reduce the bandwidth
consumption of the above algorithms; a proof that ensures that
our algorithms based on Bloom filters and histograms produce
the correct top-k results; and (d) an in-depth discussion of the
design space and related performance trade-offs. We further
investigate the efficiency and quality of the proposed solutions
through an elaborate experimental evaluation, showcasing their
appropriateness for widely-distributed and massively decentralized environments and highlighting related trade-offs.
|
3 |
Evaluation de requêtes top-k continues à large-échelle / Continuous top-k queries over real-time web streamsVouzoukidou, Despoina 17 September 2015 (has links)
Dans cette thèse, nous nous intéressons à l'évaluation efficace de requêtes top-k continues sur des flux d'informations textuelles avec des feedbacks utilisateurs. La première contribution est une généralisation des modèles de requêtes top-k continues proposés dans l'état de l'art. Cette généralisation est fondée sur une famille des scores non-homogènes définis comme une combinaison linéaire de scores d'importance de l'information (indépendants des requêtes) et de scores de pertinence du contenu avec une décroissance continue de score reflétant la fraîcheur de l'information. La deuxième contribution est la définition et la mise en ¿uvre de structures de données en mémoire pour l'indexation et l'évaluation de cette nouvelle famille de requêtes top-k continues. Nos expériences montrent que notre solution est évolutive et, limitées aux fonctions homogènes, surpasse les performances d'autres solutions. Dans la deuxième partie de cette thèse, nous considérons le problème de l'intégration des signaux de feedback à notre famille de scores non-homogènes. Nous proposons un nouveau cadre général pour l'évaluation de ces requêtes du "web en temps réel" (real-time web queries) avec un ensemble d'algorithmes minimisant le coût d'évaluation d'un signal de feedback utilisateur dynamique sur un item d'information. Enfin, nous présentons MeowsReader, notre prototype de recommandation d'actualités qui intègre l'ensemble des résultats obtenus et illustre comment une classe générale de requêtes continues top-k propose une abstraction appropriée pour la modélisation et le filtrage continu d'information sur le web "temps-réel". / In this thesis, we are interested in efficient evaluation techniques of continuous top-k queries over text and feedback streams featuring generalized scoring functions which capture dynamic ranking aspects. As a first contribution, we generalize state of the art continuous top-k query models, by introducing a general family of non-homogeneous scoring functions combining query-independent item importance with query-dependent content relevance and continuous score decay reflecting information freshness. Our second contribution consists in the definition and implementation of efficient in-memory data structures for indexing and evaluating this new family of continuous top-k queries. Our experiments show that our solution is scalable and outperforms other existing state of the art solutions, when restricted to homogeneous functions. Going a step further, in the second part of this thesis we consider the problem of incorporating dynamic feedback signals to the original scoring function and propose a new general real-time query evaluation framework with a family of new algorithms for efficiently processing continuous top-k queries with dynamic feedback scores in a real-time web context. Finally, putting together the outcomes of these works, we present MeowsReader, a real-time news ranking and filtering prototype which illustrates how a general class of continuous top-k queries offers a suitable abstraction for modelling and implementing continuous online information filtering applications combining keyword search and real-time web activity.
|
4 |
Semantically-enabled stream processing and complex event processing over RDF graph streams / Traitement de flux sémantiquement activé et traitement d'évènements complexes sur des flux de graphe RDFGillani, Syed 04 November 2016 (has links)
Résumé en français non fourni par l'auteur. / There is a paradigm shift in the nature and processing means of today’s data: data are used to being mostly static and stored in large databases to be queried. Today, with the advent of new applications and means of collecting data, most applications on the Web and in enterprises produce data in a continuous manner under the form of streams. Thus, the users of these applications expect to process a large volume of data with fresh low latency results. This has resulted in the introduction of Data Stream Processing Systems (DSMSs) and a Complex Event Processing (CEP) paradigm – both with distinctive aims: DSMSs are mostly employed to process traditional query operators (mostly stateless), while CEP systems focus on temporal pattern matching (stateful operators) to detect changes in the data that can be thought of as events. In the past decade or so, a number of scalable and performance intensive DSMSs and CEP systems have been proposed. Most of them, however, are based on the relational data models – which begs the question for the support of heterogeneous data sources, i.e., variety of the data. Work in RDF stream processing (RSP) systems partly addresses the challenge of variety by promoting the RDF data model. Nonetheless, challenges like volume and velocity are overlooked by existing approaches. These challenges require customised optimisations which consider RDF as a first class citizen and scale the processof continuous graph pattern matching. To gain insights into these problems, this thesis focuses on developing scalable RDF graph stream processing, and semantically-enabled CEP systems (i.e., Semantic Complex Event Processing, SCEP). In addition to our optimised algorithmic and data structure methodologies, we also contribute to the design of a new query language for SCEP. Our contributions in these two fields are as follows: • RDF Graph Stream Processing. We first propose an RDF graph stream model, where each data item/event within streams is comprised of an RDF graph (a set of RDF triples). Second, we implement customised indexing techniques and data structures to continuously process RDF graph streams in an incremental manner. • Semantic Complex Event Processing. We extend the idea of RDF graph stream processing to enable SCEP over such RDF graph streams, i.e., temporalpattern matching. Our first contribution in this context is to provide a new querylanguage that encompasses the RDF graph stream model and employs a set of expressive temporal operators such as sequencing, kleene-+, negation, optional,conjunction, disjunction and event selection strategies. Based on this, we implement a scalable system that employs a non-deterministic finite automata model to evaluate these operators in an optimised manner. We leverage techniques from diverse fields, such as relational query optimisations, incremental query processing, sensor and social networks in order to solve real-world problems. We have applied our proposed techniques to a wide range of real-world and synthetic datasets to extract the knowledge from RDF structured data in motion. Our experimental evaluations confirm our theoretical insights, and demonstrate the viability of our proposed methods
|
5 |
Representative Subsets for Preference QueriesChester, Sean 26 August 2013 (has links)
We focus on the two overlapping areas of preference queries and dataset summarization. A (linear) preference query specifies the relative importance of the attributes in a dataset and asks for the tuples that best match those preferences. Dataset summarization is the task of representing an entire dataset by a small, representative subset. Within these areas, we focus on three important sub-problems, significantly advancing the state-of-the-art in each.
We begin with an investigation into a new formulation of preference queries, identifying a neglected and important subclass that we call threshold projection queries. While literature typically constrains the attribute preferences (which are real-valued weights) such that their sum is one, we show that this introduces bias when querying by threshold rather than cardinality. Using projection, rather than inner product as in that literature, removes the bias. We then give algorithms for building and querying indices for this class of query, based, in the general case, on geometric duality and halfspace range searching, and, in an important special case, on stereographic projection.
In the second part of the dissertation, we investigate the monochromatic reverse top-k
(mRTOP) query in two dimensions. A mRTOP query asks for, given a tuple and a dataset,
the linear preference queries on the dataset that will include the given tuple. Towards this goal, we consider the novel scenario of building an index to support mRTOP queries, using geometric duality and plane sweep. We show theoretically and empirically that the index is quick to build, small on disk, and very efficient at answering mRTOP queries. As a corollary to these efforts, we defined the top-k rank contour, which encodes the k-ranked tuple for every possible linear preference query. This is tremendously useful in answering mRTOP queries, but also, we posit, of significant independent interest for its relation to myriad related linear preference query problems. Intuitively, the top-k rank contour is the minimum possible representation of knowledge needed to identify the k-ranked tuple for any query, without apriori knowledge of that query.
We also introduce k-regret minimizing sets, a very succinct approximation of a numeric
dataset. The purpose of the approximation is to represent the entire dataset by just a small subset that nonetheless will contain a tuple within or near to the top-k for any linear preference query. We show that the problem of finding k-regret minimizing sets—and, indeed, the problem in literature that it generalizes—is NP-Hard. Still, for the special case of two dimensions, we provide a fast, exact algorithm based on the top-k rank contour. For arbitrary dimension, we introduce a novel greedy algorithm based on linear programming and randomization that does excellently in our empirical investigation. / Graduate / 0984
|
6 |
Considering User Intention in Differential Graph QueriesVasilyeva, Elena, Thiele, Maik, Bornhövd, Christof, Lehner, Wolfgang 30 November 2020 (has links)
Empty answers are a major problem by processing pattern matching queries in graph databases. Especially, there can be multiple reasons why a query failed. To support users in such situations, differential queries can be used that deliver missing parts of a graph query. Multiple heuristics are proposed for differential queries, which reduce the search space. Although they are successful in increasing the performance, they can discard query subgraphs relevant to a user. To address this issue, the authors extend the concept of differential queries and introduce top-k differential queries that calculate the ranking based on users’ preferences and significantly support the users’ understanding of query database management systems. A user assigns relevance weights to elements of a graph query that steer the search and are used for the ranking. In this paper the authors propose different strategies for selection of relevance weights and their propagation. As a result, the search is modelled along the most relevant paths. The authors evaluate their solution and both strategies on the DBpedia data graph.
|
Page generated in 0.0591 seconds